PBS Professional 13.0 Beta User's Guide
Contents
1. "-Wsuppress_email=<N>" on page 148 of the PBS Professional Reference Guide.
8.6.2 Altering a Job Array
The qalter command can only be used on a job array object, not on subjobs or ranges. Job array attributes are the same as for jobs.
8.6.3 Moving a Job Array
The qmove command can only be used with job array objects, not with subjobs or ranges. Job arrays can only be moved from one server to another if they are in the Q, H, or W states, and only if there are no running subjobs. The state of the job array object is preserved in the move. The job array will run to completion on the new server. As with jobs, a qstat on the server from which the job array was moved does not show the job array. A qstat on the job array object is redirected to the new server.
8.6.4 Holding a Job Array
The qhold command can only be used with job array objects, not with subjobs or ranges. A hold can be applied to a job array only from the Q, B, or W states. This puts the job array in the H (held) state. If any subjobs are running, they will run to completion. No queued subjobs are started while in the H state.
8.6.5 Releasing a Job Array
The qrls command can only be used with job array objects, not with subjobs or ranges. If the job array was in the Q or B state, it is returned to that state. If it was in the W state
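Because qalter, qmove, qhold and qrls accept only the job array object, it can help to check an identifier before passing it on. The following is a minimal sketch, not part of PBS: pbs_id_kind and array_cmd_ok are hypothetical helper names, and the identifiers in the loop are made-up examples in the 1234[].server style.

```shell
#!/bin/sh
# Classify a PBS job identifier by its bracket part.
# pbs_id_kind is a hypothetical helper, not a PBS command.
pbs_id_kind() {
    case "$1" in
        *"[]"*)       echo array ;;    # e.g. 1234[].server
        *"["*-*"]"*)  echo range ;;    # e.g. 1234[1-5].server
        *"["*"]"*)    echo subjob ;;   # e.g. 1234[7].server
        *)            echo job ;;      # e.g. 1234.server
    esac
}

# Succeeds only when the identifier names the job array object itself.
array_cmd_ok() {
    [ "$(pbs_id_kind "$1")" = array ]
}

for id in '1234[].server' '1234[7].server' '1234[1-5].server'; do
    if array_cmd_ok "$id"; then
        echo "$id: usable with qalter, qmove, qhold, qrls"
    else
        echo "$id: rejected ($(pbs_id_kind "$id"))"
    fi
done
```

The single quotes around each identifier keep the shell from treating the square brackets as a filename pattern.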
2. Name                     Type     Applies To  Value
    array                    Boolean  Job array   True if item is job array
    array_id                 String   Subjob      Subjob's job array identifier
    array_index              String   Subjob      Subjob's index number
    array_state_count        String   Job array   Similar to state_count attribute for server and queue objects. Lists number of subjobs in each state.
    array_indices_remaining  String   Job array   List of indices of subjobs still queued. Range or list of ranges, e.g. 500, 552, 596-1000
    array_indices_submitted  String   Job array   Complete list of indices of subjobs given at submission time. Given as range, e.g. 1-100
8.3.6 Job Array States
The state of subjobs in the same job array can be different. See "Job Array States" on page 424 of the PBS Professional Reference Guide and "Subjob States" on page 424 of the PBS Professional Reference Guide.
8.3.7 PBS Environmental Variables for Job Arrays
Table 8-2: PBS Environmental Variables for Job Arrays
    Environment Variable Name  Used For       Description
    PBS_ARRAY_INDEX            subjobs        Subjob index in job array, e.g. 7
    PBS_ARRAY_ID               subjobs        Identifier for a job array. Sequence number of job array, e.g. 1234[].server
    PBS_JOBID                  Jobs, subjobs  Identifier for a job or a subjob. For subjob, sequence number and subjob index in brackets, e.g. 1234[7].server
8.3.8 Accounting
Job accoun
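As an illustration of how the subjob form of PBS_JOBID is put together, the sketch below splits an identifier into its sequence number and subjob index. The helper names sequence_number and subjob_index are hypothetical, and the fallback identifier 1234[7].server is only a stand-in used when the script runs outside PBS.

```shell
#!/bin/sh
#PBS -N ArrayEnvSketch
#PBS -J 1-3

# Sequence number: everything before the first "." and before any "[".
sequence_number() {
    head_part=${1%%.*}
    echo "${head_part%%\[*}"
}

# Subjob index: the text between "[" and "]"; empty when there is none.
subjob_index() {
    case "$1" in
        *"["*"]"*) rest=${1#*\[}; echo "${rest%%\]*}" ;;
        *)         echo "" ;;
    esac
}

# Outside PBS these variables are unset, so stand-in values are used.
job_id="${PBS_JOBID:-1234[7].server}"
index="${PBS_ARRAY_INDEX:-$(subjob_index "$job_id")}"
echo "sequence number: $(sequence_number "$job_id"), subjob index: $index"
```

Inside a subjob you would normally read PBS_ARRAY_INDEX directly; parsing PBS_JOBID is shown only to make the identifier format concrete.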
3. All three scripts are located in /homedir/testdir.
    #!/bin/sh
    #PBS -N ArrayExample
    #PBS -J 1-2
    echo "Main script: index " $PBS_ARRAY_INDEX
    /homedir/testdir/scriptlet$PBS_ARRAY_INDEX
In our example, scriptlet1 and scriptlet2 simply echo their names. We run ArrayScript using the qsub command:
    qsub ArrayScript
Example 8-6: In this example, we have a script called StageScript. It takes two input files, dataX and extraX, and makes an output file, newdataX, as well as echoing which iteration it is on. The dataX and extraX files will be staged from inputs to work, then newdataX will be staged from work to outputs.
    #!/bin/sh
    #PBS -N StagingExample
    #PBS -J 1-2
    #PBS -W stagein="/homedir/work/data^array_index^@host1:/homedir/inputs/data^array_index^,/homedir/work/extra^array_index^@host1:/homedir/inputs/extra^array_index^"
    #PBS -W stageout=/homedir/work/newdata^array_index^@host1:/homedir/outputs/newdata^array_index^
    echo "Main script: index " $PBS_ARRAY_INDEX
    cd /homedir/work
    cat data$PBS_ARRAY_INDEX extra$PBS_ARRAY_INDEX >> newdata$PBS_ARRAY_INDEX
Local path (execution directory): /homedir/work
Remote host (data storage host): host1
Remote path for inputs (original data files dataX and extraX): /homedir/inputs
Remote path for results (output of computation newdataX): /homedir/outputs
StageScript resides in /homedir/tes
4. Indicates command button is visible only if xpbs is started with the -admin option.
14.6 How to Submit a Job Using xpbs
To submit a job using xpbs, perform the following steps:
First, select a host from the HOSTS listbox in the main xpbs display to which you wish to submit the job.
Next, click on the Submit button located next to the HOSTS panel. The Submit button brings up the Submit Job Dialog box (see below), which is composed of four distinct regions. The Job Script File region is at the upper left. The OPTIONS region, containing various widgets for setting job attributes, is scattered all over the dialog box. The OTHER OPTIONS region is located just below the Job Script File region, and the COMMAND BUTTONS region is at the bottom.
[Figure: Submit Job dialog box]
The job script region is
5. "Open MPI with PBS" on page 136
    Platform MPI  8.0  See section 5.2.15, "Platform MPI with PBS", on page 136
    SGI MPT       Any  See section 5.2.16, "SGI MPT with PBS", on page 136
5.2.1.1 Integration Caveats
• Under Windows, MPIs are not integrated with PBS. PBS is limited to tracking resources, signaling jobs, and performing accounting only for job processes on the primary vnode.
• Some MPI command lines are slightly different; the differences for each are described.
5.2.1.2 Integrating an MPI on the Fly using the pbs_tmrsh Command
The PBS administrator can perform the steps to integrate the supported MPIs. For non-integrated MPIs, you can integrate them on the fly using the pbs_tmrsh command. You should not use pbs_tmrsh with an integrated MPI.
This command emulates rsh, but uses the PBS TM interface to talk directly to pbs_mom on sister vnodes. The pbs_tmrsh command informs the primary and sister MoMs about job processes on sister vnodes. When the job uses pbs_tmrsh, PBS can track resource usage for all job processes.
You use pbs_tmrsh as your rsh or ssh command. To use pbs_tmrsh, set the appropriate environment variable to pbs_tmrsh. For example, to integrate MPICH, set the P4_RSHCOMMAND environment variable to pbs_tmrsh, and to integrate HP MPI, set MPI_REMSH to pbs_tmrsh.
6. To use PBS, you create a batch job, usually just called a job, which you then hand off, or submit, to PBS. A batch job is a set of commands and/or applications you want to run on one or more execution machines, contained in a file or typed at the command line. You can include instructions which specify the characteristics, such as job name, and resource requirements, such as memory, CPU time, etc., that your job needs. The job file can be a shell script under UNIX, a cmd batch file under Windows, a Python script, a Perl script, etc.
For example, here is a simple PBS batch job file which requests one hour of time, 400MB of memory, 4 CPUs, and runs my_application:
    #!/bin/sh
    #PBS -l walltime=1:00:00
    #PBS -l mem=400mb,ncpus=4
    ./my_application
To submit the job to PBS, you use the qsub command, and give the job script as an argument to qsub. For example, to submit the script named my_script:
    qsub my_script
We will go into the details of job script creation in section 2.2, "The PBS Job Script", on page 11, and job submission in section 2.3, "Submitting a PBS Job", on page 17.
2.1.1 Lifecycle of a PBS Job, Briefly
Your PBS job has the following lifecycle:
    1. You write a job script
    2. You submit the job to PBS
    3. PBS accepts the job and returns a job ID to you
    4. The PBS scheduler finds the right place and time
7. You can use PBS_NODEFILE in your job script. You can modify the node file. You can remove entries or sort the entries. PBS does not use the contents of the node file.
5.1.2.5 Node File Caveats
Do not add entries for new hosts. PBS may terminate processes on those hosts, because PBS does not expect the processes to be running there. Adding entries on the same host may cause the job to be terminated, because it is using more CPUs than it requested.
5.1.2.6 Viewing Execution Hosts
You can see which host is the primary execution host: the primary execution host is the first host listed in the job's node file.
5.1.3 Specifying Number of MPI Processes Per Chunk
How you request chunks matters. First, the number of MPI processes per chunk defaults to 1 for chunks with CPUs, and 0 for chunks without CPUs, unless you specify this value using the mpiprocs resource. Second, you can specify whether MPI processes share CPUs. For example, requesting one chunk with four CPUs and four MPI processes is not the same as requesting four chunks, each with one CPU and one MPI process. In the first case, all four MPI processes are sharing all four CPUs. In the second case, each process gets its own CPU.
You request the number of MPI processes you want for each chunk using the mpiprocs resource. For example, to request two MPI processes for each of four chunks, where each chunk has two CPUs:
    -l select=4:ncpus=2:mpiprocs=2
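To see how the node file reflects the request, a job script can read it directly. The sketch below is an illustration only: primary_host and procs_per_host are hypothetical helper names, and when PBS_NODEFILE is not set (for example when trying the script outside a job) it writes a stand-in node file with the made-up hosts hostA and hostB.

```shell
#!/bin/sh
# The primary execution host is the first host listed in the node file.
primary_host() {
    head -n 1 "$1"
}

# Count node file entries per host, printed as host=count.
procs_per_host() {
    sort "$1" | uniq -c | awk '{ print $2 "=" $1 }'
}

nodefile="${PBS_NODEFILE:-}"
made_temp=no
if [ -z "$nodefile" ]; then
    # Stand-in node file for a dry run outside PBS (hypothetical hosts).
    nodefile=$(mktemp)
    printf '%s\n' hostA hostA hostB hostB > "$nodefile"
    made_temp=yes
fi

echo "primary execution host: $(primary_host "$nodefile")"
procs_per_host "$nodefile"

if [ "$made_temp" = yes ]; then
    rm -f "$nodefile"
fi
```

The script only reads the node file, which stays within the caveats above about not adding entries.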
8. Keep in mind the difference between requesting a vnode-level boolean and a job-wide boolean.
    qsub -l select=1:green=True
requests a vnode with green set to True. However,
    qsub -l green=True
requests green set to True on the server and/or queue.
4.3.5 Requesting Application Licenses
Application licenses are managed as resources defined by your PBS administrator. PBS doesn't actually check out the licenses; the application being run inside the job's session does that.
4.3.5.1 Requesting Floating Application Licenses
A site-wide floating license is typically configured as a server-level, job-wide resource. To request a job-wide application license called AppF, use:
    qsub -l AppF=<number of licenses> <other qsub arguments>
If only certain hosts can run the application, they will typically have a host-level Boolean resource set to True.
The job-wide resource AppF is a numerical resource indicating the number of licenses available at the site. The host-level Boolean resource named haveAppF indicates whether a given host can run the application. To request the application license and the vnodes on which to run the application:
    qsub -l AppF=<number of licenses> <other qsub arguments> -l select=haveAppF=True
PBS queries the license server to find out how many floating licenses are available at the beginning of each scheduling cycle. PBS doesn't actua
9. 4.5.3 Job-wide Resource Limits
Job-wide resource limits set a limit for per-job resource usage. Job resource limits are derived from job-wide resources and from totals of per-chunk consumable resources. Limits are derived from explicitly requested resources and default resources. Job-wide resource limits that are derived from sums of all chunks override those that are derived from job-wide resources.
Example 4-11: Job-wide limits are derived from sums of chunks. With the following chunk request:
    qsub -l select=2:ncpus=3:mem=4gb:arch=linux
The following job-wide limits are derived:
    ncpus=6
    mem=8gb
4.5.4 Per-chunk Resource Limits
Each chunk's per-chunk limits determine how much of any resource can be used at that host. PBS sums the chunk limits at each host, and uses that sum as the limit at that host. Per-chunk resource usage limits are the amount of per-chunk resources allocated to the job, both from explicit requests and from defaults.
4.5.4.1 Effects of Limits
If a running job exceeds its limit for walltime, the job is terminated.
If any of the job's processes exceed the limit for pcput, pmem, or pvmem, the job is terminated.
If any of the host limits for mem, ncpus, cput, or vmem is exceeded, the job is terminated. These are host-level limits, so if, for example, your job has two chunks on one host, and the p
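The arithmetic behind a derived job-wide limit can be sketched in a few lines of shell. This is an illustration of the summing rule only, not how PBS computes limits internally: chunk_sum is a hypothetical helper, it reads only the numeric part of each value, and it assumes every chunk states the resource in the same unit.

```shell
#!/bin/sh
# Sum one consumable per-chunk resource over all chunks of a select statement.
# Usage: chunk_sum "2:ncpus=3:mem=4gb:arch=linux" ncpus
# chunk_sum is a hypothetical helper; unit suffixes such as "gb" are ignored.
chunk_sum() {
    echo "$1" | awk -v res="$2" -F'[+]' '{
        total = 0
        for (i = 1; i <= NF; i++) {
            n = split($i, part, ":")
            # A leading number is the chunk count; otherwise the count is 1.
            count = (part[1] ~ /^[0-9]+$/) ? part[1] : 1
            for (j = 1; j <= n; j++) {
                if (index(part[j], res "=") == 1) {
                    val = substr(part[j], length(res) + 2)
                    total += count * (val + 0)
                }
            }
        }
        print total
    }'
}

request="2:ncpus=3:mem=4gb:arch=linux"
echo "ncpus=$(chunk_sum "$request" ncpus)"
echo "mem=$(chunk_sum "$request" mem)gb"
```

For the request in Example 4-11 this prints ncpus=6 and mem=8gb, matching the derived limits listed there.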
10. Example:
    qdel 51
    qdel "1234[].server"
Job array identifiers must be enclosed in double quotes.
9.3.1 Deleting Jobs with Force
You can delete a job whether or not its execution host is reachable, and whether or not it is in the process of provisioning:
    qdel -W force <job ID>
9.3.2 Deleting Finished Jobs
By default, the qdel command does not affect finished jobs. You can use the qdel -x option to delete job histories. This option also deletes any specified jobs that are queued, running, held, suspended, finished, or moved. When you use this, you are deleting the job and its history in one step. If you use the qdel command without the -x option, you delete the job, but not the job history, and you cannot delete a finished job.
To delete a finished job, whether or not it was moved:
    qdel -x <job ID>
If you try to delete a finished job without the -x option, you will get the following error:
    qdel: Job <jobid> has finished
9.3.3 Deleting Moved Jobs
You can use the qdel -x option to delete jobs that are queued, running, held, suspended, finished, or moved.
To delete a job that was moved:
    qdel <job ID sequence number>.<original server>
To delete a job that was moved and then finished:
    qdel -x <job ID>
9.3.4 Restricting Number of Emails
By default, mail is sent for each job or subjob you delete. Use the fo
11. Contents
    14 Using the xpbs GUI
        14.1 Using the xpbs command
        14.2 Using xpbs: Definitions of Terms
        14.3 Introducing the xpbs Main Display
        14.4 Setting xpbs Preferences
        14.5 Relationship Between PBS and xpbs
        14.6 How to Submit a Job Using xpbs
        14.7 Exiting xpbs
        14.8 The xpbs Configuration File
        14.9 xpbs Preferences
    Index
1 New Features
1.1 New Features
1.1.1 New Features in PBS 13.0
1.1.1.1 Limiting Preemption Targets
You can specify which jobs can be preempted by a given job. See section 4.8.33.3.i, "How Preemption Targets Work", on page 241.
1.1.1.2 Running qsub in the Foreground
By default, the qsub command runs in the background. You can run it in the foreground using the -f option. See "qsub" on page 219 of the PBS Professional Reference Guide.
1.1.1.3 Windows Users can Use UNC Paths
Windows users can use UNC paths for job submission and file staging. See "Set up Paths" on page 16 of the PBS Professional User's Guide and "Using UNC Paths" on page 40 of the PBS Professional User's Guide.
1.1
12. single-signon method, or 2) by specifying the password for each job when submitted (per-job method). Check with your system administrator to see which method was configured at your site.
2.3.6.1.i Single-Signon Password Method
To provide PBS with a password to be used for all your PBS jobs, use the pbs_password command. This command can be used whether or not you have jobs enqueued in PBS. The command usage syntax is:
    pbs_password [-s server] [-r] [-d] [user]
When no options are given to pbs_password, the password credential on the default PBS server for the current user, i.e. the user who executes the command, is updated to the prompted password. Any user jobs previously held due to an invalid password are not released.
The available options to pbs_password are:
    -r
        Any user jobs previously held due to an invalid password are released.
    -s server
        Allows you to specify server where password will be changed.
    -d
        Deletes the password.
    user
        The password credential of user user is updated to the prompted password. If user is not the current user, this action is only allowed if:
        1. The current user is root or admin.
        2. User user has given the current user explicit access via the ruserok() mechanism:
            a. The hostname of the machine from which the current user is logged in appears in the server's hosts.equiv file, or
            b. The current user has
13. -l select form:
    qalter -l select=1:ncpus=4:mem=512mb 230
No error reported by qalter.
9.2.1 Changing the Selection Directive
If the selection directive is altered, the job limits for any consumable resource in the directive are also modified.
For example, if a job is queued with the following resource list:
    select=2:ncpus=1:mem=5gb
job limits are set to ncpus=2, mem=10gb.
If the select statement is altered to request:
    select=3:ncpus=2:mem=6gb
then the job limits are reset to ncpus=6 and mem=18gb.
9.2.2 Changing the Job-wide Limit
If the job-wide limit is modified, the corresponding resources in the selection directive are not modified. It would be impossible to determine where to apply the changes in a compound directive.
Reducing a job-wide limit to a new value less than the sum of the resource in the directive is strongly discouraged. This may produce a situation where the job is aborted during execution for exceeding its limits. The actual effect of such a modification is not specified.
A job's walltime may be altered at any time except when the job is in the Exiting state, regardless of the initial value.
If a job is queued, requested modifications must still fit within the queue's and server's job resource limits. If a requested modification to a resource would exceed the queue's or server's job resource limits, the resource req
14. 14.5 Relationship Between PBS and xpbs
xpbs is built on top of the PBS client commands, such that all the features of the command line interface are available through the GUI. Each task that you perform using xpbs is converted into the necessary PBS command and then run.
Table 14-4: xpbs Buttons and PBS Commands
    Location      Command Button  PBS Command
    Hosts Panel   detail          qstat -B -f selected server_host(s)
    Hosts Panel   submit          qsub options selected server(s)
    Hosts Panel   terminate       qterm selected server_host(s)
    Queues Panel  detail          qstat -Q -f selected queue(s)
    Queues Panel  stop            qstop selected queue(s)
    Queues Panel  start           qstart selected queue(s)
    Queues Panel  enable          qenable selected queue(s)
    Queues Panel  disable         qdisable selected queue(s)
    Jobs Panel    detail          qstat -f selected job(s)
    Jobs Panel    modify          qalter selected job(s)
    Jobs Panel    delete          qdel selected job(s)
    Jobs Panel    hold            qhold selected job(s)
    Jobs Panel    release         qrls selected job(s)
    Jobs Panel    run             qrun selected job(s)
    Jobs Panel    rerun           qrerun selected job(s)
    Jobs Panel    signal          qsig selected job(s)
    Jobs Panel    msg             qmsg selected job(s)
    Jobs Panel    move            qmove selected job(s)
    Jobs Panel    order           qorder selected job(s)
15. A boolean value (True or False) indicating whether or not to iconize the QUEUES region.
    iconizeJobsView
        A boolean value (True or False) indicating whether or not to iconize the JOBS region.
    iconizeInfoView
        A boolean value (True or False) indicating whether or not to iconize the INFO region.
    jobResourceList
        A curly-braced list of resource names as according to architecture known to xpbs. The format is as follows:
        { <arch-type1> resname1 resname2 ... resnameN }
        { <arch-type2> resname1 resname2 ... resnameN }
        { <arch-typeN> resname1 resname2 ... resnameN }
16. After the first two subjobs finish qstat Jtp Job id Name User done S Queue 1235 1 host ArrayExample user1l 100 X workq 1235 2 host ArrayExample userl 100 X workq 1235 3 host ArrayExample userl R workq 1235 4 host ArrayExample user1 R workq 1235 5 host ArrayExample user1 Q workq qstat pt Job id Name User done S Queue 1235 host ArrayExample user1 40 B workq 1235 1 host ArrayExample user1 100 X workq 1235 2 host ArrayExample userl 100 X workq 1235 3 host ArrayExample userl R workg 1235 4 host ArrayExample user1 R workgq 1235 5 host ArrayExample user1 Q workgq 1236 host JobExample userl Q workgq PBS Professional 13 0 Beta User s Guide Job Arrays Now if we wait until only the last subjob is still running Chapter 8 Req d Req d Elap SessID NDS TSK Memory Time S Time Req d Req d Elap qstat rt Job ID Username Queue Jobname 1235 5 host userl workg ArrayBxamp 3048 E e 1236 host userl workq JobExample 3042 1 qstat Jrt Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time 1235 5 host userl workg ArrayExamp 048 1 8 6 Using PBS Commands with Job Arrays The following table shows how you can or cannot use PBS commands with job arrays sub jobs or ranges Table 8 5 Using PBS Commands with Job Arrays Argument to Command Array Range Array Index Comman Array Array yiRange yI i d Object Specified Range of
17. Only one kind of aoe resource can be requested in a job. For example, an acceptable job could make the following request:
    -l select=1:ncpus=1:aoe=suse+1:ncpus=2:aoe=suse
12.3.2.1 Vnode Job Restrictions
A vnode with any of the following jobs will not be selected for provisioning:
• One or more running jobs
• A suspended job
• A job being backfilled around
12.3.2.2 Provisioning Job Restrictions
A job that requests an AOE will not be backfilled around.
12.3.2.3 Vnode Reservation Restrictions
A vnode will not be selected for provisioning for job MyJob if the vnode has a confirmed reservation, and the start time of the reservation is before job MyJob will end.
A vnode will not be selected for provisioning for a job in reservation R1 if the vnode has a confirmed reservation R2, and an occurrence of R1 and an occurrence of R2 overlap in time and share a vnode for which different AOEs are requested by the two occurrences.
12.3.3 Requirements for Jobs
12.3.3.1 If AOE is Requested, All Chunks Must Request Same AOE
If any chunk of a job requests an AOE, all chunks must request that AOE.
If a job requesting an AOE is submitted to a reservation, that reservation must also request the same AOE.
12.4 Using Provisioning
12.4.1 Requesting Provisioning
You request a reservation with an AOE in order to reserve the res
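The same-AOE requirement can be checked on a select statement before submission. The sketch below is one possible pre-check, not a PBS feature: check_aoe is a hypothetical helper, and the AOE name rhel in the usage lines is made up for contrast with the suse example above.

```shell
#!/bin/sh
# Print "ok" if no chunk requests an AOE or every chunk requests the same AOE,
# and "mismatch" otherwise. check_aoe is a hypothetical helper, not a PBS command.
check_aoe() {
    echo "$1" | awk -F'[+]' '{
        with_aoe = 0; first = ""; bad = 0
        for (i = 1; i <= NF; i++) {
            n = split($i, part, ":")
            aoe = ""
            for (j = 1; j <= n; j++)
                if (index(part[j], "aoe=") == 1) aoe = substr(part[j], 5)
            if (aoe != "") {
                with_aoe++
                if (first == "") first = aoe
                else if (aoe != first) bad = 1
            }
        }
        # Some chunks with an AOE and some without also breaks the rule.
        if (bad || (with_aoe > 0 && with_aoe < NF)) print "mismatch"
        else print "ok"
    }'
}

check_aoe "1:ncpus=1:aoe=suse+1:ncpus=2:aoe=suse"
check_aoe "1:ncpus=1:aoe=suse+1:ncpus=2:aoe=rhel"
```

The first call prints ok and the second prints mismatch.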
18. If the resource request specifies an mppnodes range with the value on the right-hand side of the range less than or equal to the value on the left-hand side of the range, the job or reservation is rejected with the following message:
    The following error was encountered: Bad range <range>, the first number <left_side> must be less than the second number <right_side>
A log message is printed to the server log at event class 0x0004:
    translate mpp: ERROR: bad range <range>, the first number <left_side> must be less than the second number <right_side>
11.8.6 Resource Request Containing Both mpp* and select/place
If a resource request contains both mpp* and select/place, the job or reservation is rejected, and the following error is printed:
    The following error was encountered: mpp resources cannot be used with select or place
12 Using Provisioning
PBS provides automatic provisioning of an OS or application on vnodes that are configured to be provisioned. When a job requires an OS that is available but not running, or an application that is not installed, PBS provisions the vnode with that OS or application.
12.1 Definitions
AOE
    The environment on a vnode. This may be one that results from provisioning that vnode, or one that is already in place.
Provision
    To install an OS or application, or to run a script which
19. On the Submit Job window, in the miscellany options section (far left, center of window), click on the file staging button. This will launch the File Staging dialog box (shown below), in which you will be able to set up the file staging you desire.
The File Selection Box will be initialized with your current working directory. If you wish to select a different directory, double-click on its name, and xpbs will list the contents of the new directory in the File Selection Box. When the correct directory is displayed, simply click on the name of the file you wish to stage (in or out). Its name will be written in the File Selected area.
Next, click either of the Add file selected buttons to add the named file to the stagein or stageout list. Doing so will write the file name into the corresponding area on the lower half of the File Staging window. Now you need to provide location information. For stagein, type in the path and filename where you want the named file placed. For stageout, specify the host name and pathname where you want the named file delivered. You may repeat this process for as many files as you need to stage.
When you are done selecting files, click the OK button.
3.2.11 Stagein and Stageout Failure
3.2.11.1 File Stagein Failure
When stagein fails, the job is placed in a 30-minute wait to allow you time to fix the problem. Typically this is a missing file or a network outage. Email is sent to the job owner when the problem is
20. On UNIX, UserA, UserB, and UserC must each have .rhosts files at their servers that list UserS.
2.5.4.1 Caveats for Changing Job Username
• Wherever your job runs, you must have permission to run the job under the specified user name. See section 2.4.4, "Setting Up Your User Authorization", on page 17.
• User names are limited to 256 characters.
2.5.5 Specifying Job Group ID
Your username can belong to more than one group, but each PBS job is only associated with one of those groups. By default, the job runs under the primary group. The job's group is specified in the group_list job attribute. You can change the group under which your job runs on the execution host either on the command line or by using a PBS directive:
    qsub -W group_list=<group list>
    #PBS group_list=<group list>
For example:
    qsub -W group_list=grpA,grpB@jupiter my_job
The <group list> argument has the following form:
    group[@host][,group[@host],...]
You can specify only one group name per host. You can specify only one group without a corresponding host; that group name is used for execution on any host not named in the argument list.
The group_list defaults to the primary group of the username under which the job runs.
2.5.5.1 Group Names Under Windows
Under Windows, the primary group is the first group found for the username by PBS when querying the accou
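The matching rule for a group list (a host-specific entry wins, otherwise the single entry without a host applies) can be written out as a short sketch. group_for_host is a hypothetical helper used only to illustrate the rule, and the host name mars is made up; PBS itself performs this selection.

```shell
#!/bin/sh
# Print the group that a group list assigns to a given execution host.
# Usage: group_for_host "grpA,grpB@jupiter" jupiter
# Prints an empty line when no entry applies (the primary group is then used).
group_for_host() {
    list=$1
    host=$2
    default=""
    old_ifs=$IFS
    IFS=','
    for entry in $list; do
        case "$entry" in
            *@"$host") IFS=$old_ifs; echo "${entry%@*}"; return 0 ;;
            *@*)       ;;                    # entry for some other host
            *)         default=$entry ;;     # the one entry without a host
        esac
    done
    IFS=$old_ifs
    echo "$default"
}

echo "on jupiter: $(group_for_host 'grpA,grpB@jupiter' jupiter)"
echo "on mars:    $(group_for_host 'grpA,grpB@jupiter' mars)"
```

With the example list above, jupiter resolves to grpB and any other host resolves to grpA.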
21. "Prerequisites for Checkpointing" on page 154.
The qrls command can only be used with job array objects, not with subjobs or ranges. The job array will be returned to its pre-hold state, which can be either Q, B, or W.
The qhold command can only be used with job array objects, not with subjobs or ranges. A hold can be applied to a job array only from the Q, B, or W states. This will put the job array in the H (held) state. If any subjobs are running, they will run to completion. No queued subjobs will be started while in the H state.
PBS limits the number of times it tries to run a job to 21, and tracks this count in the job's run_count attribute. If your job is checkpointed and requeued enough times, it will be held.
6.5.7 Why is Your Job Held?
Your job may be held for any of the following reasons:
• Provisioning fails due to invalid provisioning request or to internal system error(s)
• After provisioning, the AOE reported by the vnode does not match the AOE requested by the job (s)
• The job was held by a PBS Manager or Operator (o)
• The job was checkpointed and requeued (s)
• Your job depends on a finished job for which PBS is maintaining history (s)
• The job's password is invalid (p)
• The job's run_count attribute has a value greater than 20
6.5.8
22. • You can run xpbs and give it keyboard input
• You can use an Altair front-end product to submit and monitor jobs; go to www.pbsworks.com
2.3.3 Submitting a Job Using a Script
You submit a job to PBS using the qsub command. For details on qsub, see "qsub" on page 219 of the PBS Professional Reference Guide. To submit a PBS job, type the following:
• UNIX/Linux shell script:
    qsub <name of shell script>
• UNIX/Linux Python or Perl script:
    qsub <name of Python or Perl job script>
• Windows command script:
    qsub <name of job script>
• Windows Python script:
    qsub -S %PBS_EXEC%\bin\pbs_python.exe <name of python job script>
If the path contains any spaces, it must be quoted, for example:
    qsub -S "%PBS_EXEC%\bin\pbs_python.exe" <name of python job script>
2.3.3.1 Specifying the Job's Top Shell
You can specify the path and name of the shell to use as the top shell for your job. The rules for specifying the top shell are different for UNIX/Linux and Windows; do not skip the following subsections, numbered 2.3.3.1.i and 2.3.3.1.ii.
The Shell_Path_List job attribute specifies the top shell; the default is your login shell on the execution host. You can set this attribute using the following:
• The "-S <path list>" option to qsub
• The "#PBS Shell_Path_List=<path list>" PBS directive
The option argument path list has this form:
    path[@host][,path[@host],...]
23. either IP or US mode. PBS manages InfiniBand or the HPS. LoadLeveler is not required in order to use InfiniBand switches in User Space mode. PBS can track the resources for MPI, LAPI programs, or a mix of MPI and LAPI programs.
Any job that can run under IBM poe can run under PBS. There are some exceptions and differences; under PBS, the poe command is slightly different. See section 5.2.5.5, "poe Options and Environment Variables", on page 107.
5.2.5.1 Using the InfiniBand Switch
To ensure that a job uses the InfiniBand switch, make sure that the job's environment has PBS_GET_IBWINS set to 1. This can be accomplished the following ways:
• The administrator sets this value for all jobs.
• You can set the environment variable for each job: set PBS_GET_IBWINS=1 in your shell environment, and use the -V option to every qsub command. See the previous section.
    • csh:
        setenv PBS_GET_IBWINS 1
    • bash:
        PBS_GET_IBWINS=1
        export PBS_GET_IBWINS
• You can set the environment variable for one job: use the "-v PBS_GET_IBWINS=1" option to the qsub command.
5.2.5.2 Using the HPS
If an HPS is available on the AIX machine where your job runs, PBS runs your jobs so that they use the HPS.
In order to make sure that your job runs on this machine, you can request the resource representing the HPS. We recommend that this resource is called hps
24. hostB, which conflicts with the "-np 3" specification in mpirun, since only two MPD daemons are started.
The correct way is to specify either of the following:
    #PBS -l select=1:ncpus=1:host=hostA+2:ncpus=1:host=hostB
    #PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB:mpiprocs=2
which causes the node file to contain:
    hostA
    hostB
    hostB
and is consistent with "mpirun -np 3".
5.2.6.2 Options to Integrated Intel MPI
If executed inside a PBS job script, all of the options to the PBS interface are the same as for Intel MPI's mpirun, except for the following:
    -host, -ghost
        For specifying the execution host to run on. Ignored.
    -machinefile <file>
        The file argument contents are ignored and replaced by the contents of PBS_NODEFILE.
    mpdboot option --totalnum=*
        Ignored and replaced by the number of unique entries in PBS_NODEFILE.
    mpdboot option --file=*
        Ignored and replaced by the name of PBS_NODEFILE. The argument to this option is replaced by PBS_NODEFILE.
        Argument to mpdboot option -f <mpd_hosts_file> replaced by PBS_NODEFILE.
    -s
        If the PBS interface to Intel MPI's mpirun is called inside a PBS job, Intel MPI's mpirun -s argument to mpdboot is not supported, as this closely matches the mpirun option "-s <spec>". You can simply run a separate mpdboot -s before calling mpirun. A warning message is issued by the PBS
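To check that a select statement is consistent with the process count you give to mpirun, you can work out which node file entries it should produce. The sketch below is a simplified model under stated assumptions: expected_nodefile is a hypothetical helper, every chunk names its host with host=, every chunk has CPUs (so mpiprocs defaults to 1), and entries appear in chunk order.

```shell
#!/bin/sh
# Print the node file entries a select statement is expected to produce:
# one line per MPI process, per chunk.
# expected_nodefile is a hypothetical helper, not a PBS command.
expected_nodefile() {
    echo "$1" | awk -F'[+]' '{
        for (i = 1; i <= NF; i++) {
            n = split($i, part, ":")
            count = (part[1] ~ /^[0-9]+$/) ? part[1] : 1
            host = ""
            procs = 1    # assumed default for chunks with CPUs
            for (j = 1; j <= n; j++) {
                if (index(part[j], "host=") == 1) host = substr(part[j], 6)
                if (index(part[j], "mpiprocs=") == 1) procs = substr(part[j], 10) + 0
            }
            for (c = 0; c < count * procs; c++) print host
        }
    }'
}

request="1:ncpus=1:host=hostA+1:ncpus=2:host=hostB:mpiprocs=2"
expected_nodefile "$request"
echo "processes: $(expected_nodefile "$request" | wc -l | tr -d ' ')"
```

The printed process count is the value that should match the -np argument to mpirun.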
25. on a finished job j3 that has been purged from the historical records, PBS rejects j1 as if the job no longer exists.

6.2.5.4 Error Reporting
PBS checks for errors in the existence, state, or condition of the job after accepting the job. If there is an error, PBS sends you mail about the error and deletes the job.

6.3 Adjusting Job Running Time
This feature was added in PBS Professional 12.0.

6.3.1 Shrink-to-fit Jobs
PBS allows you to submit a job whose running time can be adjusted to fit into an available scheduling slot. The job's minimum and maximum running time are specified in the min_walltime and max_walltime resources. PBS chooses the actual walltime. Any job that requests min_walltime is a shrink-to-fit job.

6.3.1.1 Requirements for a Shrink-to-fit Job
A job must have a value for min_walltime to be a shrink-to-fit job. Shrink-to-fit jobs are not required to request max_walltime, but it is an error to request max_walltime and not min_walltime. Jobs that do not have values for min_walltime are not shrink-to-fit jobs, and you can specify their walltime.

6.3.1.2 Comparison Between Shrink-to-fit and Non-shrink-to-fit Jobs
The only difference between a shrink-to-fit and a non-shrink-to-fit job is how the job's walltime is treated. PBS sets the walltime when it runs the job. Any walltime value that exists before the job runs is ig
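The requirements in section 6.3.1.1 can be sketched as a small shell check. This is only an illustration of the rule as stated above, not part of PBS: the function name classify_walltime_request is hypothetical, and an unset resource is represented by an empty string.

```shell
# Hypothetical helper (not a PBS command): classify a job request by the
# walltime resources it names. Pass an empty string for an unset resource.
#   $1 = min_walltime, $2 = max_walltime
classify_walltime_request() {
    min_wt=$1
    max_wt=$2
    if [ -n "$min_wt" ]; then
        # Any job that requests min_walltime is shrink-to-fit;
        # max_walltime is optional.
        echo "shrink-to-fit"
    elif [ -n "$max_wt" ]; then
        # max_walltime without min_walltime is an error.
        echo "error: max_walltime requested without min_walltime"
    else
        echo "non-shrink-to-fit"
    fi
}

classify_walltime_request "01:00:00" "04:00:00"
classify_walltime_request "" "04:00:00"
classify_walltime_request "" ""
```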
26. on page 156 Running Your Job Interactively on page 165 X Y Z Submitting a Job Array on page 196 join Merging Output and Error Files on page 52 keep Keeping Output and Error Files on Execution Host on page 52 resource list Requesting Resources on page 59 M user _list Setting Email Recipient List on page 29 m MailOptions Specifying Email Notification on page 27 N name Specifying Job Name on page 30 0o path Paths for Output and Error Files on page 50 p priority Setting Your Job s Priority on page 165 P project Specifying a Job s Project on page 30 q destination Specifying Server and or Queue on page 32 r value Allowing Your Job to be Re run on page 161 S path_list Specifying the Job s Top Shell on page 18 u user_list Specifying Job Username on page 31 V Exporting All Environment Variables on page 171 v variable list Exporting Specific Environment Variables on page 171 W lt attribute gt lt value gt Setting Job Attributes on page 14 W block opt Making qsub Wait Until Job Ends on page 163 W depend list Using Job Dependencies on page 146 UG 26 PBS Professional 13 0 Beta User s Guide Submitting a PBS Job Chapter 2 Table 2 1 Options to the qsub Command Option Function and Page Reference W group _lis
27. /tmp/jobconf
    echo "-n 2 -host host3 prog2" >> /tmp/jobconf
    mpiexec -configfile /tmp/jobconf
    rm /tmp/jobconf
Run job script:
    qsub -l select=3:ncpus=2:mpiprocs=2 job_script
    <job-id>

5.2.13.4 Restrictions
The maximum number of ranks that can be launched under MVAPICH2 is the number of entries in PBS_NODEFILE.

5.2.14 Open MPI with PBS
Open MPI can be integrated with PBS on UNIX and Linux so that PBS can track resource usage, signal processes, and perform accounting for all job processes. Your PBS administrator can integrate Open MPI with PBS.

5.2.14.1 Using Open MPI with PBS
You can run jobs under PBS using Open MPI without making any changes to your MPI command line.

5.2.15 Platform MPI with PBS
Platform MPI can be integrated with PBS on UNIX and Linux so that PBS can track resource usage, signal processes, and perform accounting for all job processes. Your PBS administrator can integrate Platform MPI with PBS.

5.2.15.1 Using Platform MPI with PBS
You can run jobs under PBS using Platform MPI without making any changes to your MPI command line.

5.2.15.2 Setting up Your Environment
In order to override the default rsh, set PBS_RSHCOMMAND in your job script:
    export PBS_RSHCOMMAND=<rsh_cmd>

5.2.16 SGI MPT with PBS
PBS supplies its own mpiexec to use with SGI MPT on the Altix running supported versions o
28. you use it as you would outside of PBS. Some of the integrated MPIs have slightly different command lines. See the instructions for each MPI. The following table lists the supported MPIs and gives links to instructions for using each MPI.

Table 5-1: List of Supported MPIs
MPI Name | Versions | Instructions for Use
HP MPI | 1.08.03, 2.0.0 | See section 5.2.4, "HP MPI with PBS", on page 104
IBM POE | AIX 5.x, 6.x | See section 5.2.5, "IBM POE with PBS", on page 105
Intel MPI | 2.0.022, 3, 4 | See section 5.2.6, "Intel MPI with PBS", on page 112
LAM MPI | 6.5.9 (deprecated) | See section 5.2.7.2, "Using LAM 6.5.9 with PBS", on page 117
LAM MPI | 7.0.6, 7.1.1 | See section 5.2.7.1, "Using LAM 7.x with PBS", on page 117
MPICH-P4 | 1.2.5, 1.2.6, 1.2.7 | See section 5.2.8, "MPICH-P4 with PBS", on page 118
MPICH-GM | | See section 5.2.9, "MPICH-GM with PBS", on page 120
MPICH-MX | | See section 5.2.10, "MPICH-MX with PBS", on page 123
MPICH2 | 1.0.3, 1.0.5, 1.0.7 | See section 5.2.11, "MPICH2 with PBS", on page 127
MVAPICH | 1.2 | See section 5.2.12, "MVAPICH with PBS", on page 131
MVAPICH2 | 1.8 | See section 5.2.13, "MVAPICH2 with PBS", on page 133
Open MPI | 1.4.x | See section 5.2.14
29. 1:fserver=True+15:ncpus=1:fserver=False -l place=scatter

Allocate 4 vnodes, each with 6 CPUs, with 3 MPI processes per vnode, with each vnode on a separate host. The memory allocated would be one-fourth of the memory specified by the queue or server default, if one existed. This results in a different placement of the job from version 5.4:
    -lnodes=4:ppn=3:ncpus=2
is converted to:
    -l select=4:ncpus=6:mpiprocs=3 -l place=scatter

10. Allocate 4 vnodes from 4 separate hosts, with the property blue. The amount of memory allocated from each vnode is 2560MB (= 10GB / 4), rather than 10GB from each vnode:
    -lnodes=4:blue:ncpus=2 -l mem=10GB
is converted to:
    -l select=4:blue=True:ncpus=2:mem=2560mb -lplace=scatter

4.8.4 Caveats for Using Old Syntax

4.8.4.1 Changes in Behavior
Most jobs submitted with -lnodes will continue to work as expected. These jobs will be automatically converted to the new syntax. However, job tasks may execute in an unexpected order, because vnodes may be assigned in a different order. Jobs submitted with old syntax that ran successfully on versions of PBS Professional prior to 8.0 can fail because a limit that was per-chunk is now job-wide.

Example 4-16: A job submitted using -lnodes=X -lmem=M that fails because the mem limit is now job-wide. If the following conditions are true:
• PBS Professional 9.0 o
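The per-vnode memory figure in the conversion example (a job-wide 10GB split across 4 chunks gives 2560MB each) is plain integer arithmetic. A minimal sketch follows; the function name per_chunk_mem_mb is hypothetical, and the real conversion that PBS performs covers more cases than this division.

```shell
# Hypothetical helper (not a PBS command): divide a job-wide memory
# request, given in GB, evenly across chunks and report the share in MB.
#   $1 = job-wide memory in GB, $2 = number of chunks
per_chunk_mem_mb() {
    total_gb=$1
    chunks=$2
    # 1 GB = 1024 MB; integer division, as in the 10GB / 4 example.
    echo $(( total_gb * 1024 / chunks ))
}

# Rebuild the select statement from the example using the computed share.
mem_mb=$(per_chunk_mem_mb 10 4)
echo "-l select=4:blue=True:ncpus=2:mem=${mem_mb}mb -lplace=scatter"
```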
30. 2.5.4 Specifying Job Username
By default, PBS runs your job under the username with which you log in. You may need to run your job under a different username, depending on which PBS server runs the job. You can specify a list of user names under which the job can run. All but one of the entries in the list must specify the PBS server hostname as well, so that PBS can choose which username to use by looking at the hostname. You can include one entry in the list that does not specify a hostname; PBS uses this in the case where the job was sent to a server that is not in your list.

The list of user names is stored in the User_List job attribute. The value of this attribute defaults to the user name under which you logged in. There is no limit to the length of the attribute.

List entries are in the following format:
    username@hostname[,username@hostname ...][,username]

You can set the value of User_List in the following ways:
• You can use qsub -u <username>
• You can use a directive: #PBS User_List=<username list>

Example 2-9: Our user is UserS on the submission host HostS, UserA on server ServerA, and UserB on server ServerB, and is UserC everywhere else. Note that this user must be UserA on all ExecutionA and UserB on all ExecutionB machines. Then our user can use "qsub -u UserA@ServerA,UserB@ServerB,UserC" for the job. The job owner will always be UserS
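The lookup described for User_List can be illustrated with a short shell function: the entry whose hostname matches the server wins, and the one entry without a hostname is the fallback. This is a sketch of the rule as described, not the way PBS implements it. The name pick_username is hypothetical, and the comma-separated user@host entry form is assumed from Example 2-9.

```shell
# Hypothetical helper (not a PBS command): choose the username a job would
# run under on a given server, from a comma-separated User_List value.
#   $1 = list such as "UserA@ServerA,UserB@ServerB,UserC", $2 = server name
pick_username() {
    user_list=$1
    server=$2
    fallback=""
    old_ifs=$IFS
    IFS=,
    for entry in $user_list; do
        case $entry in
            *@"$server")
                # Entry names this server: use its username part.
                IFS=$old_ifs
                echo "${entry%@*}"
                return 0
                ;;
            *@*)
                # Entry names some other server: skip it.
                ;;
            *)
                # Entry without a hostname: remember it as the fallback.
                fallback=$entry
                ;;
        esac
    done
    IFS=$old_ifs
    if [ -n "$fallback" ]; then
        echo "$fallback"
    fi
}

pick_username "UserA@ServerA,UserB@ServerB,UserC" ServerB
pick_username "UserA@ServerA,UserB@ServerB,UserC" ServerX
```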
31. 5.1.2 Setting Email Recipient List
The list of recipients to whom PBS sends mail is specified in the Mail_Users job attribute. You can set the Mail_Users attribute using the following methods:
• The -M <mail recipients> option to qsub
• The #PBS Mail_Users=<mail recipients> PBS directive

The mail recipients argument is a list of user names with optional hostnames, in this format:
    user[@host][,user[@host],...]
For example:
    qsub -M user1@mydomain.com my_job
When you set this option for a job array, PBS sets the option for each subjob, and sends mail for each subjob.

2.5.1.3 Restricting Number of Job Deletion Emails
By default, when you delete a job or subjob, PBS sends you email. You can use qdel -Wsuppress_email=<limit> to restrict the number of emails sent to you each time you use qdel. This option behaves as follows:
    limit >= 1    You receive at most limit emails.
    limit = 0     PBS ignores this option.
    limit = -1    You receive no emails.

2.5.1.4 Windows Caveats for Email
PBS on Windows can send email only to addresses that specify an actual hostname that accepts port 25 (sendmail) requests. For example, if you use the following on Windows:
    qsub -M user1@host.mydomain.com
The host named host.mydomain.com must accept port 25 connections.

2.5.2 Specifying Job Name
If you submit a job using a script without specifying a name for
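The effect of the suppress_email limit on a single qdel can be sketched as follows. This assumes the three cases read as limit >= 1, limit = 0, and limit = -1, and that the default is one email per deleted job or subjob. The function name deletion_emails is hypothetical, and values below -1 are treated as unsupported because the text does not define them.

```shell
# Hypothetical helper (not a PBS command): estimate how many emails one
# qdel invocation produces.
#   $1 = suppress_email limit, $2 = number of jobs or subjobs deleted
deletion_emails() {
    limit=$1
    jobs=$2
    if [ "$limit" -lt -1 ]; then
        # Not described in the text; refuse rather than guess.
        echo "unsupported limit: $limit" >&2
        return 1
    elif [ "$limit" -eq -1 ]; then
        echo 0                 # no emails at all
    elif [ "$limit" -eq 0 ]; then
        echo "$jobs"           # option ignored: assumed default, one per job
    elif [ "$limit" -lt "$jobs" ]; then
        echo "$limit"          # capped at the limit
    else
        echo "$jobs"           # fewer deletions than the limit
    fi
}

deletion_emails 100 5000
deletion_emails -1 5000
```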
32. 5-8: If you want four MPI processes, where each process has its own CPU:
    -lselect=4:ncpus=1
See "Built-in Resources" on page 307 of the PBS Professional Reference Guide for a definition of the mpiprocs resource.

5.1.3.1 Chunks With No MPI Processes
If you request a chunk that has no MPI processes, PBS may take that chunk from a vnode which has already supplied another chunk. You request a chunk that has no MPI processes using either of the following:
    -lselect=1:ncpus=0
    -lselect=1:ncpus=2:mpiprocs=0

5.1.4 Caveats and Advice for Multiprocessor Jobs

5.1.4.1 Requesting Uniform Processors
Some MPI jobs require the work on all vnodes to be at the same stage before moving to the next stage. For these applications, the work can proceed only at the pace of the slowest vnode, because faster vnodes must wait while it catches up. In this case, you may find it useful to ensure that the job's vnodes are homogeneous.

If there is a resource that identifies the architecture, type, or speed of the vnodes, you can use it to ensure that all chunks are taken from vnodes with the same value. You can either request a specific value for this resource for all chunks, or you can group vnodes according to the value of the resource. See section 4.7.1.3, "Grouping on a Resource", on page 80.

Example 5-9: The resource that identifies the speed is named speed, and your job requests 16 chunks, each with two CPUs, two MPI processes, all with speed equa
33. 52 south userl workq my job 1 0 10 Q qstat u userl barry Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time S Time 51 south barry workq airfoil 930 1 0 13 R 0 01 52 south userl workq my job 1 0 10 Q 54 south barry workq airfoil 1 0 13Q 10 1 6 Listing Running Jobs The r option to qstat displays the status of all running jobs at the optionally specified PBS server Running jobs include those that are running and suspended One line of output is generated for each job reported and the information is presented in the alternative display For example qstat r host1 Req d Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time S Time 43 host1 userl workq STDIN 4693 1 1 R 00 00 UG 232 PBS Professional 13 0 Beta User s Guide Checking Job amp System Status Chapter 10 10 1 7 Listing Non Running Jobs The i option to qstat displays the status of all non running jobs at the optionally spec ified PBS server Non running jobs include those that are queued held and waiting One line of output is generated for each job reported and the information is presented in the alternative display see description above For example qstat i host1 Req d Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time Time 44 hostl userl workq STDIN 1 1 0 10 1 8 Listing Hosts Assigned to Jobs The n option to qstat displays the hosts
35. 8 Job Arrays Example of a stagein qsub W stagein foo array_index host 1 C WINNT Temp foo array_index J 1 5 stage_script Example of a stageout qsub W stageut C WINNT Temp foo array_index host 1 0Q my_username foo array_index out J 1 5 stage_script 8 4 3 3 Job Array File Staging Caveats We recommend using an absolute pathname for the storage path Remember that the path to your home directory may be different on each machine and that when using sandbox PRIVATE you may or may not have a home directory on all execution machines UG 198 PBS Professional 13 0 Beta User s Guide Job Arrays Chapter 8 8 4 3 4 Examples of Staging for Job Arrays Example 8 4 Simple example storage path store film Data files used as input framel frame2 frame3 execution _ path pix Executable a out For this example a out produces frame2 out from frame2 PBS W stagein pix in frame array index store film frame array_index PBS W stageout pix out frame array index out store film frame array_index out PBS J 1 3 a out frame PBS ARRAY INDEX in out Note that the stageout statement is all one line broken here for readability The result is that your directory named film contains the original files framel frame2 frame3 plus the new files framel out frame2 out and frame3 out Example 8 5 In this example we have a script named ArrayScript which calls scriptlet and scriptlet2
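To see what a staging template produces for each subjob, the array_index placeholder can be expanded by hand. The sketch below assumes the placeholder is written ^array_index^ and that a template has the form execution_path@host:storage_path. The function name expand_stage_spec is hypothetical, and the sample path is an illustrative reconstruction of Example 8-4, not a verified command line.

```shell
# Hypothetical helper (not a PBS command): print the staging specification
# each subjob would get once ^array_index^ is replaced by its index.
#   $1 = template, $2 = first index, $3 = last index
expand_stage_spec() {
    template=$1
    first=$2
    last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        # \^ matches a literal caret; replace every placeholder on the line.
        printf '%s\n' "$template" | sed 's/\^array_index\^/'"$i"'/g'
        i=$(( i + 1 ))
    done
}

expand_stage_spec '/pix/in/frame^array_index^@store:/film/frame^array_index^' 1 3
```

Running this for indices 1 through 3 prints one specification per subjob, which is a quick way to check that every subjob reads and writes a distinct file.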
36. CPUs and 50GB of memory on a host named zooland. The value of place depends on the default, which defaults to place=free:
    -l select=1:ncpus=2:mem=50gb:host=zooland

This will allocate 1 CPU and 6GB of memory and one host-locked swlicense from each of two hosts:
    -l select=2:ncpus=1:mem=6gb:swlicense=1 -lplace=scatter

Request free placement of 10 CPUs across hosts:
    -l select=10:ncpus=1 -l place=free

Here is an odd-sized job that will fit on a single Altix, but not on any one node board. We request an odd number of CPUs that are not shared, so they must be rounded up:
    -l select=1:ncpus=3:mem=6gb -l place=pack:excl

Here is an odd-sized job that will fit on a single Altix, but not on any one node board. We are asking for a small number of CPUs but a large amount of memory:
    -l select=1:ncpus=1:mem=25gb -l place=pack:excl

Here is a job that may be run across multiple Altix systems, packed into the fewest vnodes:
    -l select=2:ncpus=10:mem=12gb -l place=free

Submit a job that must be run across multiple Altix systems, packed into the fewest vnodes:
    -l select=2:ncpus=10:mem=12gb -l place=scatter

Request free placement across nodeboards within a single host:
    -l select=1:ncpus=10:mem=10gb -l place=group=host

Request free placement across vnodes on multiple Altixes:
    -l select=1
37. Exiting. The last column gives the status of the server itself: active, idle, or scheduling.
    qstat -B
    Server        Max  Tot  Que  Run  Hld  Wat  Trn  Ext  Status
    fast.domain     0   14   13    1    0    0    0    0  Active

10.2.2 Viewing Server Information in Long Format
When querying jobs, servers, or queues, you can add the -f option to qstat to change the display to the full or long display. For example, the server status shown above would be expanded using -f as shown below:
    qstat -Bf
    Server: fast.mydomain.com
        server_state = Active
        scheduling = True
        total_jobs = 14
        state_count = Transit:0 Queued:13 Held:0 Waiting:0 Running:1 Exiting:0
        managers = user1@fast.mydomain.com
        default_queue = workq
        log_events = 511
        mail_from = adm
        query_other_jobs = True
        resources_available.mem = 64mb
        resources_available.ncpus = 2
        resources_default.ncpus = 1
        resources_assigned.ncpus = 1
        resources_assigned.nodect = 1
        scheduler_iteration = 600
        pbs_version = PBSPro 13.0 Beta 41640

10.3 Checking Queue Status
To view queue information in default format:
    qstat -Q [destination ...]
To view queue information in alternate format:
    qstat -q [-G | -M] [destination ...]
To view queue information in long format:
    qstat -Q -f [destination ...]
If you specify a destination id
38. If you want to run a job that needs to use the resources on internal login nodes only, you can specify vntype=cray_login in your select statement. For example:
    qsub -lselect=4:ncpus=1:vntype=cray_login job

11.5.6 Using Compute Nodes
If your job script contains an aprun launch, you must run your job on compute nodes. To run your job on compute nodes, specify a vntype of cray_compute. For example:
    -lselect=2:ncpus=2:vntype=cray_compute

11.5.7 Using Login and Compute Nodes
You can request both login and compute nodes for your job. You must specify the login node(s) before the compute nodes. You can specify a vntype of cray_login for the chunks requiring login nodes, and a vntype of cray_compute for the chunks requiring compute nodes. For example:
    qsub -lselect=1:ncpus=2:vntype=cray_login+2:ncpus=2:vntype=cray_compute

11.5.8 Requesting Specific Groups of Nodes
You can use select and place to request the groups of vnodes you want. This replaces the behavior provided by mppnodes. You may need to group nodes by some criteria, for example:
• Certain nodes are fast nodes
• Certain nodes share a required or useful characteristic
• Some combination of nodes gives the best performance for an application

Your administrator can set up either of the following:
• Custom Boolean resources on each vnode, which reflect how the nodes are labeled, and allow you
40. 3.2.7 Summary of the Job's Lifecycle
This is a summary of the steps performed by PBS. The steps are not necessarily performed in this order.
• On each execution host, if specified, PBS creates a job-specific staging and execution directory.
• PBS sets PBS_JOBDIR and the job's jobdir attribute to the path of the job's staging and execution directory.
• On each execution host allocated to the job, PBS creates a job-specific temporary directory.
• PBS sets the TMPDIR environment variable to the pathname of the temporary directory.
• If any errors occur during directory creation or the setting of variables, the job is requeued.
• PBS stages in any files or directories.
• The prologue is run on the primary execution host, with its current working directory set to PBS_HOME/mom_priv, and with PBS_JOBDIR and TMPDIR set in its environment.
• The job is run as you on the primary execution host.
• The job's associated tasks are run as you on the execution host(s).
• The epilogue is run on the primary execution host, with its current working directory set to the path of the job's staging and execution directory, and with PBS_JOBDIR and TMPDIR set in its environment.
• PBS stages out any files or directories.
• PBS removes any staged files or directories.
• PBS removes any job-specific staging and execution directories and their contents, and all TMPDIRs and their contents.
• PBS writes the final job accountin
41. #PBS -V
    cd $PBS_O_WORKDIR
    echo conf | pvm $PBS_NODEFILE
    echo quit | pvm
    my_pvm_program
    echo halt | pvm

Example 5-42: Sample PBS script for a PVM job:
    #PBS -N pvmjob
    pvmexec a.out -inputfile data_in

5.4 Using OpenMP with PBS
PBS Professional supports OpenMP applications by setting the OMP_NUM_THREADS variable in the job's environment, based on the resource request of the job. The OpenMP runtime picks up the value of OMP_NUM_THREADS and creates threads appropriately.

MoM sets the value of OMP_NUM_THREADS based on the first chunk of the select statement:
• If you request ompthreads in the first chunk, MoM sets the environment variable to the value of ompthreads.
• If you do not request ompthreads in the first chunk, then OMP_NUM_THREADS is set to the value of the ncpus resource of that chunk.
• If you do not request either ncpus or ompthreads for the first chunk of the select statement, then OMP_NUM_THREADS is set to 1.

You cannot directly set the value of the OMP_NUM_THREADS environment variable; MoM will override any setting you attempt.

See "Built-in Resources" on page 307 of the PBS Professional Reference Guide for a definition of the ompthreads resource.

Example 5-43: Submit an OpenMP job as a single chunk, for a two-CPU, two-thread job requiring 10gb of memory:
    qsub -l select=1:ncpus=2:mem=10gb

Example 5-44: Run an MPI application with 64 MPI pro
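The precedence MoM uses for OMP_NUM_THREADS (ompthreads in the first chunk, then ncpus in the first chunk, then a fallback) can be mimicked in a few lines of shell when you want to predict the value before submitting. This is a sketch only: the function name omp_threads_for_select is hypothetical, the select value is assumed to use "+" between chunks and ":" between resources, and the fallback of 1 is the value this sketch assumes.

```shell
# Hypothetical helper (not a PBS command): predict OMP_NUM_THREADS from
# the value of a select statement, e.g. "2:ncpus=4:ompthreads=8+1:ncpus=16".
omp_threads_for_select() {
    # Only the first chunk matters: drop everything from the first "+".
    first_chunk=${1%%+*}
    ompthreads=""
    ncpus=""
    old_ifs=$IFS
    IFS=:
    for part in $first_chunk; do
        case $part in
            ompthreads=*) ompthreads=${part#ompthreads=} ;;
            ncpus=*)      ncpus=${part#ncpus=} ;;
        esac
    done
    IFS=$old_ifs
    if [ -n "$ompthreads" ]; then
        echo "$ompthreads"      # explicit request wins
    elif [ -n "$ncpus" ]; then
        echo "$ncpus"           # otherwise follow ncpus of the first chunk
    else
        echo 1                  # assumed fallback when neither is requested
    fi
}

omp_threads_for_select "1:ncpus=2:mem=10gb"
omp_threads_for_select "2:ncpus=4:ompthreads=8+1:ncpus=16"
```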
42. Table 8-5: Using PBS Commands with Job Arrays
Command | Argument: Array[] | Argument: Array[Range] | Argument: Array[Index]
qalter | Array object | erroneous | erroneous
qdel | Array object & running subjobs | Running subjobs in specified range | Specified subjob
qhold | Array object & queued subjobs | erroneous | erroneous
qmove | Array object & queued subjobs | erroneous | erroneous
qmsg | erroneous | erroneous | erroneous
qorder | Array object | erroneous | erroneous
qrerun | Running and finished subjobs | Running subjobs in specified range | Specified subjob
qrls | Array object & queued subjobs | erroneous | erroneous
qsig | Running subjobs | Running subjobs in specified range | Specified subjob
qstat | Array object | Specified range of subjobs | Specified subjob
tracejob | erroneous | erroneous | Specified subjob

8.6.1 Deleting a Job Array
The qdel command will take a job array identifier, subjob identifier, or job array range. The indicated object(s) are deleted, including any currently running subjobs. Running subjobs are treated like running jobs. Subjobs not running are deleted and never run.

By default, one email is sent per deleted subjob, so deleting a job array of 5000 subjobs results in 5000 emails being sent, unless you are suppressing the number of emails sent. See
43. 2.2.3.4.i Changing the Directive Prefix
By default, the text string "#PBS" is used by PBS to determine which lines in the job file are PBS directives. The leading "#" symbol was chosen because it is a comment delimiter to all shell scripting languages in common use on UNIX systems. Because directives look like comments, the scripting language ignores them.

Under Windows, however, the command interpreter does not recognize the "#" symbol as a comment, and will generate a benign, non-fatal warning when it encounters each "#PBS" string. While it does not cause a problem for the batch job, it can be annoying or disconcerting to you. If you use Windows, you may wish to specify a different PBS directive, via either the PBS_DPREFIX environment variable or the -C option to qsub. The qsub option overrides the environment variable. For example, we can direct PBS to use the string "REM PBS" instead of "#PBS" and use this directive string in our job script:
    REM PBS -l walltime=1:00:00
    REM PBS -l select=mem=400mb
    REM PBS -j oe
    date /t
    my_application
    date /t

Given the above job script, we can submit it to PBS in one of two ways:
    set PBS_DPREFIX=REM PBS
    qsub my_job_script
or
    qsub -C "REM PBS" my_job_script

2.2.3.4.ii Caveats and Restrictions for PBS Directives
• You cannot use PBS_DPREFIX as the directive prefix.
• The limit on the length of a directive stri
44. Substates on page 422 of the PBS Professional Reference Guide for a list of job substates.

The following table shows how provisioning events affect job states and substates:

Table 12-1: Provisioning Events and Job States/Substates
Event | Initial Job State, Substate | Resulting Job State, Substate
Job submitted | | Queued and ready for selection
Provisioning starts | Queued, Queued | Running, Provisioning
Provisioning fails to start | Queued, Queued | Held, Held
Provisioning fails | Running, Provisioning | Queued, Queued
Provisioning succeeds and job runs | Running, Provisioning | Running, Running
Internal error occurs | Running, Provisioning | Held, Held

12.3 Requirements and Restrictions

12.3.1 Host Restrictions

12.3.1.1 Single-vnode Hosts Only
PBS will provision only single-vnode hosts. Do not attempt to use provisioning on hosts that have more than one vnode.

12.3.1.2 Server Host Cannot Be Provisioned
The server host cannot be provisioned: a MoM can run on the server host, but that MoM's vnode cannot be provisioned. The provision_enable vnode attribute, resources_available.aoe, and current_aoe cannot be set on the server host.

12.3.2 AOE Restrictions
Only one AOE can be instantiated at a time on a vnode.
45. This attribute has value 57 when files are staging out.

6.9 Deferring Execution
Normally, PBS runs your job as soon as an appropriate slot opens up. Instead, you can specify a time after which the job is eligible to run. The job is in the wait (W) state from the time it is submitted until the time it is eligible for execution.

6.9.1 Syntax for Deferring Execution
Use the -a date_time option to qsub to specify the time after which the job is eligible for execution. The date_time argument is in the form:
    [[[[CC]YY]MM]DD]hhmm[.SS]
where
    CC is the first two digits of the year (the century), optional
    YY is the second two digits of the year, optional
    MM is the two digits for the month, optional
    DD is the day of the month, optional
    hh is the hour
    mm is the minute
    SS is the seconds, optional

If the day DD is in the future, and the month MM is not specified, the month defaults to the current month. If the day DD is in the past, and the month MM is not specified, the month is set to next month. For example, if today is the 10th, and you specify the 12th but no month, your job is eligible to run two days from today, on the 12th.

Similarly, if the time hhmm is in the future, and the day DD is not specified, the day defaults to the current day. If the time hhmm is in the past, and the day DD is not specified, the day is set to tomorrow. For example i
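The defaulting rules for an omitted day or month can be written out as two small comparisons. This sketch only restates the rules above and does not call qsub. The function names resolve_day and resolve_month are hypothetical, and treating a value equal to the current one as "in the past" is an assumption, since the text does not cover that case.

```shell
# Hypothetical helpers (not PBS commands): decide which day or month a
# deferred start falls in when that field is omitted from date_time.

# $1 = current time as hhmm, $2 = requested time as hhmm
resolve_day() {
    if [ "$2" -gt "$1" ]; then
        echo "today"        # requested time is still in the future
    else
        echo "tomorrow"     # requested time has passed (or is now: assumed)
    fi
}

# $1 = current day of month, $2 = requested day of month
resolve_month() {
    if [ "$2" -gt "$1" ]; then
        echo "current month"
    else
        echo "next month"
    fi
}

resolve_month 10 12     # today is the 10th, job requested for the 12th
resolve_day 1430 0900   # it is 14:30, job requested for 09:00
```

To try it against the clock, pass the output of date +%H%M as the first argument to resolve_day.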
46. allocated to any running job at the optionally specified PBS server in addition to the other information presented in the alternative display The host information is printed immediately below the job see job 51 in the example below and includes the host name and number of virtual processors assigned to the job i e south 0 where south is the host name followed by the virtual processor s assigned A text string of is printed for non running jobs Notice the differences between the queued and running jobs in the example below qstat n Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time S Time 16 south userl workq aimsl4 1 0 01H 18 south userl workq aimsl4 1 0 01W 51 south barry workg airfoil 930 1 0 13 R 0 01 south 0 52 south userl workq my job 1 0 10Q 10 1 9 Displaying Job Comments The s option to qstat displays the job comments in addition to the other information presented in the alternative display The job comment is printed immediately below the job By default the job comment is updated by the Scheduler with the reason why a given job is PBS Professional 13 0 Beta User s Guide UG 233 Chapter 10 Checking Job amp System Status not running or when the job began executing A text string of is printed for jobs whose comment has not yet been set The example below illustrates the different type of messages that may be displayed q
47. an entry in user's HOMEDIR/.rhosts file.

Note that pbs_password encrypts the password obtained from you before sending it to the PBS server. The pbs_password command does not change your password on the current host, only the password that is cached in PBS.

The pbs_password command is supported only on Windows and all supported Linux platforms on x86 and x86_64.

The pbs_password command has no effect on running jobs. Queued jobs use the new password.

2.3.6.1.ii Per-job Password Method
If you are running in a password-protected Windows environment, but the single-signon method has not been configured at your site, then you will need to supply a password with the submission of each job. You can do this via the qsub command, with the -Wpwd option, and supply the password when prompted:
    qsub -Wpwd <job script>
You will be prompted for the password, which is passed on to the program, then encrypted and saved securely for use by the job. The password should be enclosed in double quotes.

Keep in mind that in a multi-host job, the password supplied will be propagated to all the sister hosts. This requires that the password be the same on your accounts on all the hosts. The use of domain accounts for a multi-host job will be ideal in this case.

Accessing network share drives and resources within a job session also requires that you submit the job with a password via qsub -W pwd.
48. and run ning jobs as long as the job history is still being stored by PBS The x option to the qstat command allows you to see information for all jobs whether they are running queued finished or moved This information is presented in standard for mat You can view the history for selected sets of jobs UNIX Linux qstat fx qselect x s MF Windows for F usebackq j in Program Files PBSPro exec bin qselect x s MF do Program Files PBS Pro exec bin qstat fx j 10 1 15 1 Getting Information on Jobs Moved to Another Server If your job is running at another server you can use the qgstat command to see its status If your site is using peer scheduling your job may be moved to a server that is not your default server In this case to see information on your job you can use any of the following methods e Use qstat x to see information about all jobs whether running queued finished or moved you can specify job IDs e Give the job ID as an argument to qstat If you use only qstat your job will not appear to exist For example you submit a job to ServerA and it returns the job ID as 123 ServerA Then 123 ServerA is moved to ServerB In this case use qstat 123 or qstat 123 ServerA to get information about your job ServerA will query ServerB for the information To list all jobs at ServerB you can use qstat ServerB If you use qstat without the job ID the job will
49. behavior is controlled by the job's Keep_Files attribute. You can set this attribute to one of the following values:
    e        PBS keeps stderr in the job's staging and execution directory on the primary execution host.
    o        PBS keeps stdout in the job's staging and execution directory on the primary execution host.
    eo, oe   PBS keeps both standard output and standard error on the primary execution host, in the job's staging and execution directory.
    n        PBS does not keep either file on the execution host.
The default value for Keep_Files is n.

You can set the value of the Keep_Files job attribute using the following methods:
• Use qsub -k <keep option>
• Use #PBS Keep_Files=<keep option>

For example, you can use either of the following to keep both standard output and standard error on the execution host:
    qsub -k oe my_job
    #PBS -k oe

3.3.5.1 Caveats for Keeping Files on Execution Host
• When a job finishes, its job-specific execution directory and all files in that directory are deleted. If you specified that stdout and/or stderr should be kept on the execution host, any files you specified are deleted as well.
• The qsub -k option overrides the -o and -e options. For example, if you specify qsub -k o -o <path>, stdout is kept on the execution host, and is not copied to the path you specified.

3.3.6 Changing UNIX L
50. but when an occurrence ends, only its running jobs are deleted. Each occurrence of a standing reservation has reserved resources which satisfy the resource request, but each occurrence may have its resources drawn from a different source. A query for the resources assigned to a standing reservation will return the resources assigned to the soonest occurrence, shown in the resv_nodes attribute reported by pbs_rstat.
Soonest occurrence of a standing reservation: The occurrence which is currently active, or if none is active, the next occurrence.
Degraded reservation: An advance reservation for which one or more associated vnodes are unavailable. A standing reservation for which one or more vnodes associated with any occurrence are unavailable.
7.2 Prerequisites for Reserving Resources
The time for which a reservation is requested is in the time zone at the submission host. You must set the submission host's PBS_TZID environment variable. The format for PBS_TZID is a timezone location. Example: America/Los_Angeles, America/Detroit, Europe/Berlin, Asia/Calcutta. See section 2.4.5, "Setting the Submission Host's Time Zone", on page 18.
7.3 Creating and Using Reservations
7.3.1 Introduction to Creating and Using Reservations
You can create both advance and standing reservations using the pbs_rsub command. PBS either confir
51. cannot have a shrink-to-fit reservation. It is erroneous to set min_walltime or max_walltime for a reservation. If attempted via pbs_rsub, the following error is printed: "min_walltime and max_walltime are not valid resources for reservation." It is erroneous to set resources_max or resources_min for min_walltime and max_walltime. If attempted, the following error message is displayed, whichever is appropriate: "Resource limits can not be set for min_walltime" or "Resource limits can not be set for max_walltime".
6.4 Using Checkpointing
6.4.1 Prerequisites for Checkpointing
A job is checkpointable if it has not been marked as non-checkpointable and any of the following is true:
• Its application supports checkpointing and your administrator has set up checkpoint scripts
• There is a third-party checkpointing application available
• The OS supports checkpointing
6.4.2 Minimum Checkpoint Interval
The execution queue in which the job resides controls the minimum interval at which a job can be checkpointed. The interval is specified in CPU minutes or walltime minutes. The same value is used for both, so for example if the minimum interval is specified as 12, then a job using the queue's interval for CPU time is checkpointed every 12 minutes of CPU time, and a job using the queue's interval for walltime is checkpoi
52. commands listed above cannot be used with finished jobs, whether they finished at the local server or a remote server. These jobs are no longer running; PBS is storing their information, and this information cannot be altered. Trying to use one of the above commands with a finished job results in the following error message: <command name>: Job <job ID> has finished
9.2 Modifying Job Attributes
Most attributes can be changed by the owner of the job (or a manager or operator) while the job is still queued. However, once a job begins execution, the only values that can be modified are cputime, walltime, and run_count. These can only be reduced.
When the qalter -l option is used to alter the resource list of a queued job, it is important to understand the interactions between altering the select directive and job limits. If the job was submitted with an explicit -l select=, then vnode-level resources must be qaltered using the -l select= form. In this case a vnode-level resource RES cannot be qaltered with the -l <resource> form. For example:
Submit the job:
qsub -l select=1:ncpus=2:mem=512mb jobscript
Job's ID is 230
qalter the job using the -l RES form:
qalter -l ncpus=4 230
Error reported by qalter:
qalter: Resource must only appear in "select" specification when "select" is used: ncpus 230
qalter the job using the
53. composed of a header box, the text box, FILE entry box, and two buttons labeled load and save. If you have a script file containing PBS options and executable lines, then type the name of the file on the FILE entry box, and then click on the load button. Alternatively, you may click on the FILE button, which will display a File Selection browse window, from which you may point and click to select the file you wish to open. The File Selection Dialog window is shown below. Clicking on the Select File button will load the file into xpbs, just as does the load button described above.
The various fields in the Submit window will get loaded with values found in the script file. The script file text box will only be loaded with executable lines (non-PBS) found in the script. The job script header box has a Prefix entry box that can be modified to specify the PBS directive to look for when parsing a script file for PBS options. If you don't have an existing script file to load into xpbs, you can start typing the executable lines of the job in the file text box.
Next, review the Destination listbox. This box shows the queues found in the host that you selected. A special entry called host refers to the default queue at the indicated host. Select appropriately the destination queue for the job. Next, define any required resources in the Resource List subwind
54. ... 76
4.7 Specifying Job Placement 77
4.8 Backward Compatibility 86
5 Multiprocessor Jobs 93
5.1 Submitting Multiprocessor Jobs 93
5.2 Using MPI With PBS 100
5.3 Using PVM With PBS 138
5.4 Using OpenMP With PBS 140
5.5 Hybrid MPI-OpenMP Jobs 141
6 Controlling How Your Job Runs 145
6.1 Using Job Exit Status 145
6.2 Using Job Dependencies 146
6.3 Adjusting Job Running Time 149
6.4 Using Checkpointing 154
6.5 Holding and Releasing Jobs 156
6.6 Allowing Your Job to be Rerun 161
6.7 Controlling Number of Times Job is Re-run 162
6.8 Making qsub Wait Until Job Ends 163
6.9 Deferring Execution 164
6.10 Setting Your Job's Priority 165
6.11 Running Your Job Interactively 165
6.12 Specifying Which Jobs to Preempt
55. escaped or enclosed in another set of quotes. This second set of quotes must be different from the first set, meaning that double quotes must be enclosed in single quotes, and vice versa.
• If a string resource value contains spaces or shell metacharacters, enclose the string in quotes, or otherwise escape the space and metacharacters. Be sure to use the correct quotes for your shell and the behavior you want.
4.3.8.2 Warning About NOT Requesting walltime
If your job does not request a walltime, and there is no default for walltime, your job is treated as if it had requested a very, very long walltime. Translation: the scheduler will have a hard time finding a time slot for your job. Remember, the administrator may schedule dedicated time for the entire PBS complex once a year, for upgrading, etc. In this case, your job will never run. We recommend requesting a reasonable walltime for your job.
4.3.8.3 Caveats for Jobs Requesting Undefined Resources
If you submit a job that requests a job-wide or host-level resource that is undefined, the job is not rejected at submission; instead, it is aborted upon being enqueued in an execution queue, if the resources are still undefined. This preserves backward compatibility.
4.3.8.4 Matching Resource Requests with Unset Resources
When job resource requests are being matched with available resources, a n
56. example, to specify that there will be recurrences on Tuesdays and Wednesdays, at 9 a.m. and 11 a.m., use BYDAY=TU,WE;BYHOUR=9,11. BYDAY should be used with FREQ=WEEKLY. BYHOUR should be used with FREQ=DAILY or FREQ=WEEKLY.
until_spec: Occurrences will start up to, but not after, this date and time. This means that if occurrences last for an hour, and normally start at 9 a.m., then a time of 9:05 a.m. on the day specified in the until_spec means that an occurrence will start on that day. Format: YYYYMMDD[THHMMSS]. Note that the year-month-day section is separated from the hour-minute-second section by a capital T. Default: 3 years from time of reservation creation.
7.3.3.1 Setting Reservation Start Time and Duration
In a standing reservation, the arguments to the -R and -E options to pbs_rsub can provide more information than they do in an advance reservation. In an advance reservation, they provide the start and end time of the reservation. In a standing reservation, they can provide the start and end time, but they can also be used to compute the duration and the offset from the interval start.
The difference between the values of the arguments for -R and -E is the duration of the reservation. For example, if you specify -R 0930 -E 1145, the duration of your reservation will be two hours and fifteen minutes. If you specify -R 1
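The duration arithmetic described above (end time minus start time) can be sketched in a few lines of shell. This is only an illustration: the function names hhmm_to_minutes and reservation_duration are hypothetical and not part of PBS, and the sketch assumes both arguments are same-day HHMM values with the end later than the start.

```shell
#!/bin/sh
# Hypothetical helpers, not PBS commands: compute the duration implied
# by a pair of HHMM values such as those given to -R and -E.

# Convert HHMM to minutes since midnight.
hhmm_to_minutes() {
    hh=${1%??}          # everything except the last two digits
    mm=${1#??}          # everything after the first two digits
    hh=${hh#0}          # strip one leading zero so 09 is not read as octal
    mm=${mm#0}
    echo $(( ${hh:-0} * 60 + ${mm:-0} ))
}

# Print the duration between start ($1) and end ($2) as NhMMm.
# Assumes end is later than start on the same day.
reservation_duration() {
    start=$(hhmm_to_minutes "$1")
    end=$(hhmm_to_minutes "$2")
    diff=$(( end - start ))
    printf '%dh%02dm\n' $(( diff / 60 )) $(( diff % 60 ))
}
```

For the values used in the text, reservation_duration 0930 1145 prints 2h15m, matching the two hours and fifteen minutes stated above.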
57. for the first time under UNIX, you may need to configure your workstation for it. Depending on how PBS is installed at your site, you may need to allow xpbs to be displayed on your workstation. However, if the PBS client commands are installed locally on your workstation, you can skip this step. Ask your PBS administrator if you are unsure.
The most secure method of running xpbs remotely and displaying it on your local X Windows session is to redirect the X Windows traffic through ssh (secure shell), via setting the "X11Forwarding yes" parameter in the sshd_config file. Your local system administrator can provide details on this process if needed.
An alternative, but less secure, method is to direct your X Windows session to permit the xpbs client to connect to your local X server. Do this by running the xhost command with the name of the host from which you will be running xpbs, as shown in the example below:
xhost + server.mydomain.com
Next, on the system from which you will be running xpbs, set your X Windows DISPLAY variable to your local workstation. For example, if using the C shell:
setenv DISPLAY myWorkstation:0.0
However, if you are using the Bourne or Korn shell, type the following:
export DISPLAY=myWorkstation:0.0
14.2 Using xpbs: Definitions of Terms
The various panels, boxes, and regions (collectively called "widgets") of xpbs and h
58. ... 172
7 Reserving Resources Ahead of Time 173
7.1 Glossary 173
7.2 Prerequisites for Reserving Resources 174
7.3 Creating and Using Reservations 174
7.4 Viewing the Status of a Reservation 180
7.5 Using Your Reservation 184
7.6 Reservation Caveats and Errors 187
8 Job Arrays 191
8.1 Advantages of Job Arrays 191
8.2 Glossary 191
8.3 Description of Job Arrays 192
8.4 Submitting a Job Array 196
8.5 Viewing Status of a Job Array 203
8.6 Using PBS Commands with Job Arrays 207
8.7 Job Array Caveats 211
9 Working with PBS Jobs
9.1 Current vs. Historical Jobs
9.2 Modifying Job Attributes
9.3 Deleting Jobs
9.4 Sending Messages to Jobs
9.5 Sending Signals to Jobs
9.6 Changing Order of Jobs
9.7 Moving Jobs Between Queues
59. it is returned to that state unless its waiting time was reached, in which case it goes to the Q state.
8.6.6 Selecting Job Arrays
The default behavior of qselect is to return the job array identifier, without returning subjob identifiers. The qselect command does not return any job arrays when the state selection (-s) option restricts the set to R, S, T, or U, because a job array will never be in any of these states. However, you can use qselect to return a list of subjobs by using the -T option.
You can combine options to qselect. For example, to restrict the selection to subjobs, use both the -J and the -T options. To select only running subjobs, use -J -T -sR.
Table 8-6: Options to qselect for Job Arrays
Option   Selects            Result
(none)   jobs, job arrays   Shows job and job array identifiers
-J       job arrays         Shows only job array identifiers
-T       jobs, subjobs      Shows job and subjob identifiers
8.6.7 Ordering Job Arrays in the Queue
The qorder command can only be used with job array objects, not on subjobs or ranges. This changes the queue order of the job array in association with other jobs or job arrays in the queue.
8.6.8 Requeueing a Job Array
The qrerun command will take a job array identifier, subjob identifier, or job array range. If a job array identifier is given as an argument, it is returned to its init
60. jobs.
Auto Update: sets an automatic update of information every user-specified number of minutes.
Track Job: for periodically checking for returned output files of jobs.
Preferences: for setting parameters such as the list of server host(s) to query.
Help: contains some help information.
About: gives general information about the xpbs GUI.
Close: for exiting xpbs, plus saving the current setup information.
14.3.2 xpbs Hosts Panel
The Hosts panel is composed of a leading horizontal HOSTS bar, a listbox, and a set of command buttons. The HOSTS bar contains a minimize/maximize button, identified by a dot or a rectangular image, for displaying or iconizing the Hosts region. The listbox displays information about favorite server host(s), and each entry is meant to be selected via a single left-click, shift-left-click for contiguous selection, or control-left-click for non-contiguous selection.
To the right of the Hosts Panel are buttons that represent actions that can be performed on selected host(s). Use of these buttons will be explained in detail below.
detail: Provides information about selected server host(s). This functionality can also be achieve
61. jobs only. However, you can tell qstat to display information for all jobs, whether they are running, queued, finished, or moved; we cover this in this chapter. Job history is kept for a period defined by your administrator.
Summary of usage:
qstat [-J] [-p] [-t] [-x] [[job_identifier | destination] ...]
qstat -f [-J] [-p] [-t] [-x] [[job_identifier | destination] ...]
qstat [-a [-w] | -H | -i | -r] [-G | -M] [-J] [-n [-1] [-w]] [-s [-1] [-w]] [-t] [-T [-w]] [-u user] [[job_id | destination] ...]
qstat -Q [-f] [destination ...]
qstat -q [-G | -M] [destination ...]
qstat -B [-f] [server_name ...]
qstat --version
10.1.1 Specifying Jobs to View
You can specify that you want information for a job identifier, a list of job identifiers, or all of the jobs at a destination. To specify a job identifier, it must be in the following form:
sequence_number[.server_name][@server]
where sequence_number.server_name is the job identifier assigned at submission. If you do not specify .server_name, the default server is used. If @server is supplied, the request will be for the job identifier currently at that server.
If you specify a destination identifier, it takes one of the following three forms:
queue
@server
queue@server
If you specify queue, the request is for status of all jobs in that queue at the default server. If you use the @server f
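To make the job identifier form concrete (a sequence number, optionally followed by .server_name and @server), here is a minimal shell sketch that splits an identifier into its parts. The helper names job_sequence and job_server are hypothetical, and qstat does its own parsing; the sketch shows only how the pieces relate.

```shell
#!/bin/sh
# Hypothetical helpers, not PBS commands: split a job identifier of the
# form sequence_number[.server_name][@server] into its parts.

# Print the sequence number (the part before the first dot).
job_sequence() {
    id=${1%%@*}         # drop any trailing @server
    echo "${id%%.*}"    # keep the text before the first dot
}

# Print the server_name part, or an empty line if none was given.
job_server() {
    id=${1%%@*}
    case $id in
        *.*) echo "${id#*.}" ;;   # text after the first dot
        *)   echo "" ;;
    esac
}
```

With the identifier 123.ServerA used elsewhere in this guide, job_sequence prints 123 and job_server prints ServerA. A bare 123 yields an empty server name, which corresponds to using the default server.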
62. level, but not both. The description of each resource tells you which way to use the resource; see "Resources" on page 305 of the PBS Professional Reference Guide. We will cover the details of requesting resources in section 4.3.2, "Requesting Job-wide Resources", on page 61 and section 4.3.3, "Requesting Resources in Chunks", on page 61.
4.3.1 Quick Summary of Requesting Resources
Job-wide resources are requested in <resource name>=<value> pairs. You can request job-wide resources using any of the following:
• The qsub -l <resource name>=<value> option. You can request multiple resources using either format:
-l <resource>=<value>,<resource>=<value>
-l <resource>=<value> -l <resource>=<value>
• One or more #PBS -l <resource name>=<value> directives
Chunk resources are requested in chunk specifications in a select statement. You can request chunk resources using any of the following:
• The qsub -l select=[N:][chunk specification][+[N:]chunk specification] option
• A #PBS -l select=[N:][chunk specification][+[N:]chunk specification] directive
Format for requesting both job-wide and chunk resources:
qsub ... (non-resource portion of job) -l <resource>=<value> (this is the job-wide request) -l select=<chunk>[+<chunk>] (this is the selection statement)
PBS supplies several commands that you can use to request res
63. machine with 2 processors.
demoscript:
#!/bin/sh
#PBS -N JobExample
sleep 60
arrayscript:
#!/bin/sh
#PBS -N ArrayExample
#PBS -J 1-5
sleep 60
We run these scripts using qsub:
qsub arrayscript
1235[].host
qsub demoscript
1236.host
We query using various options to qstat:
qstat
Job id        Name          User   Time Use  S  Queue
1235[].host   ArrayExample  user1  0         B  workq
1236.host     JobExample    user1  0         Q  workq
qstat -J
Job id        Name          User   Time Use  S  Queue
1235[].host   ArrayExample  user1  0         B  workq
qstat -p
Job id        Name          User   % done    S  Queue
1235[].host   ArrayExample  user1  0         B  workq
1236.host     JobExample    user1            Q  workq
qstat -t
Job id        Name          User   Time Use  S  Queue
1235[].host   ArrayExample  user1  0         B  workq
1235[1].host  ArrayExample  user1  00:00:00  R  workq
1235[2].host  ArrayExample  user1  00:00:00  R  workq
1235[3].host  ArrayExample  user1  0         Q  workq
1235[4].host  ArrayExample  user1  0         Q  workq
1235[5].host  ArrayExample  user1  0         Q  workq
1236.host     JobExample    user1  0         Q  workq
qstat -Jt
Job id        Name          User   Time Use  S  Queue
1235[1].host  ArrayExample  user1  00:00:00  R  workq
1235[2].host  ArrayExample  user1  00:00:00  R  workq
1235[3].host  ArrayExample  user1  0         Q  workq
1235[4].host  ArrayExample  user1  0         Q  workq
1235[5].host  ArrayExample  user1  0         Q  workq
64. more particular GPUs. This allows you to run applications on the GPUs for which the applications are written. Your administrator can set up a resource to allow jobs to request specific GPUs. We recommend that the GPU resource is called gpu_id. When you request specific GPUs, specify the GPU that you want for each chunk:
qsub -l select=gpu_id=<GPU ID>:<rest of chunk specification>
Example 4-9: To request 4 vnodes, each with GPU with ID 0:
qsub -lselect=4:ncpus=1:gpu_id=gpu0 my_gpu_job
When a job is submitted requesting specific GPUs, the PBS scheduler assigns the vnode with the resource containing that gpu_id to the job. The application can use the appropriate CUDA call to bind the process to the allocated GPU.
4.3.7.5 Viewing GPU Information for Nodes
You can find the number of GPUs available and assigned on execution hosts via the pbsnodes command. See section 4.6, "Viewing Resources", on page 76.
4.3.8 Caveats and Restrictions on Requesting Resources
4.3.8.1 Caveats and Restrictions for Specifying Resource Values
• Resource values which contain commas, quotes, plus signs, equal signs, colons, or parentheses must be quoted to PBS. The string must be enclosed in quotes so that the command (e.g. qsub, qalter) will parse it correctly.
• When specifying resources via the command line, any quoted strings must be
65. -n [-1] [-w]] [-s [-1] [-w]] [-t] [-T [-w]] [-u user_list] [[job_identifier | destination] ...]
The alternate display shows the following fields:
• Job ID
• Job owner
• Queue in which job resides
• Job name
• Session ID (only appears when job is running)
• Number of chunks or vnodes requested
• Number of CPUs requested
• Amount of memory requested
• Amount of CPU time requested, if CPU time requested; if not, amount of wall clock time requested
• State of job
• Amount of CPU time elapsed, if CPU time requested; if not, amount of wall clock time elapsed
qstat -a
                                                   Req'd     Elap
Job ID    User    Queue  Jobname  Ses  NDS TSK Mem Time   S  Time
16.south  user1   workq  aims14   --   --  1   --  0:01   H  --
18.south  user1   workq  aims14   --   --  1   --  0:01   W  --
51.south  barry   workq  airfoil  930  --  1   --  0:13   R  0:01
52.south  user1   workq  myjob    --   --  1   --  0:10   Q  --
53.south  susan   workq  tns3d    --   --  1   --  0:20   Q  --
54.south  barry   workq  airfoil  --   --  1   --  0:13   Q  --
55.south  donald  workq  seq_35_  --   --  1   --  2:00   Q  --
10.1.3.1 Display Size in Gigabytes
The -G option to qstat displays all jobs at the requested or default server using the alternative display, showing all size information in gigabytes (GB) rather than the default of smallest displayable units. Note that if the size specified is l
66. not authorized to signal the job
• The job is not in the running state
• The requested signal is not supported by the execution host
• The job is exiting
• The job is provisioning
Two special signal names, "suspend" and "resume" (note, all lower case), are used to suspend and resume jobs. When suspended, a job continues to occupy system resources but is not executing and is not charged for walltime. Manager or operator privilege is required to suspend or resume a job.
The signal TERM is useful, because it is ignored by shells, but you can trap it and do useful things such as write out status.
The three examples below all send a signal 9 (SIGKILL) to job 34:
qsig -s SIGKILL 34
qsig -s KILL 34
qsig -s 9 34
If you want to trap the signal in your job script, the signal must be trapped by all of the job's shells. On most UNIX systems the command "kill -l" (that's "minus ell") will list all the available signals.
9.5.1 Using xpbs to Signal a Job
To send a signal to a job using xpbs, first select the job(s) of interest, then click the signal button. Doing so will launch the Signal Running Job dialog box. From this window, you may click on any of the common signals, or you may enter the signal number or signal name you wish to send to the job. Click the Signal button to complete the process.
9.6 Changing Order of Jobs
PBS provides the qo
67. on an AIX machine. You can run a job that requests large page memory in "mandatory mode":
qsub
export LDR_CNTRL="LARGE_PAGE_DATA=M"
/path/to/exe/bigprog
^D
You can run a job that requests large page memory in "advisory mode":
qsub
export LDR_CNTRL="LARGE_PAGE_DATA=Y"
/path/to/exe/bigprog
^D
13.2 Using Comprehensive System Accounting
PBS support for CSA on SGI systems is no longer available. The CSA functionality for SGI systems has been removed from PBS. You can use CSA on Cray systems.
CSA provides accounting information about user jobs, called user job accounting. CSA works the same with and without PBS. To run user job accounting, either you must specify the file to which raw accounting information will be written, or an environment variable must be set. The environment variable is ACCT_TMPDIR. This is the directory where a temporary file of raw accounting data is written.
To run user job accounting, you issue the CSA command "ja <filename>" or, if the environment variable ACCT_TMPDIR is set, "ja". In order to have an accounting report produced, you issue the command "ja -<options>", where the options specify that a report will be written and what kind. To end user job accounting, you issue the command "ja -t"; the -t option can be included in the previous set of options. See the m
68. on the login node. If you request more than one login node, the job script runs on the first login node requested.
11.4.4 Login Nodes in PBS Reservations
If the jobs that are to run in a PBS reservation require a particular login node, you must do the following:
• The reservation must request the specific login node
• Each job that will run in the reservation must request the same login node that the reservation requested
11.4.5 Specifying Number of Chunks
You specify the number of chunks by prefixing each chunk request with an integer. If not specified, this integer defaults to 1. For example, to specify 4 chunks with 2 CPUs each, and 8 chunks with 1 CPU each:
qsub -lselect=4:ncpus=2+8:ncpus=1
You cannot request the nchunk resource. If you request fewer chunks, the scheduling cycle is faster. See section 11.7.9, "Request Fewer Chunks", on page 275.
11.4.6 Requesting mppnppn Equivalent
If your job requires the equivalent of mppnppn, you can do either of the following:
• When using select and place statements, use the translation information provided in Table 11-1, "Mapping mpp* Resources to select and place", on page 257, and include -lplace=scatter in the job request.
• Include mppnppn in the qsub line (mppnppn is deprecated).
11.4.7 Do Not Mix mpp* and select/place
Jobs cannot use both -lmpp* syntax and -lselect/-lplace synta
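As a rough illustration of how the integer prefixes add up, the following shell sketch counts the chunks requested by a select specification, treating a missing prefix as 1. The function name total_chunks is hypothetical and this is not how PBS itself parses requests. The sketch assumes chunks are joined with + and that any count comes first, followed by a colon.

```shell
#!/bin/sh
# Hypothetical helper, not a PBS command: count the chunks requested by
# a select specification such as 4:ncpus=2+8:ncpus=1.
total_chunks() {
    spec=$1
    total=0
    old_ifs=$IFS
    IFS='+'                      # chunks are separated by plus signs
    for chunk in $spec; do
        n=${chunk%%:*}           # text before the first colon
        case $n in
            ''|*[!0-9]*) n=1 ;;  # no integer prefix: count defaults to 1
        esac
        total=$(( total + n ))
    done
    IFS=$old_ifs
    echo "$total"
}
```

For the request in the text, total_chunks '4:ncpus=2+8:ncpus=1' prints 12. A specification without prefixes, such as 'ncpus=2+ncpus=2', counts as 2 chunks.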
69. option or environment variable should be set to the total number of mpiprocs requested by the job when using US mode. If neither this option nor the MP_PROCS environment variable is set, PBS uses the number of entries in PBS_NODEFILE. If this option is set to N, and the job is submitted with a total of M mpiprocs:
If N >= M: The value N is passed to IBM poe.
If N < M and US mode is not being used: The value N is passed to poe.
If N < M and US mode is being used: US mode is turned off and a warning is printed: "pbsrun.poe: Warning, user mode disabled due to MP_PROCS setting"
5.2.5.6 Caveats for POE
5.2.5.6.i Multi-host Jobs on POE
If you wish to run a multi-host job, it must not run on a mix of InfiniBand and non-InfiniBand hosts. It can run entirely on hosts that are non-InfiniBand, or on hosts that are all using InfiniBand, but not both.
5.2.5.6.ii Maximum Number of Ranks on POE
The maximum number of ranks that can be launched under integrated POE is the number of entries in PBS_NODEFILE.
5.2.5.6.iii Run Jobs in Foreground on POE
Since PBS is tracking tasks started by poe, these tasks are counted towards your run limits. Running multiple poe jobs in the background will not work. Instead, run poe jobs one after the other or submit separate jobs. Otherwise switch windows will be used by more than one task. The tracejob command will show any of va
70. prints the default display, with a column for Percentage Completed. For a job array, this is the number of subjobs completed and deleted, divided by the total number of subjobs. For example:
qstat -p
Job ID      Name   User   % done  S  Queue
44[].host1  STDIN  user1  40      B  workq
10.1.12 Viewing Job Start Time
There are two ways you can find the job's start time. If the job is still running, you can do a qstat -f and look for the stime attribute. If the job has finished, you look in the accounting log for the S record for the job. For an array job, only the S record is available; array jobs do not have a value for the stime attribute.
10.1.13 Viewing Estimated Start Times For Jobs
You can view the estimated start times and vnodes of jobs using the qstat command. If you use the -T option to qstat when viewing job information, the Elap Time field is replaced with the Est Start field. Running jobs are shown above queued jobs.
If the estimated start time or vnode information is invisible to unprivileged users, no estimated start time or vnode information is available via qstat.
Example output:
qstat -T
                                                    Req'd   Req'd     Est
Job ID    Username  Queue  Jobname  SessID  NDS TSK Memory  Time   S  Start
5.host1   user1     workq  foojob   12345   1   1   128mb   00:10  R  --
9.host1   user1     workq  foojob   --      1   1   128mb   00:10  Q  11:30
10.host1  user1     workq  foojob   --      1   1   128mb   00:10  Q  Tu 15
7.ho
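The Percentage Completed figure for a job array is a simple ratio, which can be sketched as below. The function name percent_done is hypothetical. Truncating integer division is an assumption made for the sketch; the guide does not say how qstat rounds.

```shell
#!/bin/sh
# Hypothetical helper, not a PBS command: percentage of subjobs that are
# completed or deleted, out of the total number of subjobs.
# Uses truncating integer division (an assumption about rounding).
percent_done() {
    finished=$1
    total=$2
    if [ "$total" -le 0 ]; then
        echo 0                   # avoid dividing by zero
        return
    fi
    echo $(( finished * 100 / total ))
}
```

For instance, an array of 5 subjobs with 2 completed or deleted gives percent_done 2 5, which prints 40, the value shown in the example display above.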
71. qsub -l select=<consumable resource name>=<required amount>:<rest of chunk specification>
Example 4-4: The consumable resource named AppB indicates the number of available per-use application licenses on a host. To request a host with a per-use node-locked license for AppB, where you'll run one instance of AppB on two CPUs in one chunk:
qsub -l select=1:ncpus=2:AppB=1
4.3.5.2.iii Requesting Per-CPU Node-locked Application Licenses
Per-CPU node-locked licenses are typically arranged so that the host has one license for each licensed CPU. The PBS administrator configures a consumable numerical resource indicating the number of available licenses. You must request one license for each CPU. When requesting numerical per-CPU node-locked licenses, request the required number of licenses for each host:
qsub -l select=<per-CPU resource name>=<required amount>:<rest of chunk specification>
Example 4-5: The numerical consumable resource named AppC indicates the number of available per-CPU licenses. To request a host with two per-CPU node-locked licenses for AppC, where you'll run a job using two CPUs in one chunk:
qsub -l select=1:ncpus=2:AppC=2
4.3.6 Requesting Scratch Space
Scratch space on a machine is configured as a host-level dynamic resource. Ask your administrator for the name of the scratc
72. replaced by the contents of PBS_NODEFILE.
-np: If not specified, the number of entries found in the PBS_NODEFILE is used. The maximum number of ranks that can be launched is the number of entries in PBS_NODEFILE.
-pg: The use of the -pg option, for having multiple executables on multiple hosts, is allowed, but it is up to you to make sure only PBS hosts are specified in the process group file. MPI processes spawned on non-PBS hosts are not guaranteed to be under the control of PBS.
5.2.10.2.ii Examples
Example 5-32: Run a single-executable MPICH-MX job with 64 processes spread out across the PBS-allocated hosts listed in PBS_NODEFILE:
PBS_NODEFILE:
pbs-host1
pbs-host2
...
pbs-host64
qsub -l select=64:ncpus=1
mpirun -np 64 /path/myprog.x 1200
^D
<job-id>
Example 5-33: Run an MPICH-MX job with multiple executables on multiple hosts listed in the process group file procgrp:
qsub -l select=2:ncpus=1
echo "pbs-host1 1 username /x/y/a.exe arg1 arg2" > procgrp
echo "pbs-host2 1 username /x/x/b.exe arg1 arg2" >> procgrp
mpirun -pg procgrp /path/myprog.x
rm -f procgrp
^D
<job-id>
mpirun prints the warning message: "warning: -pg is allowed but it is up to user to make sure only PBS hosts are specified; MPI processes spawned are not guaranteed to be under PBS-control". The warning is issued because if any of the host
73. resources Job is requeued Job continues to run
6.6.1 Caveats and Restrictions for Marking Jobs as Rerunnable
• Interactive jobs are not rerunnable.
• Job arrays are required to be rerunnable. PBS will not accept a job array that is marked as not rerunnable. You can submit a job array without specifying whether it is rerunnable, and PBS will automatically mark it as rerunnable.
6.7 Controlling Number of Times Job is Re-run
PBS has a built-in limit of 21 on the number of times it will try to run your job. The number of attempts is tracked in the job's run_count attribute. By default, the value of run_count is zero at job submission. The job is held when the value of run_count goes above 20.
You can reduce the number of times PBS attempts to run your job. You can specify a non-negative value for run_count at job submission, and you can use qalter to raise the value of run_count while the job is running. You cannot give a job more retries than the limit, and you cannot lower the value of run_count while the job is running.
6.7.1 Caveats for Raising Value of run_count Attribute
If your job is checkpointed and requeued enough times, it will be held.
6.8 Making qsub Wait Until Job Ends
Normally, when you submit a job, the qsub command exits after returning the ID of the new job. You can use the "-W block=true" option to qsub to specify that
74. s of interest and then click the move button. Doing so will launch the Move Job dialog box, from which you can select the queue and/or server to which you want the job(s) moved.
The qmove command can only be used with job array objects, not with subjobs or ranges. Job arrays can only be moved from one server to another if they are in the Q, H, or W states, and only if there are no running subjobs. The state of the job array object is preserved in the move. The job array will run to completion on the new server.
As with jobs, a qstat on the server from which the job array was moved will not show the job array. A qstat on the job array object will be redirected to the new server. The subjob accounting records will be split between the two servers.
10 Checking Job & System Status
10.1 Viewing Job Status
You can use the qstat command to view job information in the following formats:
• Basic format: minimal summary of jobs
• Alternate format: intermediate listing of job information
• Long format: shows all information about jobs
You can see only the information for which you have the required privilege. We discuss each format in the following sections. See "qstat" on page 204 of the PBS Professional Reference Guide.
By default, qstat displays information for queued or running
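The conditions for moving a job array between servers amount to a simple predicate. The Python sketch below is illustrative only: can_move_job_array is a hypothetical helper, and it assumes running subjobs are reported with the state letter R.

```python
# Illustrative check of the qmove rule for job arrays (hypothetical helper, not a PBS API).
MOVABLE_ARRAY_STATES = {"Q", "H", "W"}  # states named in the text


def can_move_job_array(array_state: str, subjob_states: list) -> bool:
    """True if the array is in Q, H, or W and no subjob is running.

    Assumes a running subjob is shown with state "R".
    """
    return array_state in MOVABLE_ARRAY_STATES and "R" not in subjob_states
```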
75. If you don't explicitly request a value for the mpiprocs resource, it defaults to 1 for each chunk requesting CPUs, and 0 for chunks not requesting CPUs.
Example 5-4: To request one chunk with two MPI processes and one chunk with one MPI process, where both chunks have two CPUs:
    -lselect=ncpus=2:mpiprocs=2+ncpus=2
Example 5-5: A request for three vnodes, each with one MPI process:
    qsub -l select=3:ncpus=2
This results in the following node file:
    <hostname for 1st vnode>
    <hostname for 2nd vnode>
    <hostname for 3rd vnode>
Example 5-6: If you want to run two MPI processes on each of three hosts and have the MPI processes share a single processor on each host, request the following:
    -lselect=3:ncpus=1:mpiprocs=2
The node file then contains the following list:
    hostname for VnodeA
    hostname for VnodeA
    hostname for VnodeB
    hostname for VnodeB
    hostname for VnodeC
    hostname for VnodeC
Example 5-7: If you want three chunks, each with two CPUs and running two MPI processes, use:
    -l select=3:ncpus=2:mpiprocs=2
The node file then contains the following list:
    hostname for VnodeA
    hostname for VnodeA
    hostname for VnodeB
    hostname for VnodeB
    hostname for VnodeC
    hostname for VnodeC
Notice that the node file is the same as in the previous example, even though the number of CPUs used is different.
Example
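The relationship between chunks, mpiprocs, and the node file can be modeled in Python. This is a sketch, not how PBS builds the file: node_file_lines and default_mpiprocs are hypothetical names, and the default of one MPI process per chunk that requests CPUs is taken from Example 5-5, where three chunks with ncpus=2 yield three node file entries.

```python
# Sketch of how the node file follows from per-chunk mpiprocs (hypothetical helpers).

def default_mpiprocs(ncpus: int) -> int:
    """Default when mpiprocs is not requested: 1 if the chunk requests CPUs, else 0."""
    return 1 if ncpus > 0 else 0


def node_file_lines(chunks: list) -> list:
    """chunks is a list of (hostname, mpiprocs) pairs, one pair per allocated chunk.

    Each host name is repeated once per MPI process; ncpus plays no part.
    """
    lines = []
    for hostname, mpiprocs in chunks:
        lines.extend([hostname] * mpiprocs)
    return lines
```

Because ncpus does not enter the calculation, Examples 5-6 and 5-7 produce the same list: node_file_lines([("VnodeA", 2), ("VnodeB", 2), ("VnodeC", 2)]) gives each host name twice.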
76. samples nrt
5.2.5.8 Examples Using poe
Example 5-19: Using IP mode, run a single-executable poe job with four ranks on hosts spread across the PBS-allocated hosts listed in $PBS_NODEFILE:
    % cat $PBS_NODEFILE
    host1
    host2
    host3
    host4
    % cat job.script
    poe /path/mpiprog -euilib ip
    % qsub -l select=4:ncpus=1 -lplace=scatter job.script
Example 5-20: Using US mode, run a single-executable poe job with four ranks on hosts spread across the PBS-allocated hosts listed in $PBS_NODEFILE:
    % cat $PBS_NODEFILE
    host1
    host2
    host3
    host4
    % cat job.script
    poe /path/mpiprog -euilib us
    % qsub -l select=4:ncpus=1 -lplace=scatter job.script
Example 5-21: Using IP mode, run executables prog1 and prog2 with two ranks of prog1 on host1, two ranks of prog2 on host2, and two ranks of prog2 on host3:
    % cat $PBS_NODEFILE
    host1
    host1
    host2
    host2
    host3
    host3
    % cat job.script
    echo prog1 > /tmp/poe.cmd
    echo prog1 >> /tmp/poe.cmd
    echo prog2 >> /tmp/poe.cmd
    echo prog2 >> /tmp/poe.cmd
    echo prog2 >> /tmp/poe.cmd
    echo prog2 >> /tmp/poe.cmd
    poe -cmdfile /tmp/poe.cmd -euilib ip
    rm /tmp/poe.cmd
    % qsub -l select=3:ncpus=2:mpiprocs=2 -l place=scatter job.script
Example 5-22: Using US mode, run executables prog1 and prog2 with two ranks of pro
77. south
Example: short output
    pbs_rstat -S
    Name        Queue  User   State  Start / Duration / End
    R302.south  R302   user1  RN     Today 12:00 / 7200 / Today 14:00
    S304.south  S304   user1  CO     May 1 2008 15:00 / 3600 / May 1 2008 16:00
Example: full output
    pbs_rstat -F
    Name: R302.south
    Reserve_Name = NULL
    Reserve_Owner = user1@south.mydomain.com
    reserve_state = RESV_RUNNING
    reserve_substate = 5
    reserve_start = Mon Apr 28 12:00:00 2008
    reserve_end = Mon Apr 28 14:00:00 2008
    reserve_duration = 7200
    queue = R302
    Resource_List.ncpus = 2
    Resource_List.nodect = 1
    Resource_List.walltime = 02:00:00
    Resource_List.select = 1:ncpus=2
    Resource_List.place = free
    resv_nodes = (south:ncpus=2)
    Authorized_Users = user1@south.mydomain.com
    server = south
    ctime = Mon Apr 28 11:00:00 2008
    Mail_Users = user1@mydomain.com
    mtime = Mon Apr 28 11:00:00 2008
    Variable_List = PBS_O_LOGNAME=user1,PBS_O_HOST=south.mydomain.com

    Name: S304.south
    Reserve_Name = NULL
    Reserve_Owner = user1@south.mydomain.com
    reserve_state = RESV_CONFIRMED
    reserve_substate = 2
    reserve_start = Thu May 1 15:00:00 2008
    reserve_end = Thu May 1 16:00:00 2008
    reserve_duration = 3600
    queue = S304
    Resource_List.ncpus = 2
    Resource_List.nodect = 1
    Resource_List.walltime = 01:00:00
    PB
78. specific queues
• Make sure that there are no spaces in your recurrence rule.
7.3.3.3 Examples of Creating Standing Reservations
For a reservation that runs every day from 8am to 10am, for a total of 10 occurrences:
    pbs_rsub -R 0800 -E 1000 -r "FREQ=DAILY;COUNT=10"
Every weekday from 6am to 6pm until December 10, 2008:
    pbs_rsub -R 0600 -E 1800 -r "FREQ=WEEKLY;BYDAY=MO,TU,WE,TH,FR;UNTIL=20081210"
Every week from 3pm to 5pm on Monday, Wednesday, and Friday, for 9 occurrences, i.e., for three weeks:
    pbs_rsub -R 1500 -E 1700 -r "FREQ=WEEKLY;BYDAY=MO,WE,FR;COUNT=9"
7.3.3.4 Getting Confirmation of a Reservation
By default, the pbs_rsub command does not immediately notify you whether the reservation is confirmed or denied. Instead, you receive email with this information. You can specify that the pbs_rsub command should wait for confirmation by using the -I <block_time> option. The pbs_rsub command will wait up to <block_time> seconds for the reservation to be confirmed or denied and then notify you of the outcome. If block_time is negative and the reservation is not confirmed in that time, the reservation is automatically deleted.
To find out whether the reservation has been confirmed, use the pbs_rstat command. It will display the state of the reservation. CO and RESV_CONFIRMED indicate that it is confirmed. If the reser
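The "9 occurrences, i.e., three weeks" arithmetic and the no-spaces rule can be checked with a short Python sketch. It assumes the recurrence rule is written in iCalendar style (KEY=VALUE pairs separated by semicolons); parse_rule and weeks_covered are hypothetical helpers, and the week count assumes the first occurrence falls on the first listed day of a week.

```python
import math


def parse_rule(rule: str) -> dict:
    """Split an iCalendar-style rule such as FREQ=WEEKLY;BYDAY=MO,WE,FR;COUNT=9."""
    if any(ch.isspace() for ch in rule):
        raise ValueError("recurrence rule must not contain spaces")
    return dict(item.split("=", 1) for item in rule.split(";"))


def weeks_covered(rule: str) -> int:
    """Weeks needed for COUNT occurrences spread over the BYDAY days of each week."""
    parts = parse_rule(rule)
    days_per_week = len(parts["BYDAY"].split(","))
    return math.ceil(int(parts["COUNT"]) / days_per_week)
```

For the Monday, Wednesday, Friday rule with COUNT=9, weeks_covered returns 3.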
79. temporary directory on each host using the tmpdir MoM parameter. In this case, the TMPDIR environment variable is set to the full path of the resulting temporary directory. Do not attempt to set TMPDIR.
5.2.4 HP MPI with PBS
HP MPI can be integrated with PBS on UNIX and Linux so that PBS can track resource usage, signal processes, and perform accounting for all job processes. Your PBS administrator can integrate HP MPI with PBS.
5.2.4.1 Setting up Your Environment for HP MPI
In order to override the default rsh, set PBS_RSHCOMMAND in your job script:
    export PBS_RSHCOMMAND=<rsh choice>
5.2.4.2 Using HP MPI with PBS
You can run jobs under PBS using HP MPI without making any changes to your MPI command line.
5.2.4.3 Options
When running a PBS HP MPI job, you can use the same arguments to the mpirun command as you would outside of PBS. The following options are treated differently under PBS:
-h <host>
    Ignored
-l <user>
    Ignored
-np <number>
    Modified to fit the available resources
5.2.4.4 Caveats for HP MPI with PBS
Under the integrated HP MPI, the job's working directory is changed to your home directory.
5.2.5 IBM POE with PBS
When you are using AIX machines running IBM's Parallel Operating Environment, or POE, you can run PBS jobs using either the HPS or InfiniBand, whichever is available. You can use
80. the job, the name of the job defaults to the name of the script. If you submit a job without using a script and without specifying a name for the job, the job name is STDIN.
You can specify the name of a job using the following methods:
• Using qsub -N <job name>
• Using #PBS -N <job name>
For example:
    qsub -N myName my_job
    #PBS -N myName
The job name can be up to 236 characters in length, and must consist of printable, non-whitespace characters. The first character must be alphabetic, numeric, hyphen, underscore, or plus sign.
2.5.3 Specifying a Job's Project
In PBS, a project is a way to organize jobs independently of users and groups. You can use a project as a tag to group a set of jobs. Each job can be a member of up to one project.
Projects are not tied to users or groups. One user or group may run jobs in more than one project. For example, user Bob runs JobA in ProjectA and JobB in ProjectB. User Bill runs JobC in ProjectA. User Tom runs JobD in ProjectB. Bob and Tom are in Group1, and Bill is in Group2.
A job's project attribute specifies the job's project. See "project" on page 401 of the PBS Professional Reference Guide. You can set the job's project attribute in the following ways:
• At submission:
  • Using qsub -P <project name>
  • Via #PBS project=<project name>
• After submission, via qalter -P <project name>; see "qalter" on page 131 of the PBS Professional Reference Guide
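The job name rules can be expressed as a small validator. The Python sketch below is illustrative, not a PBS routine: is_valid_job_name is a hypothetical helper, and it treats "alphabetic" as ASCII letters, which is an assumption.

```python
import string

MAX_JOB_NAME_LEN = 236  # length limit stated in the text
# Allowed first characters: alphabetic (assumed ASCII), numeric, hyphen, underscore, plus.
FIRST_CHARS = set(string.ascii_letters + string.digits + "-_+")


def is_valid_job_name(name: str) -> bool:
    """Check length, first character, and printable non-whitespace content."""
    if not 1 <= len(name) <= MAX_JOB_NAME_LEN:
        return False
    if name[0] not in FIRST_CHARS:
        return False
    return all(ch.isprintable() and not ch.isspace() for ch in name)
```

A script could run such a check before calling qsub so that a bad name is caught before submission.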
81. to request the vnodes that represent the group of nodes you want. These resources are named PBScraylabel_<label name>, and set to True on the vnodes that represent the labeled nodes. Your administrator must label the groups of nodes. For example, if a node is both fast and best for App1, it can have two labels, fast and BestForApp1. To request the fast nodes in this example, add the following to each chunk request:
    PBScraylabel_fast=True
• Other custom resources on each vnode, which are set to reflect the vnode's characteristics. For example, if a vnode is fast, it can have a custom string resource called speed, with a value of fast on that vnode. You must ask your administrator for the name and possible values for the resource.
11.5.9 Requesting Nodes in Specific Order
Your application may perform better when the ranks are laid out on specific nodes in a specific order. If you want to request vnodes so that the nodes are in a specific order, you can specify the host for each chunk of the job. For example, if you need nodes ordered nid0, nid2, nid4, you can request the following:
    qsub -l select=2:ncpus=2:host=nid0+2:ncpus=2:host=nid2+2:ncpus=2:host=nid4
11.5.10 Requesting Interlagos Hardware
PBS allows you to specifically request or avoid Interlagos hardware. Your administrator must create a Boolean resource on each vnode
82. tory is automatically removed.
3.2.8.7.ii User's Home Directory as Staging and Execution Directory
If the -k option to qsub is used, standard out and/or standard error files are retained on the primary execution host instead of being returned to the submission host, and are not deleted after job end.
3.2.8.8 Running the Epilogue
PBS runs the epilogue on the primary host as root. The epilogue is executed with its current working directory set to the job's staging and execution directory, and with PBS_JOBDIR and TMPDIR set in its environment.
3.2.8.9 Staging Files Out and Removing Execution Directory
When PBS stages files out, it evaluates execution_path and storage_path relative to PBS_JOBDIR. Files that cannot be staged out are saved in PBS_HOME/undelivered.
3.2.8.9.i Job-specific Staging and Execution Directories
If PBS created job-specific staging and execution directories for the job, it cleans up at the end of the job. The staging and execution directory and all of its contents are removed, on all execution hosts.
3.2.8.10 Removing TMPDIRs
PBS removes all TMPDIRs, along with their contents.
3.2.9 Staging with Job Arrays
File staging is supported for job arrays. See "File Staging for Job Arrays" on page 197.
3.2.10 Using xpbs for File Staging
Using xpbs to set up file staging directives may be easier than using the command line
83. trusted and you do not have a .rhosts file.
• An improper path was specified.
• A directory in the specified destination path is not writable.
• Your .cshrc on the destination host generates output when executed.
• The path specified by PBS_SCP in pbs.conf is incorrect.
• The PBS_HOME/spool directory on the execution host does not have the correct permissions. This directory must have mode 1777 (drwxrwxrwxt) on UNIX, or "Full Control" for "Everyone" on Windows.
3.3.8 Caveats for Output and Error Files
3.3.8.1 Retaining Files on Execution Host
When PBS creates a job-specific staging and execution directory and you use the -k option to qsub, or you specify o and/or e in the Keep_Files attribute, the files you requested kept on the execution host are deleted when the job-specific staging and execution directory is deleted at the end of the job.
3.3.8.2 Standard Output and Error Appended When Job is Rerun
If your job runs and writes to stdout or stderr, and then is rerun, meaning that another job with the same name is run, PBS appends the stdout of the second run to that of the first, and appends the stderr of the second run to that of the first.
3.3.8.3 Windows Mapped Drives and PBS
In Windows, when you map a drive, it is mapped locally to your session. The mapped drive cannot be seen by other processes outside of your session.
84. with a job's placement request are described in "sharing" on page 380 of the PBS Professional Reference Guide. The following table expands on this:
Table 4-3: How Vnode sharing Attribute Affects Vnode Allocation
Value of Vnode's sharing Attribute | Effect on Allocation
not set | The job's arrangement request determines how vnodes are allocated to the job. If there is no specification, vnodes are shared.
default_share | Vnodes are shared unless the job explicitly requests exclusive use of the vnodes.
default_excl | Vnodes are allocated exclusively to the job unless the job explicitly requests shared allocation.
default_exclhost | All vnodes from this host are allocated exclusively to the job, unless the job explicitly requests shared allocation.
ignore_excl | Vnodes are shared, regardless of the job's request.
force_excl | Vnodes are allocated exclusively to the job, regardless of the job's request.
force_exclhost | All vnodes from this host are allocated exclusively to the job, regardless of the job's request.
If a vnode is allocated exclusively to a job, all of its resources are assigned to the job. The state of the vnode becomes job-exclusive. No other job can
85. your job to run on Cray nodes, you must specify a Cray node type for your job. You do this by requesting a value for the vntype vnode resource.
On each vnode on a Cray, the vntype resource includes one of the following values:
    cray_login, for a login node
    cray_compute, for a compute node
Each chunk of a Cray job that must run on a login node must request a vntype of cray_login. Each chunk of a Cray job that must run on a compute node must request a vntype of cray_compute.
Example 11-6: Request any login node, and two compute node vnodes. The job is run on the login node selected by the scheduler:
    qsub -lselect=1:ncpus=2:vntype=cray_login+2:ncpus=2:vntype=cray_compute
Example 11-7: Launch a job on a particular login node by specifying the login node vnode name first in the select line. The job script runs on the specified login node:
    qsub -lselect=1:ncpus=2:vnode=login1+2:ncpus=2:vntype=cray_compute
For a description of the vntype resource, see "Built-in Resources" on page 307 of the PBS Professional Reference Guide.
11.4.2 Always Reserve Required Vnodes
Always reserve at least as many PEs as you request in your aprun statement.
11.4.3 Requesting Login Node Where Job Script Runs
If you request a login node as part of your resource request, the login node resource request must be the first element of the select statement. The job script is run
86. 5.2.13.3 Examples
Example 5-38: Run a single-executable MVAPICH2 job with six ranks on hosts listed in $PBS_NODEFILE.
PBS_NODEFILE:
    pbs-host1
    pbs-host1
    pbs-host2
    pbs-host2
    pbs-host3
    pbs-host3
Job.script:
    mpiexec -np 6 /path/mpiprog
Run job script:
    qsub -l select=3:ncpus=2:mpiprocs=2 job.script
    <job-id>
Example 5-39: Launch an MVAPICH2 MPI job with multiple executables on multiple hosts listed in the default file mpd.hosts. Here, run executables prog1 and prog2 with two ranks of prog1 on host1, two ranks of prog2 on host2, and two ranks of prog2 on host3, all specified on the command line.
PBS_NODEFILE:
    pbs-host1
    pbs-host1
    pbs-host2
    pbs-host2
    pbs-host3
    pbs-host3
Job.script:
    mpiexec -n 2 prog1 : -n 2 prog2 : -n 2 prog2
Run job script:
    qsub -l select=3:ncpus=2:mpiprocs=2 job.script
    <job-id>
Example 5-40: Launch an MVAPICH2 MPI job with multiple executables on multiple hosts listed in the default file mpd.hosts. Run executables prog1 and prog2 with two ranks of prog1 on host1, two ranks of prog2 on host2, and two ranks of prog2 on host3, all specified using the -configfile option.
PBS_NODEFILE:
    pbs-host1
    pbs-host1
    pbs-host2
    pbs-host2
    pbs-host3
    pbs-host3
Job.script:
    echo "-n 2 -host host1 prog1" > /tmp/jobconf
    echo "-n 2 -host host2 prog2" >>
87. 0:ncpus=1:mem=1gb -l place=free
Here is a small job that uses a shared cpuset:
    -l select=1:ncpus=1:mem=512kb -l place=pack:shared
Request a special resource available on a limited set of nodeboards, such as a graphics card:
    -l select=1:ncpus=2:mem=2gb:graphics=True+1:ncpus=20:mem=20gb:graphics=False -l place=pack:excl
Align SMP jobs on c-brick boundaries:
    -l select=1:ncpus=4:mem=6gb -l place=pack:group=cbrick
Align a large job within one router, if it fits within a router:
    -l select=1:ncpus=100:mem=200gb -l place=pack:group=router
Fit large jobs that do not fit within a single router into as few available routers as possible. Here, RES is the resource used for node grouping:
    -l select=1:ncpus=300:mem=300gb -l place=pack:group=<RES>
To submit an MPI job, specify one chunk per MPI task. For a 10-way MPI job with 2gb of memory per MPI task:
    -l select=10:ncpus=1:mem=2gb
To submit a non-MPI job (including a 1-CPU job or an OpenMP or shared memory job), use a single chunk. For a 2-CPU job requiring 10gb of memory:
    -l select=1:ncpus=2:mem=10gb
4.8 Backward Compatibility
4.8.1 Old-style Resource Specifications
Old versions of PBS allowed job submitters to ask for resources outside of a select statement, using -l<resource>=<value>, where those resources must now be requested in chunks, insid
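The two rules of thumb above (one chunk per MPI task, a single chunk for a non-MPI job) can be captured as string builders. This Python sketch only assembles the select string; mpi_select and single_chunk_select are hypothetical helpers, and the select=N:ncpus=X:mem=Y layout is the standard qsub syntax assumed throughout.

```python
# Hypothetical helpers that assemble select statements for the two cases above.

def mpi_select(ntasks: int, mem_per_task: str, ncpus_per_task: int = 1) -> str:
    """One chunk per MPI task."""
    return f"select={ntasks}:ncpus={ncpus_per_task}:mem={mem_per_task}"


def single_chunk_select(ncpus: int, mem: str) -> str:
    """A single chunk for a non-MPI (1-CPU, OpenMP, or shared memory) job."""
    return f"select=1:ncpus={ncpus}:mem={mem}"
```

The result would be passed to qsub after -l, for example qsub -l followed by the value of mpi_select(10, "2gb").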
88. 05
10 Checking Job & System Status
    10.1 Viewing Job Status
    10.2 Viewing Server Status
    10.3 Checking Queue Status
    10.4 Viewing Job & System Status with xpbs
    10.5 Selecting a List of Jobs
    10.6 Tracking Job Progress Using xpbs TrackJob Feature
    10.7 Checking License Availability
11 Submitting Cray Jobs
    11.1 Introduction
    11.2 PBS Jobs on the Cray
    11.3 PBS Resources for the Cray
    11.4 Rules for Submitting Jobs on the Cray
    11.5 Techniques for Submitting Cray Jobs
    11.6 Viewing Cray Job Information
    11.7 Caveats and Advice
    11.8 Errors and Logging
12 Using Provisioning
    12.1 Definitions
    12.2 How Provisioning Works
    12.3 Requirements and Restrictions
    12.4 Using Provisioning
    12.5 Caveats and Errors
13 Special Circumstances and Tools
    13.1 Support for Large Page Mode on AIX
    13.2 Using Comprehensive System Accounting
89. 1 1 Advice on Requesting Accelerators
• When requesting accelerators, put them in the same chunks as CPUs. Otherwise, the accelerators could end up in a chunk taken from a different host from the CPUs, and in that case your CPUs could be on a host without an accelerator.
• Use accelerator=True in a chunk only when you don't care how many accelerators are in the chunk.
11.5.11.2 Examples of Requesting Accelerators
Example 11-11: You want a total of 40 PEs with 4 PEs per compute node and one accelerator per compute node:
    -lselect=10:ncpus=4:naccelerators=1
Example 11-12: You want 30 PEs and a Tesla_x2090 accelerator on each host, and the accelerator should have at least 4000MB, and you don't care how many hosts the job uses:
    -lselect=30:ncpus=1:naccelerators=1:accelerator_model=Tesla_x2090:accelerator_memory=4000MB myjob
Example 11-13: You want a total of 40 PEs with 4 PEs per compute node and at least one accelerator per compute node:
    -lselect=10:ncpus=4:accelerator=True
Example 11-14: Your system has some compute nodes with one type of accelerator, GPU1, and another type of compute node with a different type of accelerator, GPU2, and you want to request 10 PEs and 1 accelerator of model GPU1 per compute node, and 4 PEs and 1 accelerator of model GPU2 per compute node. Your job request would look like this:
    -lselect=10:ncpus=1:naccelerators=1:accelerator_model=GPU1+4:ncpus=1:naccelerators=1:accelerator_model
90. 1 4 Longer Job and Reservation Names
You can use job and reservation names up to 236 characters in length. See "Formats" on page 413 of the PBS Professional Reference Guide.
1.1.2 New Features in PBS Professional 12.2
1.1.2.1 Setting Number of Job Run Attempts
You can tell PBS how many attempts it can make to run your job, up to a limit. See section 6.7, "Controlling Number of Times Job is Re-run", on page 162.
1.1.2.2 Interactive Jobs on Windows
You can run interactive jobs under Windows. See section 6.11, "Running Your Job Interactively", on page 165.
1.1.3 New Features in PBS Professional 12.0
1.1.3.1 Shrink-to-fit Jobs
PBS allows you to specify a variable running time for jobs. You can specify a walltime range for jobs where attempting to run the job in a tight time slot can be useful. Administrators can convert non-shrink-to-fit jobs into shrink-to-fit jobs in order to maximize machine use. See section 6.3, "Adjusting Job Running Time", on page 149.
1.1.4 New Features in PBS Professional 11.3
1.1.4.1 Deleting Moved and Finished Jobs
You can delete a moved or finished job. See section 9.3.2, "Deleting Finished Jobs", on page 218 and section 9.3.3, "Deleting Moved Jobs", on page 218.
1.1.5 New Features in PBS Professional 11.2
1.1.5.1 Grouping Jobs by Project
You can group your jobs by project, by assigning projec
91. 5 Tracking Progress for Interactive Jobs
After you have submitted an interactive job, PBS prints the following message to the window where you submitted the job:
    qsub: waiting for job <job ID> to start
When the job is started by the scheduler, PBS prints the following message to the submission window:
    qsub: job <job ID> ready
When the interactive job finishes, PBS prints the following message to the submission window:
    qsub: job <job ID> completed
6.11.6 Special Sequences for Interactive Jobs
Keyboard-generated interrupts are passed to the job. Lines entered that begin with the tilde ('~') character and contain special sequences are interpreted by qsub itself. The recognized special sequences are:
~.
    qsub terminates execution. The batch job is also terminated.
~susp
    Suspends the qsub program. "susp" is the suspend character, usually CTRL-Z.
~asusp
    Suspends the input half of qsub (terminal to job), but allows output to continue to be displayed. "asusp" is the auxiliary suspend character, usually CTRL-Y.
6.11.7 Caveats and Restrictions for Interactive Jobs
• Make sure that your login file does not run processes in the background. See section 2.4.2.5, "Avoid Background Processes Inside Jobs", on page 14.
• You cannot run an array job interactively.
• Interactive jobs are not rerunnable.
• An inte
92. 50800 -E 170830, the duration of your reservation will be two days plus 30 minutes.
The interval_spec can be used to specify the day or the hour at which the interval starts. If you specify
    -R 0915 -E 0945 BYHOUR=9,10
the duration is 30 minutes, and the offset is 15 minutes from the start of the interval. The interval start is at 9 and again at 10. Your reservation will run from 9:15 to 9:45, and again from 10:15 to 10:45. Similarly, if you specify
    -R 0800 -E 1000 BYDAY=WE,TH
the duration is two hours and the offset is 8 hours from the start of the interval. Your reservation will run Wednesday from 8 to 10, and again on Thursday from 8 to 10.
Elements specified in the recurrence rule override those specified in the arguments to the -R and -E options. Therefore, if you specify
    -R 0730 -E 0830 BYHOUR=9
the duration is one hour, but the hour element (9:00) in the recurrence rule has overridden the hour element specified in the argument to -R (7:00). The offset is still 30 minutes after the interval start. Your reservation will run from 9:30 to 10:30. Similarly, if the 16th is a Monday, and you specify
    -R 160800 -E 170900 BYDAY=TU;BYHOUR=11
the duration is 25 hours, but both the day and the hour elements have been overridden. Your reservation will run on Tuesday at 11, for 25 hours, ending Wednesday at 12. However, if you speci
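The offset-and-duration rule for BYHOUR can be worked through in Python. This sketch covers only the hour-level case with HHMM arguments; occurrences_by_hour is a hypothetical helper, and the calendar date it uses internally is arbitrary because only times of day are reported.

```python
from datetime import datetime, timedelta


def occurrences_by_hour(start_hhmm: str, end_hhmm: str, byhour: list) -> list:
    """Return (start, end) times as HH:MM strings for each BYHOUR interval start.

    The duration comes from the -R and -E arguments; the offset is the minute
    part of -R; the hour itself is overridden by each BYHOUR value.
    """
    start = datetime.strptime(start_hhmm, "%H%M")
    end = datetime.strptime(end_hhmm, "%H%M")
    duration = end - start
    offset = timedelta(minutes=start.minute)
    day = datetime(2008, 1, 1)  # arbitrary date, used only to do clock arithmetic
    result = []
    for hour in byhour:
        begin = day.replace(hour=hour, minute=0) + offset
        finish = begin + duration
        result.append((begin.strftime("%H:%M"), finish.strftime("%H:%M")))
    return result
```

With arguments "0730", "0830", and [9], the function returns the 9:30 to 10:30 window described in the text.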
93. 8 Exit Status job arrays UG 202 F file staging UG 35 Files pbs conf UG 303 xpbsre UG 303 floating licenses UG 63 free UG 78 freq_spec UG 176 G group resource UG 78 grouping UG 78 UG 307 UG Index H here document UG 23 hfile UG 108 Hitchhiker s Guide UG 109 hostfile UG 108 l identifier UG 9 Identifier Syntax UG 193 InfiniBand UG 132 UG 133 instance UG 173 instance of a standing reservation UG 173 instances option UG 108 Intel MPI examples UG 114 interval_spec UG 176 J ja UG 288 Job comment UG 233 dependencies UG 146 sending messages to UG 219 sending signals to UG 220 submission options UG 25 tracking UG 248 job identifier UG 9 Job Array States UG 194 job array identifier UG 191 range UG 191 Job Arrays UG 191 exit status UG 202 prologues and epilogues UG 195 job dependencies xpbs UG 148 Job Submission Options UG 25 job wide resource UG 59 UG 61 UG 308 L Large Page Mode UG 287 Limits on Resource Usage UG 73 Listbox UG 290 M max_walltime UG 152 min_walltime UG 153 Modifying Job Attributes UG 215 Moving jobs between queues UG 222 MP_DEVTYPE UG 107 MP_EUIDEVICE UG 107 MP_EUILIB UG 107 MP_HOSTFILE UG 108 MP_INSTANCES UG 108 MP_ PROCS UG 108 MPI Intel MPI examples UG 114 MPICH_GM rsh ssh examples UG 122 MPICH2 examples UG 130 UG 134 MPICH GM MPD examples UG 121 MPICH MX MPD examples UG 124 rsh ssh examples UG 126
94. A Manager can place any hold on any job. The usage syntax of the qhold command is the following:
    qhold [-h hold_list] job_identifier ...
For a job array, the job_identifier must be enclosed in double quotes.
The hold_list specifies the types of holds to be placed on the job. The hold_list argument is a string consisting of one or more of the letters u, p, o, or s in any combination, or the letter n. The following table shows the hold type associated with each letter:
Table 6-1: Hold Types
Letter | Meaning
n | none: no hold type specified
u | user: job owner can set and release this hold type
p | password: set if job fails due to a bad password; can be unset by job owner
o | operator: requires Operator privilege to set or unset
s | system: requires Manager privilege to unset
If no -h option is specified, PBS applies a user hold to the jobs listed in the job_identifier list.
If a job in the job_identifier list is in the queued, held, or waiting states, the only change is that the hold type is added to the job's other holds. If the job is queued or waiting in an execution queue, the job is also put in the held state.
6.5.2 Requirements for Holding or Releasing a Job
The user executing the qhold or qrls command must have the necessary privilege to apply a hold or release a hold. The same rules apply for releasing a hold and for f
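The hold_list format rule can be checked with a few lines of Python. The helpers below are hypothetical and only mirror the text: a valid hold_list is either the single letter n or a non-empty combination of u, p, o, and s, and a user hold is applied when the option is omitted.

```python
# Hypothetical validator for the hold_list argument described above.
HOLD_LETTERS = set("upos")


def is_valid_hold_list(hold_list: str) -> bool:
    """True for "n" alone, or for one or more of u, p, o, s in any combination."""
    if hold_list == "n":
        return True
    return len(hold_list) > 0 and set(hold_list) <= HOLD_LETTERS


def effective_hold_list(hold_list=None) -> str:
    """A user hold ("u") is applied when no hold_list is given."""
    return "u" if hold_list is None else hold_list
```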
95. A drive mapped on one session cannot be un-mapped in another session, even if the user is the same. This has implications for running jobs under PBS. Specifically, if you map a drive, chdir to it, and submit a job from that location, the vnode that executes the job may not be able to deliver the files back to the same location from which you issued qsub. The workaround is to tell PBS to deliver the files to a local, non-mapped directory. Use the -o or -e options to qsub to specify the directory location for the job output and error files. For details, see section 3.3.2, "Paths for Output and Error Files", on page 50.
3.3.8.4 Harmless csh Error Message
If your login shell is csh, the following message may appear in the standard output of a job:
    Warning: no access to tty, thus no job control in this shell
This message is produced by many csh versions when the shell determines that its input is not a terminal. Short of modifying csh, there is no way to eliminate the message. Fortunately, it is just an informative message and has no effect on the job.
3.3.8.5 Interactive Jobs and File I/O
When an interactive job finishes, stdout and/or stderr may not have been copied back yet.
3.3.8.6 Write Permissions Required
• You must have write permission for any directory where you will copy stdout or stderr.
• Root must be able to write in PBS_
96. Altair PBS Professional 13.0 Beta User's Guide
You are reading the Altair PBS Professional 13.0 Beta User's Guide (UG). Updated 12/2/14.
Copyright © 2003-2014 Altair Engineering, Inc. All rights reserved. PBS, PBS Works, PBS GridWorks, PBS Professional, PBS Analytics, PBS Catalyst, e-Compute, and e-Render are trademarks of Altair Engineering, Inc. and are protected under U.S. and international laws and treaties. All other marks are the property of their respective owners.
ALTAIR ENGINEERING INC. Proprietary and Confidential. Contains Trade Secret Information. Not for use or disclosure outside ALTAIR and its licensed clients. Information contained herein shall not be decompiled, disassembled, duplicated or disclosed in whole or in part for any purpose. Usage of the software is only as explicitly permitted in the end user software license agreement. Copyright notice does not imply publication.
Documentation and Contact Information
Contact Altair at: www.pbsworks.com, pbssales@altair.com
Technical Support
Location | Telephone | e-mail
North America | +1 248 614 2425 | pbssupport@altair.com
China | +86 (0)21 6117 1666 | es@altair.com.cn
France | +33 (0)1 4133 0992 | francesupport@altair.com
Germany | +49 (0)7031 6208 22 | hwsupport@altair.de
India | +91 80 66 29 4500 | pbs-support@india.altair.com
Italy | +39 800 905595 | support@altairengineering.it
Japan | +81 3 5396 2881 | pb
97. B
qsub -lncpus=1 -lmem=4gb
    In QA: select=1:ncpus=1:mem=4gb
        No defaults need be applied.
    In QB: select=1:ncpus=1:mem=4gb
        No defaults need be applied.
qsub -lncpus=1
    In QA: select=1:ncpus=1:mem=2gb
        Picks up 2gb from queue default chunk and ncpus from qsub.
    In QB: select=1:ncpus=1:mem=1gb
        Picks up 1gb from queue default chunk and ncpus from qsub.
qsub -lmem=4gb
    In QA: select=1:ncpus=2:mem=4gb
        Picks up 2 ncpus from queue-level job-wide resource default and 4gb mem from qsub.
    In QB: select=1:ncpus=1:mem=4gb
        Picks up 1 ncpus from server-level job-wide default and 4gb mem from qsub.
qsub -lnodes=4
    In QA: select=4:ncpus=1:mem=2gb
        Picks up a queue-level default memory chunk of 2gb. (This is not 4:ncpus=2 because in prior versions, nodes=x implied 1 CPU per node unless otherwise explicitly stated.)
    In QB: select=4:ncpus=1:mem=1gb
        (In prior versions, nodes=x implied 1 CPU per node unless otherwise explicitly stated, so the ncpus=1 is not inherited from the server default.)
qsub -lmem=16gb -lnodes=4
    In QA: select=4:ncpus=1:mem=4gb
        (This is not 4:ncpus=2 because in prior versions, nodes=x implied 1 CPU per node unless otherwise explicitly stated.)
    In QB: select=4:ncpus=1:mem=4gb
        (In prior versions, nodes=x implied 1 CPU per node unless otherwise explicitly stated, so the ncpus=1 is not inherited from the server
98. PBS hosts are specified in the process group file; MPI processes spawned on non-PBS hosts are not guaranteed to be under the control of PBS.
5.2.10.1.ii MPD Startup and Shutdown
The PBS mpirun interface starts MPD daemons on each of the unique hosts listed in $PBS_NODEFILE, using either the rsh or ssh method, based on the value of the environment variable RSHCOMMAND. The default is rsh. The interface also takes care of shutting down the MPD daemons at the end of a run. If the MPD daemons are not running, the PBS interface to mpirun starts MX's MPD daemons as you on the allocated PBS hosts. The MPD daemons may already have been started by the administrator or by you. MPD daemons are not started inside a PBS prologue script, since it won't have the path of the mpirun that you executed (GM or MX), which would determine the path to the MPD binary.
5.2.10.1.iii Examples
Example 5-30: Run a single-executable MPICH-MX job with 64 processes spread out across the PBS-allocated hosts listed in $PBS_NODEFILE:
  PBS_NODEFILE:
    pbs-host1
    pbs-host2
    ...
    pbs-host64
  qsub -l select=64:ncpus=1 -lplace=scatter
  [MPICH-MX-HOME]/bin/mpirun -np 64 /path/myprog.x 1200
  ^D
  <job-id>
If the MPD daemons are not running, the PBS interface to mpirun starts MX's MPD daemons as you on the allocated PBS hosts. The MPD daemons may be already started by the administrator or by you.
Ex
99. E wc -l ANY C MPI CODE HERE date
When using the integrated lamboot in a job script, lamboot takes input from $PBS_NODEFILE automatically, so the argument is not necessary.
5.2.7.4 See Also
For information on LAM MPI, see www.lam-mpi.org.
5.2.8 MPICH-P4 with PBS
MPICH-P4 can be integrated with PBS on UNIX and Linux so that PBS can track resource usage, signal processes, and perform accounting for all job processes. Your PBS administrator can integrate MPICH-P4 with PBS.
5.2.8.1 Options for MPICH-P4 with PBS
Under PBS, the syntax and arguments for the MPICH-P4 mpirun command on Linux are the same except for one option, which you should not set:
  -machinefile <file>
    PBS supplies the machinefile. If you try to specify it, PBS prints a warning that it is replacing the machinefile.
5.2.8.2 Example of Using MPICH-P4 with PBS
Example of using mpirun:
  #PBS -l select=arch=linux
  mpirun a.out
5.2.8.3 MPICH Under Windows
Under Windows, you may need to use the -localroot option to MPICH's mpirun command in order to allow the job's processes to run more efficiently, or to get around the error "failed to communicate with the barrier command". Here is an example job script:
  C:\DOCUME~1\user1> type job.scr
  echo begin
  type %PBS_NODEFILE%
  "\Program Files\MPICH\mpd\bin\mpirun" -localroot -np 2 -machinefile %PBS_NODEFILE% \winnt\temp\netpip
100. EFILE
-np
  If not specified, the number of entries found in $PBS_NODEFILE is used. The maximum number of ranks that can be launched is the number of entries in $PBS_NODEFILE.
-pg
  The use of the -pg option, for having multiple executables on multiple hosts, is allowed, but it is up to you to make sure only PBS hosts are specified in the process group file; MPI processes spawned on non-PBS hosts are not guaranteed to be under the control of PBS.
5.2.9.2.ii Examples
Example 5-28: Run a single-executable MPICH-GM job with 64 processes spread out across the PBS-allocated hosts listed in $PBS_NODEFILE:
  PBS_NODEFILE:
    pbs-host1
    pbs-host2
    ...
    pbs-host64
  qsub -l select=64:ncpus=1 -l place=scatter
  mpirun -np 64 /path/myprog.x 1200
  ^D
  <job-id>
Example 5-29: Run an MPICH-GM job with multiple executables on multiple hosts listed in the process group file procgrp:
  qsub -l select=2:ncpus=1
  echo "host1 1 user1 /x/y/a.exe arg1 arg2" > procgrp
  echo "host2 1 user1 /x/x/b.exe arg1 arg2" >> procgrp
  mpirun -pg procgrp /path/mypro.x
  rm -f procgrp
  ^D
  <job-id>
When the job runs, mpirun gives this warning message:
  warning: "-pg" is allowed but it is up to user to make sure only PBS hosts are specified; MPI processes spawned are not guaranteed to be under the control of PBS.
The warning is issued because if any of the hosts listed in pr
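The -np rule above depends on how many entries $PBS_NODEFILE holds. As a rough illustration only, the following sketch counts the entries and the distinct hosts in a node file. The function names and the sample file are invented for this example; they are not part of PBS, and the sample file merely stands in for $PBS_NODEFILE.

```shell
# Hypothetical helpers (not PBS commands) for inspecting a node file
# such as the one named by $PBS_NODEFILE.

# Number of entries: per the text above, the default and maximum rank count.
nodefile_entries() {
    # grep -c . counts non-empty lines only
    grep -c . "$1"
}

# Number of distinct hosts named in the file.
nodefile_unique_hosts() {
    sort -u "$1" | grep -c .
}

# Demonstration with a sample file standing in for $PBS_NODEFILE.
sample=$(mktemp)
printf '%s\n' pbs-host1 pbs-host1 pbs-host2 pbs-host2 pbs-host3 pbs-host3 > "$sample"
echo "entries: $(nodefile_entries "$sample")"
echo "unique hosts: $(nodefile_unique_hosts "$sample")"
```

Inside a real job you would pass "$PBS_NODEFILE" instead of the sample file.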
101. GPU2 myjob
11.6 Viewing Cray Job Information
11.6.1 Finding Out Where Job Was Launched
To determine the internal login node where the job was launched, use the qstat -f command:
  qstat -f <job ID>
Look at the exec_host line of the output. The first vnode is the login node where the job was launched.
11.6.2 Finding Out How mpp* Request Was Translated
• To find out how the mpp* job request was translated into select and place statements, use the qstat -f command:
    qstat -f -x <job ID>
  Look at the Resource_List.select job attribute. The original is in the Submit_arguments job attribute.
• To find out how the mpp* reservation request was translated into select and place statements, use the pbs_rstat command:
    pbs_rstat -F <reservation ID>
  Look at the Resource_List attribute.
11.6.3 Viewing Original mpp* Request
To see the original mpp* request, use the qstat command:
  qstat -f -x <job ID>
The Submit_arguments field contains the original mpp* request.
11.6.4 Listing Jobs Running on Vnode
To see which jobs are running on a vnode, use the pbsnodes command:
  pbsnodes -av
The jobs attribute of each vnode lists the jobs running on that vnode. Jobs launched from an internal login node, requesting a vntype of cray_compute only, are not listed in the internal login node's vnode's jobs attribute. Jobs that are ac
102. HOME/spool
4 Allocating Resources & Placing Jobs
4.1 What is a Vnode?
A virtual node, or vnode, is an abstract object representing a set of resources which form a usable part of a machine. This could be an entire host, or a nodeboard or a blade. A single host can be made up of multiple vnodes. A host is any computer. Execution hosts used to be called nodes, and are still often called nodes outside of the PBS documentation. PBS views hosts as being composed of one or more vnodes. PBS manages and schedules each vnode independently. Jobs run on one or more vnodes. Each vnode has its own set of attributes; see "Vnode Attributes" on page 375 of the PBS Professional Reference Guide.
4.1.1 Deprecated Vnode Types
All vnodes are treated alike, and are treated the same as what were once called "time-shared nodes". The types "time-shared" and "cluster" are deprecated. The :ts suffix is deprecated. It is silently ignored, and not preserved during rewrite. The vnode attribute ntype was only used to distinguish between PBS and Globus vnodes. Globus can still send jobs to PBS, but PBS no longer supports sending jobs to Globus. The ntype attribute is read-only.
4.2 PBS Resources
4.2.1 Introduction to PBS Resources
In this section, "Introduction to PBS Resources", we wil
103. Interpreter", on page 19.
2.2.5.1.i Comparison Between Equivalent UNIX/Linux and Windows Job Scripts
The following UNIX/Linux and Windows job scripts produce the same results.
UNIX/Linux:
  #!/bin/sh
  #PBS -l walltime=1:00:00
  #PBS -l select=mem=400mb
  #PBS -j oe
  date
  my_application
  date
Windows:
  REM PBS -l walltime=1:00:00
  REM PBS -l select=mem=400mb
  REM PBS -j oe
  date /t
  my_application
  date /t
The first line in the Windows script does not contain a path to a shell, because you cannot specify the path to the shell or interpreter inside a Windows job script. See section 2.3.3.2.c, "Specifying Job Script Shell or Interpreter", on page 19. The remaining lines of both files are almost identical. The primary differences are in file and directory path specifications, such as the use of drive letters and slash vs. backslash as the path separator. The lines beginning with "#PBS" and "REM PBS" are PBS directives. PBS reads down the job script until it finds the first line that is not a valid PBS directive, then stops. From there on, the lines in the script are read by the job shell or interpreter. In this case, PBS sees lines 6-8 as commands to be run by the job shell. In our examples above, the "-l <resource>=<value>" lines request specific resources. Here, we request 1 hour of wall-clock time as a job-wide requ
104. The following figure illustrates how the pbs_tmrsh command can be used to integrate an MPI on the fly:
[Figure 5-1: PBS knows about processes on vnodes 2 and 3, because pbs_tmrsh talks directly to pbs_mom, and pbs_mom starts the processes on vnodes 2 and 3. Figure labels: session tracked by pbs_mom; job script with "#PBS -lselect=3:ncpus=2:mpiprocs=2" running mpirun (using pbs_tmrsh) with -hostfile $PBS_NODEFILE.]
5.2.1.2.i Caveats for the pbs_tmrsh Command
This command cannot be used outside of a PBS job; if used outside a PBS job, this command will fail. The pbs_tmrsh command does not perform exactly like rsh. For example, you cannot pipe output from pbs_tmrsh; this will fail.
5.2.2 Prerequisites to Using MPI with PBS
The MPI that you intend to use with PBS must be working before you try to use it with PBS. You must be able to run an MPI job outside of PBS.
5.2.3 Caveats for Using MPIs
Some applications write scratch files to a temporary location. PBS makes a temporary directory available for this, and puts the path in the TMPDIR environment variable. The location of the temporary directory is host-dependent. If you are using an MPI other than LAM MPI or Open MPI, and your application needs scratch space, the temporary directory for the job should be consistent across execution hosts. Your PBS administrator can specify a root for the
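As an illustration of the TMPDIR caveat above, a job script might pick its scratch location along these lines. This is a minimal sketch: the helper names are invented for the example, and the fallback to /tmp is an assumption added only so that the snippet also runs outside a PBS job.

```shell
# Hypothetical helper: choose the scratch directory for this job.
# Inside a job, PBS puts the temporary directory path in TMPDIR;
# /tmp is only a fallback for trying the sketch outside PBS.
scratch_dir() {
    printf '%s\n' "${TMPDIR:-/tmp}"
}

# Create a uniquely named scratch file in that directory and print its path.
make_scratch_file() {
    mktemp "$(scratch_dir)/scratch.XXXXXX"
}

f=$(make_scratch_file)
echo "scratch file: $f"
rm -f "$f"
```

Because the location is host-dependent, each host's processes should resolve the directory at run time rather than hard-coding a path.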
105. #PBS -z
There is no associated job attribute for this option.
2.5.9 Running qsub in the Foreground
Normally, qsub runs in the background. You can run it in the foreground by using the -f option. By default, qsub attempts to communicate with a background qsub daemon that may have been instantiated from an earlier invocation. This background daemon can be holding onto an authenticated server connection, speeding up performance. This option can be helpful when you are submitting a very short job which submits another job, or when you are running codes written in-house for Windows.
3 Job Input & Output Files
3.1 Introduction to Job File I/O in PBS
PBS allows you to manage input files, output files, standard output, and standard error. PBS has two mechanisms for handling job files: you use staging for input and output files, and you select whether stdout and/or stderr are copied back using the Keep_Files job attribute.
3.2 Input/Output File Staging
File staging is a way to specify which input files should be copied onto the execution host before the job starts, and which output files should be copied off the execution host when it finishes.
3.2.1 Staging and Execution Directory: User's Home vs. Job-specific
The job's staging and execution directory is the directory to which files are copied before the job runs, and from which output files are copied after the job has finished. The
106. Q
To alter a job attribute via xpbs, first select the job(s) of interest, and then click on the modify button. Doing so will bring up the Modify Job Attributes dialog box. From this window you may set the new values for any attribute you are permitted to change. Then click on the confirm modify button at the lower left of the window.
The qalter command can be used on job arrays, but not on subjobs or ranges of subjobs. When used with job arrays, any job array identifiers must be enclosed in double quotes, e.g.:
  qalter -l walltime=25:00 "1234[].south"
You cannot use the qalter command (or any other command) to alter a custom resource which has been created to be invisible or unrequestable. See section 4.3.8, "Caveats and Restrictions on Requesting Resources", on page 67. For more information, see "qalter" on page 131 of the PBS Professional Reference Guide.
9.2.2.1 Caveats
Be careful when using a Boolean resource as a job-wide limit.
9.3 Deleting Jobs
PBS provides the qdel command for deleting jobs. The qdel command deletes jobs in the order in which their job identifiers are presented to the command. A batch job may be deleted by its owner, a PBS operator, or a PBS administrator. Unless you are an administrator or an operator, you can delete only your own jobs. To delete a queued, held, running, or suspended job:
  qdel <job ID>
107.
  Resource_List.select = 1:ncpus=2
  Resource_List.place = free
  resv_nodes = (south:ncpus=2)
  reserve_rrule = FREQ=WEEKLY;BYDAY=MO;COUNT=5
  reserve_count = 5
  reserve_index = 2
  Authorized_Users = user1@south.mydomain.com
  server = south
  ctime = Mon Apr 28 11:01:00 2008
  Mail_Users = user1@mydomain.com
  mtime = Mon Apr 28 11:01:00 2008
  Variable_List = PBS_O_LOGNAME=user1,PBS_O_HOST=south.mydomain.com
  PBS_TZID = America/Los_Angeles
7.5 Using Your Reservation
7.5.1 Submitting a Job to a Reservation
Jobs can be submitted to the queue associated with a reservation, or they can be moved from another queue into the reservation queue. You submit a job to a reservation by using the -q <queue> option to the qsub command to specify the reservation queue. For example, to submit a job to the soonest occurrence of a standing reservation named S123.south, submit to its queue S123:
  qsub -q S123 <script>
You move a job into a reservation queue by using the qmove command. For more information, see "qsub" on page 219 of the PBS Professional Reference Guide and "qmove" on page 180 of the PBS Professional Reference Guide. For example, to qmove job 22.myhost from workq to S123, the queue for the reservation named S123.south:
  qmove S123 22.myhost
or
  qmove S123 22
108. S interface upon encountering a -s option, describing the supported form.
-np
  If you do not specify a -np option, then no default value is provided by the PBS interface. It is up to the standard mpirun to decide what the reasonable default value should be, which is usually 1. The maximum number of ranks that can be launched is the number of entries in $PBS_NODEFILE.
5.2.6.3 MPD Startup and Shutdown
Intel MPI's mpirun takes care of starting and stopping the MPD daemons. The PBS interface to Intel MPI's mpirun always passes the arguments --totalnum=<number of mpds to start> and --file=<mpd_hosts_file> to the actual mpirun, taking its input from unique entries in $PBS_NODEFILE.
5.2.6.4 Examples
Example 5-23: Run a single-executable Intel MPI job with six processes spread out across the PBS-allocated hosts listed in $PBS_NODEFILE:
  Node file:
    pbs-host1
    pbs-host1
    pbs-host2
    pbs-host2
    pbs-host3
    pbs-host3
  Job script:
    # mpirun takes care of starting the MPD daemons on unique hosts listed in
    # $PBS_NODEFILE and also runs the 6 processes on the 6 hosts listed in
    # $PBS_NODEFILE; mpirun takes care of shutting down MPDs.
    mpirun /path/myprog.x 1200
  Run job script:
    qsub -l select=3:ncpus=2:mpiprocs=2 job-script
    <job-id>
Example 5-24: Run an Intel MPI job with multiple executables on multiple hosts using $PBS_NODEFILE and mpiex
109. SIMD operations, for example: parameter sweep applications, rendering in media and entertainment, EDA simulations, and forex (historical data).
8.2 Glossary
Job array identifier
  The identifier returned upon success when submitting a job array.
Job array range
  A set of subjobs within a job array. When specifying a range, indices used must be valid members of the job array's indices.
Sequence number
  The numeric part of a job or job array identifier, e.g. 1234.
Subjob
  Individual entity within a job array (e.g. 1234[7], where 1234[] is the job array itself, and 7 is the index) which has many properties of a job as well as additional semantics (defined below).
Subjob index
  The unique index which differentiates one subjob from another. This must be a non-negative integer.
8.3 Description of Job Arrays
A job array is a compact representation of two or more jobs. A job that is part of a job array is called a "subjob". Each subjob in a job array is treated exactly like a normal job, except for any differences noted in this chapter.
8.3.1 Job Script for Job Arrays
All subjobs in a job array share a single job script, including the PBS directives and the shell script portion. The job script is run once for each subjob. The job script may invoke different commands based on the subjob index. The commands, of course, may be scripts themselves, if the script is setup
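Since one job script serves every subjob, a script can branch on the subjob index to decide what to run. The sketch below shows one way this might look. The function name and the three command names are placeholders invented for the example; PBS_ARRAY_INDEX is the environment variable that PBS sets for subjobs, and the default of 1 is only there so the snippet also runs outside PBS.

```shell
# Hypothetical dispatch: map a subjob index to the command that subjob should run.
# The command names printed below are placeholders, not real programs.
command_for_index() {
    case "$1" in
        1) echo "preprocess input.1" ;;
        2) echo "simulate input.2" ;;
        *) echo "postprocess input.$1" ;;
    esac
}

# Inside a subjob PBS sets PBS_ARRAY_INDEX; fall back to 1 when run outside PBS.
index="${PBS_ARRAY_INDEX:-1}"
echo "subjob $index would run: $(command_for_index "$index")"
```

A real job script would execute the selected command instead of printing it.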
110. The name execution_path is the name of the file in the job's staging and execution directory (on the execution host). The execution_path can be relative to the job's staging and execution directory, or it can be an absolute path. The "@" character separates the execution specification from the storage specification. The name storage_path is the file name on the host specified by hostname. For stagein, this is the location where the input files come from. For stageout, this is where the output files end up when the job is done. You must specify a hostname. The name can be absolute, or it can be relative to your home directory on the machine named hostname.
For stagein, the direction of travel is from storage_path to execution_path. For stageout, the direction of travel is from execution_path to storage_path.
The following example shows how to use a directive to stage in a file named grid.dat located in the directory /u/user1 on the host called serverA. The staged-in file is copied to the staging and execution directory and given the name dat1. Since execution_path is evaluated relative to the staging and execution directory, it is not necessary to specify a full pathname for dat1.
  #PBS -W stagein=dat1@serverA:/u/user1/grid.dat
To use the qsub option to stage in the file residing on myhost, in /Users/myhome/mydata/data1, calling it input_data1 in the staging and execution directory:
  qsub -W stagein=input_data1@myhost:/Users/myhome/m
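To make the three parts of a staging specification easier to see, the following sketch assembles one from its pieces and prints the resulting qsub option without running it. The helper name file_spec is invented for this example; the values are those of the grid.dat example above.

```shell
# Hypothetical helper: assemble a stagein/stageout specification of the form
#   execution_path@hostname:storage_path
file_spec() {
    exec_path=$1
    host=$2
    storage_path=$3
    # The text above states that a hostname must be specified.
    if [ -z "$host" ]; then
        echo "hostname is required" >&2
        return 1
    fi
    printf '%s@%s:%s\n' "$exec_path" "$host" "$storage_path"
}

# Print (rather than run) the option for the grid.dat example.
echo "-W stagein=$(file_spec dat1 serverA /u/user1/grid.dat)"
```

The same helper serves for stageout; only the direction of travel differs.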
111. The purpose of marking a job as non-rerunnable is to prevent it from starting more than once. If a job that is marked non-rerunnable has an error during startup, before it begins execution, that job is requeued for another attempt.
The Rerunable job attribute controls whether the job is rerunnable; you can set it via qsub or a PBS directive:
  qsub -r n my_job
  #PBS -r n
The following table lists the circumstances where the job's Rerunable attribute makes a difference or does not:
Table 6-2: When does Rerunable Attribute Matter?
  Circumstance | Rerunnable | Not Rerunnable
  Job fails upon startup, before running | Job is requeued | Job is requeued
  Job is running on multiple hosts, and one host goes down | Job is requeued | Job is deleted
  Job is scheduled to run on multiple hosts, and did not start on at least one host | Job is requeued | Job is requeued
  Server is shut down with a delay | Job is requeued | Job finishes
  Server is shut down immediately | Job is requeued | Job is deleted
  Job requests provisioning, and provisioning script fails | Job is requeued | Job is requeued
  Job is running on multiple hosts, and one host becomes busy due to console activity | Job is requeued | Job is deleted
  Higher priority job needs
113. Unified Job Submission
PBS allows you to submit jobs using the same scripts, whether the job is submitted on a Windows or UNIX/Linux system. See section 2.2.2.2, "Python Job Scripts", on page 11.
1.1.9 New Features in PBS Professional 10.2
1.1.9.1 Provisioning
PBS provides automatic provisioning of an OS or application on vnodes that are configured to be provisioned. When a job requires an OS that is available but not running, or an application that is not installed, PBS provisions the vnode with that OS or application. See Chapter 12, "Using Provisioning", on page 279.
1.1.9.2 Walltime as Checkpoint Interval Measure
PBS allows a job to be checkpointed according to its walltime usage. See "Job Attributes" on page 384 of the PBS Professional Reference Guide.
1.1.9.3 Employing User Space Mode on IBM InfiniBand Switches
PBS allows users submitting POE jobs to use InfiniBand switches in User Space mode. See section 5.2.5, "IBM POE with PBS", on page 105.
1.1.10 New Features in Version 10.1
1.1.10.1 Using Job History Information
PBS Professional can provide job history information, including what the submission parameters were, whether the job started execution, whether execution succeeded, whether staging out of results succeeded, and which resources were used. PBS can keep job history for jobs which have finished execution, were deleted
114.
activeColor
  The color applied to the background of a selection, a selected command button, or a selected scroll bar handle.
disabledColor
  Color applied to a disabled widget.
signalColor
  Color applied to buttons that signal something to you about a change of state. For example, the color of the Track Job button when returned output files are detected.
shadingColor
  A color shading applied to some of the frames to emphasize focus as well as decoration.
selectorColor
  The color applied to the selector box of a radiobutton or checkbutton.
selectHosts
  List of hosts (space separated) to automatically select/highlight in the HOSTS listbox.
selectQueues
  List of queues (space separated) to automatically select/highlight in the QUEUES listbox.
selectJobs
  List of jobs (space separated) to automatically select/highlight in the JOBS listbox.
selectOwners
  List of owners checked when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Owners: <list_of_owners>". See the -u option in "qselect" on page 192 of the PBS Professional Reference Guide for the format of <list_of_owners>.
selectStates
  List of job states to look for (do not space separate) when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Job_States: <states_string>". See the -s option in "qselect" on page 192 of the PBS Professiona
115. Using xpbs to Hold or Release Jobs
To hold or release a job using xpbs, first select the job(s) of interest, then click the hold or release button.
6.5.9 Examples of Holding and Releasing Jobs
The following examples illustrate how to use both the qhold and qrls commands. Notice that the state ("S") column shows how the state of the job changes with the use of these two commands.
  qstat -a 54
                                                   Req'd    Elap
  Job ID    User   Queue  Jobname  Sess NDS TSK Mem Time  S Time
  54.south  barry  workq  engine   --   --  1   --  0:20  Q --
  qhold 54
  qstat -a 54
                                                   Req'd    Elap
  Job ID    User   Queue  Jobname  Sess NDS TSK Mem Time  S Time
  54.south  barry  workq  engine   --   --  1   --  0:20  H --
  qrls -h u 54
  qstat -a 54
                                                   Req'd    Elap
  Job ID    User   Queue  Jobname  Sess NDS TSK Mem Time  S Time
  54.south  barry  workq  engine   --   --  1   --  0:20  Q --
6.6 Allowing Your Job to be Re-run
You can specify whether or not your job is eligible to be re-run if for some reason the job is terminated before it finishes. Use the -r option to qsub to specify whether the job is rerunnable. The argument to this option is "y", meaning that the job can be re-run, or "n", meaning that it cannot. If you do not specify whether or not your job is rerunnable, it is rerunnable.
If running your job more than once would cause a problem, mark your job as non-rerunnable. Otherwise, leave it as rerunnable.
116. We recommend that this resource is a host-level Boolean, defined on each host on the HPS; check with your administrator.
5.2.5.3 Specifying Number of Ranks
Make sure that you request the number of MPI ranks that you want, since PBS calculates the number of windows based on the number of ranks. You can use the mpiprocs resource to specify the number of MPI processes for each chunk. See section 5.1.3, "Specifying Number of MPI Processes Per Chunk", on page 95.
Example 5-15: To request two vnodes, each with eight CPUs and one MPI rank, for a total of 16 CPUs and two ranks:
  select=2:ncpus=8
Example 5-16: To request two vnodes, each with eight CPUs and eight MPI ranks, for a total of 16 CPUs and 16 ranks:
  select=2:ncpus=8:mpiprocs=8
5.2.5.3.i If Your Complex Contains HPS and Non-HPS Machines
If your complex contains machines on the HPS and machines that are not on the HPS, and you wish to run on the HPS, you must specify machines on the HPS. To specify machines on the HPS, you must request the HPS resource in your select statement. This resource is configured by your PBS administrator. We recommend that this resource is a host-level Boolean, but it could be an integer; check with your PBS administrator.
Example 5-17: Request four chunks using place=scatter. The HPS resource is a Boolean called "hps". Each host must have hps=True:
  qsub -l select=4:ncpus=2:hps=true -lplace=scatter
Example 5-18: Same placement as previous exam
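The arithmetic behind Examples 5-15 and 5-16 can be checked with a small sketch like the one below. The function name is invented for this example, and it handles only a single chunk type with a leading chunk count. The per-chunk defaults of one CPU and one MPI rank are assumptions made for the sketch (the one-rank default matches Example 5-15).

```shell
# Hypothetical helper: total CPUs and MPI ranks for a single-chunk-type select
# such as "2:ncpus=8:mpiprocs=8". Selects joined with "+" and selects without
# a leading chunk count are not handled.
select_totals() {
    spec=$1
    count=${spec%%:*}        # leading chunk count
    ncpus=1                  # assumed per-chunk default for this sketch
    mpiprocs=1               # assumed per-chunk default for this sketch
    old_ifs=$IFS
    IFS=:
    for part in $spec; do
        case "$part" in
            ncpus=*)    ncpus=${part#ncpus=} ;;
            mpiprocs=*) mpiprocs=${part#mpiprocs=} ;;
        esac
    done
    IFS=$old_ifs
    echo "cpus=$((count * ncpus)) ranks=$((count * mpiprocs))"
}

select_totals "2:ncpus=8"             # Example 5-15
select_totals "2:ncpus=8:mpiprocs=8"  # Example 5-16
```

The two calls correspond to the totals stated in the examples: 16 CPUs with two ranks, and 16 CPUs with 16 ranks.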
117. When you use the PBS-supplied mpirun, PBS can track all MVAPICH processes, perform accounting, and have complete job control. Your PBS administrator can integrate MVAPICH with PBS so that you can use the PBS-supplied mpirun in place of the MVAPICH mpirun in your job scripts. MVAPICH allows your jobs to use InfiniBand.
5.2.12.1 Interface to MVAPICH mpirun Command
If executed outside of a PBS job, the PBS-supplied interface to mpirun behaves exactly as if standard MVAPICH mpirun had been used. If executed inside a PBS job script, all of the options to the PBS interface are the same as MVAPICH's mpirun except for the following:
-map
  The map option is ignored.
-machinefile <file>
  The machinefile option is ignored.
-exclude
  The exclude option is ignored.
-np
  If you do not specify a -np option, then PBS uses the number of entries found in $PBS_NODEFILE. The maximum number of ranks that can be launched is the number of entries in $PBS_NODEFILE.
5.2.12.2 Examples
Example 5-37: Run a single-executable MVAPICH job with six ranks spread out across the PBS-allocated hosts listed in $PBS_NODEFILE:
  PBS_NODEFILE:
    pbs-host1
    pbs-host1
    pbs-host2
    pbs-host2
    pbs-host3
    pbs-host3
  Contents of job script:
    # mpirun runs 6 processes mapped one to each line in $PBS_NODEFILE
    mpirun -np 6 /path/myprog
118. _THREADS=1 NCPUS=1 OMP_NUM_THREADS=1 NCPUS=1
Example 5-54: To run two threads on each of N chunks, each running a process, all on the same Altix:
  qsub -l select=N:ncpus=2 -l place=pack
This starts N processes on a single host, with two OpenMP threads per process, because OMP_NUM_THREADS=2.
6 Controlling How Your Job Runs
6.1 Using Job Exit Status
PBS can use the exit status of your job as input to the epilogue, and to determine whether to run a dependent job. If you are running under UNIX/Linux, make sure that your job's exit status is captured correctly; see section 2.4.2.4, "Capture Correct Job Exit Status", on page 14. Job exit codes are listed in section 11.10, "Job Exit Codes", on page 991 of the PBS Professional Administrator's Guide.
The exit status of a job array is determined by the status of each of its completed subjobs, and is only available when all valid subjobs have completed. The individual exit status of a completed subjob is passed to the epilogue, and is available in the 'E' accounting log record of that subjob. See "Job Array Exit Status" on page 202.
6.1.1 Caveats for Exit Status
Normally, qsub exits with the exit status for a blocking job, but if you submit a job that is both blocking and interactive, PBS does not return the job's exit status. See section 6.8, "Making qsub Wait Until Job Ends", on page 163. For a blockin
119. a set of built-in resources, and your PBS administrator can define resources. You can see a list of all built-in PBS resources in "Resources" on page 305 of the PBS Professional Reference Guide. Resources are case-insensitive.
2.2.3.3 Setting Job Attributes
You can set job attributes and request resources using the following equivalent methods:
• Using specific options to the qsub command at the command line; for example, -e <path> to set the error path.
• Using PBS directives in the job script; for example, #PBS Error_Path=<path> to set the error path.
These methods have the same functionality. If you give conflicting options to qsub, the last option specified overrides any others. Options to the qsub command override PBS directives, which override defaults. Some job attributes and resources have default values; your administrator can set default values for some attributes and resources.
After the job is submitted, you can use the qalter command to change the job's characteristics.
2.2.3.4 Using PBS Directives
A directive has the directive prefix as the first non-whitespace characters. The default for the prefix is #PBS. Put all your PBS directives at the top of the script file, above any commands. Any directive after an executable line in the script is ignored. For example, if your script contains "echo", put that line below all PBS directives.
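To see which directives in a script would take effect under the rule above, you could scan the script roughly as follows. This is only an approximation written for illustration, not the parser PBS uses: the function name is invented, the default #PBS prefix is assumed, and the sketch assumes that blank lines and ordinary comment lines (including the #! line) do not end the scan.

```shell
# Hypothetical helper: print the #PBS directive lines that appear before the
# first executable line of a job script. Approximation only; assumes blank
# lines and ordinary comments do not stop the scan.
list_directives() {
    awk '
        /^[[:space:]]*#PBS/     { print; next }  # directive line
        /^[[:space:]]*(#.*)?$/  { next }         # blank line or ordinary comment
                                { exit }         # first executable line: stop
    ' "$1"
}

# Sample script: the last directive follows an executable line, so it is not listed.
job=$(mktemp)
cat > "$job" <<'EOF'
#!/bin/sh
#PBS -l walltime=1:00:00
#PBS -j oe
echo starting
#PBS -N too_late
EOF
list_directives "$job"
```

Here the directive naming the job appears after "echo starting", so it is left out of the listing, which mirrors the advice to keep all directives at the top of the script.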
120. a value for some resource. Your job can request exclusive use of each vnode, or shared use with other jobs. Your job can request exclusive use of its hosts. We will cover the basics of specifying job placement in the following sections. For details on placing chunks for an MPI job, see "Submitting Multiprocessor Jobs".
4.7.1 Using the place Statement
You use the place statement to specify how the job's chunks are placed. The place statement has this form:
  -l place=[arrangement][:sharing][:grouping]
where
  arrangement is one of free | pack | scatter | vscatter
  sharing is one of excl | shared | exclhost
  grouping can have only one instance of group=<resource>
and where
Table 4-2: Placement Modifiers
  Modifier | Meaning
  free | Place job on any vnode(s)
  pack | All chunks will be taken from one host
  scatter | Only one chunk is taken from any host
  vscatter | Only one chunk is taken from any vnode. Each chunk must fit on a vnode.
  excl | Only this job uses the vnodes chosen
  exclhost | The entire host is allocated to this job
  shared | This job can share the vnodes chosen
  group=<resource> | Chunks will be placed on vnodes according to a resource shared by those vnodes. This resource must be a string or string array. All vnodes in the group must have a common value for the resource.
The place stateme
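The form of the place statement can be exercised with a small builder such as the sketch below. The function name place_spec is invented for this example. It checks the arrangement and sharing values against the modifiers in Table 4-2 and joins whatever parts are given with colons; it does not check that the grouping resource exists or is a string.

```shell
# Hypothetical helper: build the value of a place statement.
# Usage: place_spec [arrangement] [sharing] [group-resource]
place_spec() {
    arrangement=$1
    sharing=$2
    group=$3
    case "$arrangement" in
        ""|free|pack|scatter|vscatter) ;;
        *) echo "bad arrangement: $arrangement" >&2; return 1 ;;
    esac
    case "$sharing" in
        ""|excl|shared|exclhost) ;;
        *) echo "bad sharing: $sharing" >&2; return 1 ;;
    esac
    out=$arrangement
    if [ -n "$sharing" ]; then
        out="${out:+$out:}$sharing"
    fi
    if [ -n "$group" ]; then
        out="${out:+$out:}group=$group"
    fi
    printf '%s\n' "$out"
}

# Print (rather than run) a qsub option built from the parts.
echo "-l place=$(place_spec scatter excl)"
```

Empty arguments are skipped, so place_spec pack "" host yields pack:group=host.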
121. ame = STDIN
  Job_Owner = user1@host2
  resources_used.cpupercent = 0
  resources_used.cput = 00:00:00
  resources_used.mem = 2408kb
  resources_used.ncpus = 1
  resources_used.vmem = 12392kb
  resources_used.walltime = 00:01:31
  job_state = R
  queue = workq
  server = host1
  Checkpoint = u
  ctime = Thu Apr 2 12:07:05 2010
  Error_Path = host2:/home/user1/STDIN.e13
  exec_host = host2/0
  exec_vnode = (host3:ncpus=1)
  Hold_Types = n
  Join_Path = n
  Keep_Files = n
  Mail_Points = a
  mtime = Thu Apr 2 12:07:07 2010
  Output_Path = host2:/home/user1/STDIN.o13
  Priority = 0
  qtime = Thu Apr 2 12:07:05 2010
  Rerunable = True
  Resource_List.ncpus = 1
  Resource_List.nodect = 1
  Resource_List.place = free
  Resource_List.select = host=host3
  stime = Thu Apr 2 12:07:08 2010
  session_id = 32704
  jobdir = /home/user1
  substate = 42
  Variable_List = PBS_O_HOME=/home/user1,PBS_O_LANG=en_US.UTF-8,PBS_O_LOGNAME=user1,PBS_O_PATH=/opt/gnome/sbin:/root/bin:/usr/local/bin:/usr/bin:/usr/X11R6/bin:/bin:/usr/games:/opt/gnome/bin:/opt/kde3/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin,PBS_O_MAIL=/var/mail/root,PBS_O_SHELL=/bin/bash,PBS_O_HOST=host2,PBS_O_WORKDIR=/home/user1,PBS_O_SYSTEM=Linux,PBS_O_QUEUE=workq
  comment = Job run at Thu Apr 02 at 12:07 on (host3:ncpus=1)
  alt_id = <dom0:JobID xmlns:dom0="http://schemas.microsoft.com/HPCS2008/hpcbp">149</dom0:JobID>
  eti
122. ample 5-31: Run an MPICH-MX job with multiple executables on multiple hosts listed in the process group file procgrp:
    qsub -l select=2:ncpus=1
    echo "pbs-host1 1 username /x/y/a.exe arg1 arg2" > procgrp
    echo "pbs-host2 1 username /x/x/b.exe arg1 arg2" >> procgrp
    [MPICH-MX-HOME]/bin/mpirun -pg procgrp /path/myprog.x 1200
    rm -f procgrp
    ^D
    <job-id>
mpirun prints a warning message: warning: "-pg" is allowed but it is up to user to make sure only PBS hosts are specified; MPI processes spawned are not guaranteed to be under PBS control. The warning is issued because if any of the hosts listed in procgrp are not under the control of PBS, then the processes on those hosts will not be under the control of PBS. 5.2.10.2 Using MPICH-MX and rsh/ssh with PBS. PBS provides an interface to MPICH-MX's mpirun using rsh/ssh. If executed inside a PBS job, this allows PBS to track all MPICH-MX processes started by rsh/ssh, so that PBS can perform accounting and has complete job control. If executed outside of a PBS job, it behaves exactly as if standard mpirun had been used. You use the same mpirun command as you would use outside of PBS. 5.2.10.2.i Options. Inside a PBS job script, all of the options to the PBS interface are the same as standard mpirun except for the following: -machinefile <file>: The file argument contents are ignored and
123. an page on ja for details. The starting and ending ja commands must be used before and after any other commands you wish to monitor. Here are examples of a command line and a script. On the command line:
    qsub -N myjobname -l ncpus=1
    ja myrawfile
    sleep 50
    ja -c > myreport
    ja -t myrawfile
    ctrl-D
Accounting data for your job (sleep 50) is written to myreport. If you create a file foo with these commands:
    #PBS -N myjobname
    #PBS -l ncpus=1
    ja myrawfile
    sleep 50
    ja -c > myreport
    ja -t myrawfile
Then you could run this script via qsub:
    qsub foo
This does the same thing via the script foo. 14 Using the xpbs GUI. The PBS graphical user interface is called xpbs, and provides a user-friendly, point-and-click interface to the PBS commands. xpbs utilizes the tcl/tk graphics tool suite, while providing you with most of the same functionality as the PBS CLI commands. In this chapter we introduce xpbs, and show how to create a PBS job using xpbs. 14.1 Using the xpbs command. 14.1.1 Starting xpbs. If PBS is installed on your local workstation, or if you are running under Windows, you can launch xpbs by double-clicking on the xpbs icon on the desktop. You can also start xpbs from the command line with the following command. UNIX: xpbs & Windows: xpbs.exe. Doing so will bring up the main xpbs window, as shown below. 14.1.2 Running xpbs Under UNIX. Before running xpbs
124. and set it to True where the vnode has Interlagos hardware. We recommend that the Boolean is called PBScraylabel_interlagos. You request or avoid this resource using PBScraylabel_interlagos=True or PBScraylabel_interlagos=False. For example: qsub -lselect=3:ncpus=2:PBScraylabel_interlagos=true myjob. 11.5.11 Requesting Accelerators. Accelerators are associated with vnodes when those vnodes represent NUMA nodes on a host that has at least one accelerator in state UP. PBS allows you to request vnodes with associated accelerators. PBS sets the value of the naccelerators host-level resource to the number of accelerators on the host. Note that this value is set for all vnodes on that host, so if you have a host with one accelerator and four vnodes, each of the four vnodes has naccelerators set to 1. To request accelerators for your job, use the integer naccelerators resource to request a specific number of accelerators, or the Boolean accelerator resource if you do not care how many accelerators you get. To request a vnode on a host with a specific number of associated accelerators, include the following in the job's select statement: naccelerators=<number of accelerators>. To request a vnode on a host with any number of associated accelerators, you can include the following in the job's select statement: accelerator=True. 11.5.1
125. array returned an exit status of 0. No PBS error occurred. Deleted subjobs are not considered. 1: At least one subjob returned a non-zero exit status. No PBS error occurred. 2: A PBS error occurred. 8.4.6.1 Making qsub Wait Until Job Array Finishes. Blocking qsub waits until the entire job array is complete, then returns the exit status of the job array. 8.4.7 Caveats for Submitting Job Arrays. 8.4.7.1 No Interactive Job Submission of Job Arrays. Interactive submission of job arrays is not allowed. 8.5 Viewing Status of a Job Array. You can use the qstat command to query the status of a job array. The default output is to list the job array in a single line, showing the job array identifier. You can combine options. To show the state of all running subjobs, use -t -r. To show the state of subjobs only, not job arrays, use -t -J. Table 8-4: Job Array and Subjob Options to qstat (Option: Result). -t: Shows state of job array object and subjobs. Also shows state of jobs. -J: Shows state only of job arrays. -p: Prints the default display, with a column for Percentage Completed. For a job array, this is the number of subjobs completed or deleted divided by the total number of subjobs. For a job, it is time used divided by time requested. 8.5.1 Example of Viewing Job Array Status. We run an example job and an example job array on a
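The two ratios behind the Percentage Completed column described in entry 125 can be sketched in Python as follows. This is only an illustration of the arithmetic stated there; the function names are hypothetical, the results are scaled to 0..100 on the assumption that the column shows a percentage, and any rounding done by qstat is not covered by the text and is not modeled.

```python
def array_percent_complete(subjobs_done_or_deleted, total_subjobs):
    """Job array: subjobs completed or deleted divided by total subjobs."""
    if total_subjobs <= 0:
        raise ValueError("a job array needs at least one subjob")
    return 100.0 * subjobs_done_or_deleted / total_subjobs


def job_percent_complete(time_used_seconds, time_requested_seconds):
    """Regular job: time used divided by time requested."""
    if time_requested_seconds <= 0:
        raise ValueError("time requested must be positive")
    return 100.0 * time_used_seconds / time_requested_seconds


print(array_percent_complete(25, 100))
print(job_percent_complete(1800, 3600))
```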
126. at the queue or server level apply to an entire job. Resources defined at the vnode level apply only to the part of the job running on that vnode. Jobs can request resources. The scheduler matches requested resources with available resources, according to rules defined by the administrator. PBS always places jobs where it finds the resources requested by the job. PBS will not place a job where that job would use more resources than PBS thinks are available. For example, if you have two jobs, each requesting 1 CPU, and you have one vnode with 1 CPU, PBS will run only one job at a time on the vnode. PBS can enforce limits on resource usage by jobs; see section 4.5, "Limits on Resource Usage", on page 73. 4.2.2 Glossary. Chunk: A set of resources allocated as a unit to a job. Specified inside a selection directive. All parts of a chunk come from the same host. In a typical MPI (Message-Passing Interface) job, there is one chunk per MPI process. Chunk-level resource, host-level resource: A resource that is available at the host level, for example, CPUs or memory. Chunk resources are requested inside of a selection statement. The resources of a chunk are to be applied to the portion of the job running in that chunk. Job-wide resource, server resource, queue resource: A job-wide res
127. aths. By default, PBS names the output and error files for your job using the job name and the job's sequence number. The output file name is specified in the Output_Path job attribute, and the error file name is specified in the Error_Path job attribute. The default output filename has this format: <job name>.o<sequence number>. The default error filename has this format: <job name>.e<sequence number>. The job name, if not specified, defaults to the script name. For example, if the job ID is 1234.exampleserver and the script name is "myscript", the error file is named myscript.e1234. If you specify a name for your job, the script name is replaced with the job name. For example, if you name your job "fixgamma", the output file is named fixgamma.o1234. For details on naming your job, see section 2.5.2, "Specifying Job Name", on page 30. 3.3.2.2 Specifying Paths. You can specify the path and name for the output and error files for each job, by setting the value for the Output_Path and Error_Path job attributes. You can set these attributes using the following methods: Use the -o <output path> and -e <error path> options to qsub. Use #PBS Output_Path=<path> and #PBS Error_Path=<path> directives in the job script. The path argument has the following form: [hostname:]path_name, where hostname is the name of a host and path_name is the path name on that host.
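The default naming rule in entry 127 can be sketched in a few lines of Python. This is an illustration only: the function name is hypothetical, and it assumes the job ID is written in the dotted form sequence-number.server (for example 1234.exampleserver), so the sequence number is everything before the first dot.

```python
def default_output_names(job_id, job_name):
    """Return (output file, error file) names under the default naming rule.

    job_name is the name given at submission, or the script name when no
    job name was specified. Hypothetical helper, not a PBS API.
    """
    sequence_number = job_id.split(".", 1)[0]
    output_name = "{}.o{}".format(job_name, sequence_number)
    error_name = "{}.e{}".format(job_name, sequence_number)
    return output_name, error_name


print(default_output_names("1234.exampleserver", "myscript"))
print(default_output_names("1234.exampleserver", "fixgamma"))
```

Setting Output_Path or Error_Path overrides these defaults, so the sketch applies only when neither attribute is set.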
128. ator can exchange any jobs. A job in the running state cannot be reordered. The qorder command can be used with entire job arrays, but not on subjobs or ranges. Reordering a job array changes the queue order of the job array in relation to other jobs or job arrays in the queue. 9.7 Moving Jobs Between Queues. PBS provides the qmove command to move jobs between different queues, even queues on different servers. To move a job is to remove the job from the queue in which it resides and instantiate the job in another queue. A job in the running state cannot be moved. The usage syntax of the qmove command is: qmove destination job_identifier(s). Job array job_identifiers must be enclosed in double quotes. The first operand is the new destination, in one of the forms queue, @server, or queue@server. If the destination operand describes only a queue, then qmove will move jobs into the queue of the specified name at the job's current server. If the destination operand describes only a server, then qmove will move jobs into the default queue at that server. If the destination operand describes both a queue and a server, then qmove will move the jobs into the specified queue at the specified server. All following operands are job_identifiers, which specify the jobs to be moved to the new destination. To move jobs between queues or between servers using xpbs, select the job
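The three destination forms that entry 128 describes for qmove (a queue only, a server only, or both) can be told apart with a small Python sketch. It assumes the forms are written queue, @server, and queue@server; the function name is hypothetical and the code does not call PBS.

```python
def parse_destination(destination):
    """Split a qmove destination into (queue, server).

    A missing part is returned as None:
      queue only  -> job moves to that queue at its current server
      server only -> job moves to the default queue at that server
      both        -> job moves to that queue at that server
    """
    queue, _, server = destination.partition("@")
    if not queue and not server:
        raise ValueError("destination names neither a queue nor a server")
    return (queue or None, server or None)


for example in ("workq", "@host2", "workq@host2"):
    print(example, "->", parse_destination(example))
```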
129. ax: -lnodes=1:property is converted to the equivalent: -l select=1:ncpus=1:property=True -l place=scatter. Request 2 CPUs on each of four hosts with a given property using old syntax: -lnodes=4:property:ncpus=2 is converted to the equivalent: -l select=4:ncpus=2:property=True -l place=scatter. 5. Request 1 CPU on each of 14 hosts asking for certain software, licenses and a job limit amount of memory using old syntax: -lnodes=14:mpi-fluent:ncpus=1 -lfluent=1,fluent-all=1,fluent-par=13 -l mem=280mb is converted to the equivalent: -l select=14:ncpus=1:mem=20mb:mpi_fluent=True -l place=scatter -l fluent=1,fluent-all=1,fluent-par=13. Requesting licenses using old syntax: -lnodes=3:dyna-mpi-Linux:ncpus=2 -ldyna=6,mem=100mb,software=dyna is converted to the equivalent: -l select=3:ncpus=2:mem=33mb:dyna-mpi-Linux=True -l place=scatter -l software=dyna -l dyna=6. Requesting licenses using old syntax: -l ncpus=2,app-lic=6,mem=200mb -l software=app is converted to the equivalent: -l select=1:ncpus=2:mem=200mb -l place=pack -l software=app -l app-lic=6. Additional example using old syntax: -lnodes=1:fserver+15:noserver is converted to the equivalent: -l select=1:ncpus=1:fserver=True+15:ncpus=1:noserver=True -l place=scatter, but could also be more easily specified with something like: -l select=1:ncpus
130. bdir attribute is a read-only attribute, set to the pathname of the job's staging and execution directory on the primary host. You can view this attribute by using qstat -f, only while the job is executing. The value of jobdir is not retained if a job is rerun; it is undefined whether jobdir is visible or not when the job is not executing. The environment variable PBS_JOBDIR is set to the pathname of the staging and execution directory on the primary execution host. PBS_JOBDIR is added to the environment of the job script process, any job tasks, and the prologue and epilogue. 3.2.3 Attributes and Environment Variables Affecting Staging. The following attributes and environment variables affect staging and execution. Table 3-3: Attributes and Environment Variables Affecting Staging (Job's Attribute or Environment Variable: Effect). sandbox attribute: Determines whether PBS uses user's home directory or creates job-specific directory for staging and execution. User-settable per job via qsub -W or through a PBS directive. stagein attribute: Sets list of files or directories to be staged in. User-settable per job via qsub -W or through a PBS directive. stageout attribute: Sets list of files or directories to be staged out. User-settable per job via qsub -W or through a PBS directive. Keep_Files attribute: Determines whether ou
131. bstate. qhold: Job is not held. qrerun: Job is not requeued. qmove: Cannot be used on a job that is provisioning. qalter: Cannot be used on a job that is provisioning. qrun: Cannot be used on a job that is provisioning. 12.4.3 How Provisioning Affects Jobs. A job that has requested an AOE will not preempt another job. Therefore no job will be terminated in order to run a job with a requested AOE. A job that has requested an AOE will not be backfilled around. 12.5 Caveats and Errors. 12.5.1 Requested Job AOE and Reservation AOE Should Match. Do not submit jobs that request an AOE to a reservation that does not request the same AOE. Reserved vnodes may not supply that AOE; your job will not run. 12.5.2 Allow Enough Time in Reservations. If a job is submitted to a reservation with a duration close to the walltime of the job, provisioning could cause the job to be terminated before it finishes running, or to be prevented from starting. If a reservation is designed to take jobs requesting an AOE, leave enough extra time in the reservation for provisioning. 12.5.3 Requesting Multiple AOEs For a Job or Reservation. Do not request more than one AOE per job or reservation. The job will not run, or the reservation will remain unconfirmed. 12.5.4 Held and Requeued Jobs. The job is held with a system hold for the following reasons: Provisioning fails
132. by the administrator, you must use the same resource string values as the ones set up by the administrator. "012" is not the same as "102" or "201". For example, when requesting a resource that allows you to request NUMA nodes 0 and 1, and the administrator used the string "01", you must request <resource name>=01. If you request <resource name>=10, this will not work. 11.7.3 Avoid Invalid Cray Requests. It is possible to create a select and place statement that meets the requirements of PBS but not of the Cray. Example 11-15: The Cray width and depth values cannot be calculated from ncpus and mpiprocs values. For example, if ncpus is 2 and mpiprocs is 4, the depth value is calculated by dividing ncpus by mpiprocs, and is one-half. This is not a valid depth value for Cray. Example 11-16: ALPS cannot run jobs with some complex select statements. In particular, a multiple program, multiple data (MPMD) ALPS reservation where two groups span a compute node will produce an ALPS error, because the nid shows up in two ReserveParam sections. 11.7.4 Visibility of Jobs Launched from Login Nodes. Jobs that requested a vntype of cray_compute that were launched from an internal login node are not listed in the jobs attribute of the internal login node. 11.7.5 Resource Restrictions and Deprecations. 11.7.5.1 Restriction on Translation of mpp Res
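The depth calculation in Example 11-15 of entry 132 can be checked before submission with a short Python sketch. The text states only that depth is ncpus divided by mpiprocs and that one-half is invalid; treating every value that is not a whole number of at least 1 as invalid is an assumption made here, and the function name is hypothetical.

```python
from fractions import Fraction


def cray_depth(ncpus, mpiprocs):
    """Return the depth (ncpus / mpiprocs) for one chunk as an int.

    Raises ValueError when the quotient is not a whole number >= 1,
    which this sketch assumes is what makes a depth value invalid.
    """
    if ncpus <= 0 or mpiprocs <= 0:
        raise ValueError("ncpus and mpiprocs must be positive")
    depth = Fraction(ncpus, mpiprocs)  # exact, no floating-point rounding
    if depth.denominator != 1:
        raise ValueError("depth {} is not a whole number".format(depth))
    return int(depth)


print(cray_depth(8, 4))
try:
    cray_depth(2, 4)
except ValueError as err:
    print("rejected:", err)
```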
133. ce_List.max_walltime = 10:00:00, Resource_List.min_walltime = 00:00:10. 6.3.5.2 Viewing walltime for a Shrink-to-fit Job. PBS sets a job's walltime only when the job runs. While the job is running, you can see its walltime via qstat -f. While the job is not running, you cannot see its real walltime; it may have a value set for walltime, but this value is ignored. You can see the walltime value for a finished shrink-to-fit job if you are preserving job history. See section 11.16, "Managing Job History", on page 1005. 6.3.6 Lifecycle of a Shrink-to-fit Job. 6.3.6.1 Execution of Shrink-to-fit Jobs. Shrink-to-fit jobs are started just like non-shrink-to-fit jobs. 6.3.6.2 Termination of Shrink-to-fit Jobs. When a shrink-to-fit job exceeds the walltime PBS has set for it, it is killed by PBS exactly as a non-shrink-to-fit job is killed when it exceeds its walltime. 6.3.7 The min_walltime and max_walltime Resources. max_walltime: Maximum walltime allowed for a shrink-to-fit job. Job's actual walltime is between max_walltime and min_walltime. PBS sets walltime for a shrink-to-fit job. If this resource is specified, min_walltime must also be specified. Must be greater than or equal to min_walltime. Cannot be used for resources_min or resources_max. Cannot be set on job arrays or reservations. If not specified, PBS uses 5 years as the maximum time slot. Can be requested only outside of a select statement. Non-consumable. Default: None. Type
134. ce_List.mppwidth = 8544
    Resource_List.ncpus = 8544
    Resource_List.place = free
    Resource_List.select = 8544:vntype=cray_compute
    Submit_arguments = -lmppwidth=8544 job
Scheduling took 6 seconds:
    12/05/2011 16:46:10;0080;pbs_sched;Job;23.example;Considering job to run
    12/05/2011 16:46:16;0040;pbs_sched;Job;23.example;Job run
Submit job with chunk size 8 and 1068 chunks:
    qsub -lmppwidth=8544,mppnppn=8 job
Job's Resource_List:
    Resource_List.mpiprocs = 8544
    Resource_List.mppnppn = 8
    Resource_List.mppwidth = 8544
    Resource_List.ncpus = 8544
    Resource_List.place = scatter
    Resource_List.select = 1068:ncpus=8:mpiprocs=8:vntype=cray_compute
Scheduling took 1 second:
    12/05/2011 16:54:38;0080;pbs_sched;Job;24.example;Considering job to run
    12/05/2011 16:54:39;0040;pbs_sched;Job;24.example;Job run
If you are on a heterogeneous system with varying sizes for vnodes or compute nodes, you can request chunk sizes that fit available hardware, but this may not be feasible. 11.8 Errors and Logging. 11.8.1 Invalid Cray Requests. When a select statement does not meet Cray requirements, and the Cray reservation fails, the following error message is printed in MoM's log, at log event class 0x080: "Fatal MPP reservation error preparing request". 11.8.2 Job Requests More Than Available. If do_not_span_psets is set to True, and a job requests more resou
135. ce_List.mem shows 200mb. 4.6 Viewing Resources. You can look at the resources on the server, queue, and vnodes. You can also see what resources are allocated to and used by your job. 4.6.1 Viewing Server, Queue, and Vnode Resources. To see server resources: qstat -Bf. To see queue resources: qstat -Qf. To see vnode resources, use any of the following: qmgr -c "list node <vnode name> <attribute name>"; pbsnodes -av; pbsnodes [host list]. Look at the following attributes. resources_available.<resource name> (server, queue, vnode): Total amount of the resource available at the server, queue, or vnode; does not take into account how much of the resource is in use. resources_default.<resource name> (server, queue): Default value for job-wide resource. This amount is allocated to job if job does not request this resource. Queue setting overrides server setting. resources_max.<resource name> (server, queue): Maximum amount that a single job can request. Queue setting overrides server setting. resources_min.<resource name> (queue): Minimum amount that a single job can request. resources_assigned.<resource name> (server, queue, vnode): Total amount of the resource that has been allocated to running jobs and reservations at the server, queue, or vnode. 4.6.2 Viewing Job Resources. To see the re
136. cesses and one thread per process:
    #PBS -l select=64:ncpus=1
    mpiexec -n 64 ./a.out
Example 5-45: Run an MPI application with 64 MPI processes and four OpenMP threads per process:
    #PBS -l select=64:ncpus=4
    mpiexec -n 64 omplace -nt 4 ./a.out
or
    #PBS -l select=64:ncpus=4:ompthreads=4
    mpiexec -n 64 omplace -nt 4 ./a.out
5.4.1 Running Fewer Threads than CPUs. You might be running an OpenMP application on a host and wish to run fewer threads than the number of CPUs requested. This might be because the threads need exclusive access to shared resources in a multi-core processor system, such as to a cache shared between cores, or to the memory shared between cores. Example 5-46: You want one chunk, with 16 CPUs and eight threads: qsub -l select=1:ncpus=16:ompthreads=8. 5.4.2 Running More Threads than CPUs. You might be running an OpenMP application on a host and wish to run more threads than the number of CPUs requested, perhaps because each thread is I/O bound. Example 5-47: You want one chunk, with eight CPUs and 16 threads: qsub -l select=1:ncpus=8:ompthreads=16. 5.4.3 Caveats for Using OpenMP with PBS. Make sure that you request the correct number of MPI ranks for your job, so that the PBS node file contains the correct number of entries. See section 5.1.3, "Specifying Number of MPI Processes Per Chunk", on page 95. 5.5 Hybrid MPI-OpenMP J
137. ckward keys. A spinbox is a combination of an entry widget and a horizontal scrollbar. The entry widget will only accept values that fall within a defined list of valid values, and incrementing through the valid values is done by clicking on the up/down arrows. A button is a rectangular region appearing either raised or pressed that invokes an action when clicked with the left mouse button. When the button appears pressed, then hitting the <RETURN> key will automatically select the button. A text region is an editor-like widget. This widget is brought into focus with a left-click. To manipulate this widget, simply type in the text. Use of arrow keys, backspace/delete key, mouse selection of text for deletion or overwrite, and copying and pasting with sole use of mouse buttons are permitted. This widget has a scrollbar for vertically scanning a long entry. 14.3 Introducing the xpbs Main Display. The main window or display of xpbs consists of five collapsible subwindows or panels. Each panel contains specific information. Top to bottom, these panels are: the Menu Bar, Hosts panel, Queues panel, Jobs panel, and the Info panel. 14.3.1 xpbs Menu Bar. The Menu Bar is composed of a row of command buttons that signal some action with a click of the left mouse button. The buttons are: Manual Update: forces an update of the information on hosts, queues, and
138. correctly and is marked executable. This could be done by naming different commands with the subjob index, as in your example, or by if statements in the script. 8.3.2 Attributes and Resources for Job Arrays. All subjobs have the same attributes, including resource requirements and limits. The same job script runs for each subjob, so each subjob gets the same attributes and resources. If the job script calls other scripts or commands, those scripts or commands cannot change the attributes and resources for individual subjobs, because PBS stops processing directives when it starts processing commands. 8.3.3 Scheduling Job Arrays and Subjobs. The scheduler handles each subjob in a job array as a separate job. All subjobs within a job array have the same scheduling priority. 8.3.3.1 Starving. A job array's starving status is based on the queued portion of the array. This means that if there is a queued subjob which is starving, the job array is starving. A running subjob retains the starving status it had when it was started. 8.3.4 Identifier Syntax. The sequence number (1234 in 1234[].server) is unique, so that jobs and job arrays cannot share a sequence number. The job identifiers of the subjobs in the same job array are the same except for their indices. Each subjob has a unique index. You can refer to job arrays or parts of job arrays using the following syntax for
139. ctHold: The hold types string to look for in a job when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Hold_Types: <hold_string>". See the -h option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <hold_string>. selectPriority: The priority relationship (including the logical operator) to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Priority: <priority_value>". See the -p option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <priority_value>. selectRerun: The Rerunable attribute to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Rerunnable: <rerun_val>". See the -r option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <rerun_val>. selectJobName: Name of the job that will be checked when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Job_Name: <jobname>". See the -N option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <jobname>. iconizeHostsView: A boolean value (True or False) indicating whether or not to iconize the HOSTS region. iconizeQueuesView
140. d stderr" on page 49. If you do not want stdout and/or stderr, you can redirect them to /dev/null within the job script. For example, to redirect stdout and stderr to /dev/null: exec >&/dev/null 1>&2. 3.3.4 Merging Output and Error Files. By default, PBS creates separate standard output and standard error files for each job. You can specify that stdout and stderr are to be joined by setting the job's Join_Path attribute. The default for the attribute is n, meaning that no joining takes place. You can set the attribute using the following methods: Use qsub -j <joining option>. Use #PBS Join_Path=<joining option>. You can specify one of the following joining options. oe: Standard output and standard error are merged, intermixed, into a single stream, which becomes standard output. eo: Standard output and standard error are merged, intermixed, into a single stream, which becomes standard error. n: Standard output and standard error are not merged. For example, to merge standard output and standard error for my_job into standard output: qsub -j oe my_job, or in the job script: #PBS -j oe. 3.3.5 Keeping Output and Error Files on Execution Host. By default, PBS copies stdout and stderr to the job's submission directory. You can specify that PBS keeps stdout, stderr, or both in the job's execution directory on the execution host. This
141. d to request a custom resource which has been created to be invisible or unrequestable. See section 4.3.8, "Caveats and Restrictions on Requesting Resources", on page 67. 7.5.3 Viewing Status of a Job Submitted to a Reservation. You can view the status of a job that has been submitted to a reservation or to an occurrence of a standing reservation by using the qstat command. See "qstat" on page 204 of the PBS Professional Reference Guide. For example, if a job named MyJob has been submitted to the soonest occurrence of the standing reservation named 304.south, it is listed under 304, the name of the queue: qstat: Job id Name User Time Use S Queue 139.south MyJob el Ee. 7.5.4 How Reservations Treat Jobs. A confirmed reservation will accept jobs into its queue at any time. Jobs are only scheduled to run from the reservation once the reservation period arrives. The jobs in a reservation are not allowed to use, in aggregate, more resources than the reservation requested. A reservation job is accepted in the reservation only if its requested walltime will fit within the reservation period. So for example, if the reservation runs from 10:00 to 11:00, and the job's walltime is 4 hours, the job will not be started. When an advance reservation ends, any running or queued jobs in that reservation are deleted. When an occurrence of a s
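The walltime rule in entry 141 amounts to comparing a requested walltime against the length of the reservation period. A minimal Python sketch of that comparison follows; the function name and the dates are made up for illustration, and the sketch deliberately ignores anything else the scheduler may consider, such as how much of the period has already elapsed or the resources held by other jobs in the reservation.

```python
from datetime import datetime, timedelta


def fits_reservation(start, end, walltime):
    """True when the requested walltime is no longer than the reservation."""
    return walltime <= (end - start)


# The example from the text: a 10:00 to 11:00 reservation and a 4-hour job.
start = datetime(2024, 1, 15, 10, 0)  # date chosen arbitrarily
end = datetime(2024, 1, 15, 11, 0)
print(fits_reservation(start, end, timedelta(hours=4)))
print(fits_reservation(start, end, timedelta(minutes=30)))
```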
142. d by double-clicking on an entry in the Hosts listbox. submit: For submitting a job to any of the queues managed by the selected host(s). terminate: For terminating (shutting down) PBS servers on selected host(s). Visible via the -admin option only. Note that some buttons are only visible if xpbs is started with the -admin option, which requires manager or operator privilege to function. The middle portion of the Hosts Panel has abbreviated column names indicating the information being displayed, as the following table shows. Table 14-1: xpbs Server Column Headings (Heading: Meaning). Max: Maximum number of jobs permitted. Tot: Count of jobs currently enqueued in any state. Que: Count of jobs in the Queued state. Run: Count of jobs in the Running state. Hld: Count of jobs in the Held state. Wat: Count of jobs in the Waiting state. Trn: Count of jobs in the Transiting state. Ext: Count of jobs in the Exiting state. Status: Status of the corresponding server. PEsInUse: Count of Processing Elements (CPUs, PEs, Vnodes) in Use. 14.3.3 xpbs Queues Panel. The Queues panel is composed of a leading horizontal QUEUES bar, a listbox, and a set of command buttons. The QUEUES bar lists the hosts that are consulted when listing queues; the bar also contains a minimize/maximize button for displaying or iconizing the Queues panel. The listb
143. d remove that state from the selection criteria. [Figure: xpbs main window with the Select Job States Criteria dialog open] You may specify as many or as few selection criteria as you wish. When you have completed your selection, click on the Select Jobs button above the HOSTS panel to have xpbs refresh the display with the jobs that match your selection criteria. The selected criteria will remain in effect until you change them again. If you exit xpbs, you will be prompted if you wish to save your configuration information; this includes the job selection criteria. 10.6 Tracking Job Progress Using xpbs TrackJob Feature. The xpbs command includes a feature that allows you to track the progress of your jobs. When you enable the Track Job feature, xpbs will monitor your jobs, looking for the output files
144. d with qsub -l ncpus=4,mem=123mb,arch=linux gets the following select statement: select=1:ncpus=4:mem=123mb:arch=linux. 4.8.3.2 Conversion of Node Specifications. If your job requests a node specification, PBS creates a select and place specification, according to the following rules. Old node specification format: -lnodes=[N:spec_list | spec_list][[+N:spec_list | +spec_list] ...][#suffix ...][-lncpus=Z], where spec_list has syntax spec[:spec ...]; spec is any of hostname | property | ncpus=X | cpp=X | ppn=P; suffix is any of property | excl | shared; N and P are positive integers; X and Z are non-negative integers. The node specification is converted into select and place statements as follows. Each spec_list is converted into one chunk, so that N:spec_list is converted into N chunks. If spec is hostname: The chunk will include host=hostname. If spec matches any vnode's resources_available.host value: The chunk will include host=hostname. If spec is property: The chunk will include property=true. Property must be a site-defined vnode-level boolean resource. If spec is ncpus=X or cpp=X: The chunk will include ncpus=X. If no spec is ncpus=X and no spec is cpp=X: The chunk will include ncpus=P. If spec is ppn=P: The chunk will include mpiprocs=P. If the nodespec is -lnodes=N:ppn=P: It is converted to -lselect=N:ncp
145. default. 4.5 Limits on Resource Usage. Jobs are assigned limits on the amount of resources they can use. These limits apply to how much the whole job can use (job-wide limit) and to how much the job can use at each host (host limit). Limits are applied only to resources the job requests or inherits. Your administrator can configure PBS to enforce limits on mem and ncpus, but the other limits are always enforced. If you want to make sure that your job does not exceed a given amount of some resource, request that amount of the resource. 4.5.1 Enforceable Resource Limits. Limits can be enforced on the following resources. Table 4-1: Enforceable Resource Limits (Resource Name: Where Specified / Where Enforced / Always Enforced?). cput: Host / Host / Always. mem: Host / Host / Optional. ncpus: Host / Host / Optional. pcput: Job-wide / Per-process / Always. pmem: Job-wide / Per-process / Always. pvmem: Job-wide / Per-process / Always. vmem: Host / Host / Always. walltime: Job-wide / Job-wide / Always. 4.5.2 Origins of Resource Limits. Limits are derived from both requested resources and applied default resources. Resource limits are derived in the order shown in section 4.4.1, "Applying Default Resources", on page 70.
146. des that already have shared jobs on them, request sharing in the job resource requests. The alt_id job attribute has the form cpuset=<name>, where <name> is the name of the cpuset, which is the PBS_JOBID. To verify how many CPUs are included in a cpuset created by PBS, use: > cpuset -d <set name> | egrep cpus. This will work either inside or outside a job. For details on shared versus exclusive use of vnodes, see section 4.7.1.2, "Specifying Shared or Exclusive Use of Vnodes", on page 79, and for a description of how the vnode sharing attribute interacts with a job's resource request, see "sharing" on page 380 of the PBS Professional Reference Guide. 5.2.16.4 Fitting Jobs onto Nodeboards. PBS will try to put a job that fits in a single nodeboard on just one nodeboard. However, if the only CPUs available are on separate nodeboards, and those vnodes are not allocated exclusively to existing jobs, and the job can share a vnode, then the job is run on the separate nodeboards. 5.2.16.5 Checkpointing and Suspending Jobs. Jobs are suspended on the Altix using the PBS suspend feature. If a job is suspended, its processes are moved to the global cpuset. When the job is restarted, they are restored. Jobs are checkpointed on the Altix using application-level checkpointing. There is no OS-level checkpoint. Suspended or checkp
147. detected. Once the problem has been resolved, the job owner or a PBS Operator may remove the wait by resetting the time after which the job is eligible to be run, via the -a option to qalter. The server will update the job's comment with information about why the job was put in the wait state. When the job is eligible to run, it may run on different vnodes. 3.2.11.2 File Stageout Failure. When stageout encounters an error, there are three retries. PBS waits 1 second and tries again, then waits 11 seconds and tries a third time, then finally waits another 21 seconds and tries a fourth time. Email is sent to the job owner if all attempts fail. Files that cannot be staged out are saved in PBS_HOME/undelivered. See section 3.3.7.1, "Non-delivery of Output", on page 54. 3.3 Managing Output and Error Files. 3.3.1 Default Behavior. By default, PBS copies the standard output (stdout) and standard error (stderr) files back to PBS_O_WORKDIR on the submission host when a job finishes. When qsub is run, it sets PBS_O_WORKDIR to the current working directory where the qsub command is executed. This means that if you want your job's stdout and stderr files to be delivered to your submission directory, you do not need to do anything. Four options to the qsub command control where stdout and stderr are created, and whether and where they are copied when the job is f
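The stageout retry schedule described above (one initial attempt, then retries after waits of 1, 11, and 21 seconds) can be pictured with a short Python sketch. It only illustrates the timing; `copy_once` is a hypothetical callable standing in for a single stageout attempt, and nothing here reflects how PBS implements the copy itself.

```python
import time

# Waits, in seconds, before the second, third, and fourth attempts,
# as described in the File Stageout Failure text.
STAGEOUT_DELAYS = (1, 11, 21)


def stage_out_with_retries(copy_once, sleep=time.sleep):
    """Call copy_once() up to four times, sleeping between attempts.

    copy_once is a hypothetical function returning True on success.
    Returns (succeeded, number_of_attempts).
    """
    attempts = 0
    for delay in (0,) + STAGEOUT_DELAYS:
        if delay:
            sleep(delay)
        attempts += 1
        if copy_once():
            return True, attempts
    # All four attempts failed; at this point PBS would email the job owner
    # and save the files under PBS_HOME/undelivered.
    return False, attempts
```

Passing a fake `sleep` (for example a list's `append` method) makes it easy to see the schedule without actually waiting.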
148. ding reservations. A standing reservation is a series of advance reservations. The pbs_rsub command is used to create both advance and standing reservations. See Chapter 7, "Reserving Resources Ahead of Time", on page 173. 1.2 Deprecations. For a list of deprecations, see "Deprecations and Removals" on page 12 in the PBS Professional Administrator's Guide. 1.3 Backward Compatibility. 1.3.1 Job Dependencies Affected By Job History. Enabling job history changes the behavior of dependent jobs. If a job j1 depends on a finished job j2 for which PBS is maintaining history, then j1 will go into the held state. If job j1 depends on a finished job j3 that has been purged from the historical records, then j1 will be rejected, just as in previous versions of PBS where the job was no longer in the system. 1.3.2 PBS path information no longer saved in AUTOEXEC.BAT. Any value for PATH saved in AUTOEXEC.BAT may be lost after installation of PBS. If there is any path information that needs to be saved, AUTOEXEC.BAT must be edited by hand after the installation of PBS. PBS path information is no longer saved in AUTOEXEC.BAT. 1.3.3 Submitting Jobs with Old Syntax. For instructions on submitting jobs using old syntax, see section 4.8, "Backward Compatibility", on page 86. Submitting a PBS Job. 2.1 Introduction to the PBS Job
149. due to invalid provisioning request or to internal system error • After provisioning, the AOE reported by the vnode does not match the AOE requested by the job. The hold can be released by the PBS Administrator after investigating what went wrong and correcting the mistake. The job is requeued for the following reasons: • The provisioning hook fails due to timeout • The vnode is not reported back up. 12.5.5 Conflicting Resource Requests. The values of the resources arch and vnode may be changed by provisioning. Do not request an AOE and either arch or vnode for the same job. 12.5.6 Job Submission and Alteration Have Same Requirements. Whether you use the qsub command to submit a job or the qalter command to alter a job, the job must eventually meet the same requirements. You cannot submit a job that meets the requirements, then alter it so that it does not. 13 Special Circumstances and Tools. 13.1 Support for Large Page Mode on AIX. A process running as part of a job can use large pages. The memory reported in resources_used.mem may be larger with large page sizes. You can set an environment variable to request large memory pages: LDR_CNTRL=LARGE_PAGE_DATA=M or LDR_CNTRL=LARGE_PAGE_DATA=Y. For more information, see the man page for setpcred. This can be viewed with the command "man setpcred".
150. duration. Python type: pbs.duration. min_walltime: Minimum walltime allowed for a shrink-to-fit job. When this resource is specified, the job is a shrink-to-fit job. If this attribute is set, PBS sets the job's walltime. The job's actual walltime is between max_walltime and min_walltime. Must be less than or equal to max_walltime. Cannot be used for resources_min or resources_max. Cannot be set on job arrays or reservations. Can be requested only outside of a select statement. Non-consumable. Default: None. Type: duration. Python type: pbs.duration. 6.3.8 Caveats and Restrictions for Shrink-to-fit Jobs. It is erroneous to specify max_walltime for a job without specifying min_walltime. If attempted via qsub or qalter, the following error is printed: Can not have max_walltime without min_walltime. It is erroneous to specify a min_walltime that is greater than max_walltime. If attempted via qsub or qalter, the following error is printed: min_walltime can not be greater than max_walltime. Job arrays cannot be shrink-to-fit. It is erroneous to specify a min_walltime or max_walltime for a job array. If attempted via qsub or qalter, the following error is printed: min_walltime and max_walltime are not valid resources for a job array. Reservations cannot be shrink to fit You
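The three submission-time restrictions above can be checked before calling qsub. The following Python sketch is one way to do that under stated assumptions: the function names are hypothetical, durations are assumed to be given in the [[hours:]minutes:]seconds form, and the returned messages are paraphrases for illustration rather than the exact text PBS prints.

```python
def duration_to_seconds(text):
    """Convert '[[HH:]MM:]SS' to seconds, e.g. '01:30:00' to 5400."""
    seconds = 0
    for part in text.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def check_shrink_to_fit(min_walltime=None, max_walltime=None, is_job_array=False):
    """Return a problem description, or None if the request looks valid.

    Mirrors the caveats in section 6.3.8; the messages are illustrative only.
    """
    if is_job_array and (min_walltime is not None or max_walltime is not None):
        return "min_walltime and max_walltime cannot be used with a job array"
    if max_walltime is not None and min_walltime is None:
        return "max_walltime given without min_walltime"
    if min_walltime is not None and max_walltime is not None:
        if duration_to_seconds(min_walltime) > duration_to_seconds(max_walltime):
            return "min_walltime is greater than max_walltime"
    return None
```

A request such as `check_shrink_to_fit(min_walltime="00:30:00", max_walltime="02:00:00")` passes, while swapping the two values is reported as an error.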
151. e reps 3 echo done 5.2.8.3.i Caveats for MPICH Under Windows. Under Windows, MPICH is not integrated with PBS. Therefore, PBS is limited to tracking and controlling processes and performing accounting only for job processes on the primary vnode. 5.2.9 MPICH-GM with PBS. 5.2.9.1 Using MPICH-GM and MPD with PBS. PBS provides an interface to MPICH-GM's mpirun using MPD. If executed inside a PBS job, this allows for PBS to track all MPICH-GM processes started by the MPD daemons so that PBS can perform accounting and have complete job control. If executed outside of a PBS job, it behaves exactly as if standard mpirun with MPD had been used. You use the same mpirun command as you would use outside of PBS. If the MPD daemons are not already running, the PBS interface will take care of starting them for you. 5.2.9.1.i Options. Inside a PBS job script, all of the options to the PBS interface are the same as mpirun with MPD except for the following. -m <file>: The file argument contents are ignored and replaced by the contents of PBS_NODEFILE. -np: If not specified, the number of entries found in PBS_NODEFILE is used. The maximum number of ranks that can be launched is the number of entries in PBS_NODEFILE. The use of the -pg option, for having multiple executables on multiple hosts, is allowed, but it is up to you to make sure only PBS hosts are specif
152. e NUMA nodes are assigned to the job. Example 11-9: To request 8 PEs, with 4 PEs per NUMA node, the aprun statement is the following: aprun -S 4 -n 8. The equivalent select statement is the following, not including the scatter-by-vnode and exclusive-by-host placement language: qsub -lselect=2:ncpus=4:mpiprocs=4. 11.5.1.1 Caveats For aprun -S. When you use aprun -S, you must request mpiprocs, and request the same value as for ncpus. 11.5.2 Reserving N NUMA Nodes Per Compute Node. The Cray aprun -sn option allows you to specify the number of NUMA nodes per compute node for your job. PBS allows you to make the equivalent request using select and place statements. To request N NUMA nodes per compute node, you place your job by requesting a resource that specifies the number of NUMA nodes per compute node. This resource is set up by your administrator. We suggest that the resource is named craysn, and the value you specify is the number of vnodes per compute node. For example, to request 2 segments per compute node, specify a value of 2 for craysn. To make a request equivalent to aprun -sn 3 -n 24, and match the compute-node-exclusive behavior of the Cray, you can specify the following: qsub -lselect=24:ncpus=1:craysn=3 -lplace=exclhost. 11.5.3 Reserving Specific NUMA Nodes on Each Compute Node. The Cray aprun -sl option allows you to reserve sp
153. e a select statement. This old style of resource request was called a "resource specification". Resource specification syntax is deprecated. For backward compatibility, any resource specification is converted to select and place statements, and any defaults are applied. 4.8.2 Old-style Node Specifications. In early versions of PBS, job submitters used "-l nodes=" in what was called a "node specification" to specify where the job should run. The syntax for a node specification is deprecated. For backward compatibility, a legal node specification or resource specification is converted into select and place directives; we show how in following sections. 4.8.3 Conversion of Old Style to New. 4.8.3.1 Conversion of Resource Specifications. If your job has an old-style resource specification, PBS creates a select specification requesting 1 chunk containing the resources specified by the job and server and/or queue default resources. Resource specification format: -l resource=value[,resource=value ...]. The resource specification is converted to: -lselect=1[:resource=value ...] -lplace=pack, with one instance of resource=value for each of the following vnode-level resources in the resource request: built-in resources ncpus | mem | vmem | arch | host; site-defined vnode-level resources. For example a job submitte
154. e is set to the pathname of the job-specific temporary scratch directory. 3.2.8.4 Staging Files Into Staging and Execution Directories. PBS evaluates execution_path and storage_path relative to the staging and execution directory given in PBS_JOBDIR, whether this directory is your home directory or a job-specific directory created by PBS. PBS copies the specified files and/or directories to the job's staging and execution directory. 3.2.8.5 Running the Prologue. The MoM's prologue is run on the primary host as root, with the current working directory set to PBS_HOME/mom_priv, and with PBS_JOBDIR and TMPDIR set in its environment. 3.2.8.6 Job Execution. PBS runs the job script on the primary host as you. PBS also runs any tasks created by the job as you. The job script and tasks are executed with their current working directory set to the job's staging and execution directory, and with PBS_JOBDIR and TMPDIR set in their environment. 3.2.8.7 Standard Out and Standard Error. The job's stdout and stderr files are created directly in the job's staging and execution directory on the primary execution host. 3.2.8.7.i Job-specific Staging and Execution Directories. If the qsub -k option is used, the stdout and stderr files will not be automatically copied out of the staging and execution directory at job end; they will be deleted when the direc
155. e of the reservation is specified using the -r option to pbs_rsub. The -r option takes the recurrence rule argument, which specifies the standing reservation's occurrences. The recurrence rule uses iCalendar syntax, and uses a subset of the parameters described in RFC 2445. The recurrence rule can take two forms. FREQ=freq_spec;COUNT=count_spec;interval_spec: In this form, you specify how often there will be occurrences, how many there will be, and which days and/or hours apply. FREQ=freq_spec;UNTIL=until_spec;interval_spec: In this form, you specify how often there will be occurrences, when the occurrences will end, and which days and/or hours apply. Do not include any spaces in your recurrence rule. freq_spec: This is the frequency with which the reservation repeats. Valid values are WEEKLY | DAILY | HOURLY. When using a freq_spec of WEEKLY, you may use an interval_spec of BYDAY and/or BYHOUR. When using a freq_spec of DAILY, you may use an interval_spec of BYHOUR. When using a freq_spec of HOURLY, do not use an interval_spec. count_spec: The exact number of occurrences. Number up to 4 digits in length. Format: integer. interval_spec: Specifies the interval at which there will be occurrences. Can be one or both of BYDAY=<days> or BYHOUR=<hours>. Valid values are BYDAY = MO | TU | WE | TH | FR | SA | SU and BYHOUR = 0 | 1 | 2 | ... | 23. When using both, separate them with a semicolon. Separate days or hours with a comma. For
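To make the COUNT form of the recurrence rule concrete, here is a hedged Python sketch that assembles a rule string and enforces the restrictions listed above (which interval_spec parts go with which freq_spec, valid day and hour values, a count of at most four digits, no spaces). The function name `build_recurrence_rule` is hypothetical, and the semicolon between the FREQ, COUNT, and interval parts follows ordinary iCalendar usage; check the pbs_rsub documentation before relying on the exact string.

```python
# Which interval_spec keywords each freq_spec may be combined with.
ALLOWED_INTERVALS = {
    "WEEKLY": {"BYDAY", "BYHOUR"},
    "DAILY": {"BYHOUR"},
    "HOURLY": set(),
}
VALID_DAYS = ("MO", "TU", "WE", "TH", "FR", "SA", "SU")


def build_recurrence_rule(freq, count, byday=(), byhour=()):
    """Build 'FREQ=...;COUNT=...[;BYDAY=...][;BYHOUR=...]' with basic checks."""
    if freq not in ALLOWED_INTERVALS:
        raise ValueError(f"freq_spec must be one of {sorted(ALLOWED_INTERVALS)}")
    if not 1 <= count <= 9999:
        raise ValueError("count_spec must be an integer of at most 4 digits")
    if byday and "BYDAY" not in ALLOWED_INTERVALS[freq]:
        raise ValueError(f"BYDAY cannot be used with FREQ={freq}")
    if byhour and "BYHOUR" not in ALLOWED_INTERVALS[freq]:
        raise ValueError(f"BYHOUR cannot be used with FREQ={freq}")
    for day in byday:
        if day not in VALID_DAYS:
            raise ValueError(f"invalid day: {day}")
    for hour in byhour:
        if not 0 <= hour <= 23:
            raise ValueError(f"invalid hour: {hour}")

    parts = [f"FREQ={freq}", f"COUNT={count}"]
    if byday:
        parts.append("BYDAY=" + ",".join(byday))    # days separated by commas
    if byhour:
        parts.append("BYHOUR=" + ",".join(str(h) for h in byhour))
    return ";".join(parts)                          # no spaces anywhere
```

The resulting string would be passed as the recurrence rule argument to pbs_rsub; the other arguments a standing reservation needs are outside the scope of this sketch.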
156. e output on ViewHost. If you want to receive X output on a host other than the submission host, do the following: • Run an X server on ViewHost • On ViewHost, log into SubHost using ssh -X • In the window logged into SubHost, run qsub -I -X. 6.11.9.2 Requirements for Receiving X Output. • You must be running UNIX or Linux • The job must be interactive: you must also specify -I • An X server must be running on the system where you want to see the X output • The DISPLAY variable in the job's submission environment must be set to the display where the X output is desired • Your administrator must configure MoM's PATH to include the xauth utility. 6.11.9.3 Viewing X Output Job Attributes. Each job has two read-only attributes containing X forwarding information. These are the following. forward_x11_cookie: This attribute contains the X authorization cookie. forward_x11_port: This attribute contains the number of the port being listened to by the port forwarder on the submission host. You can view these attributes using qstat -f <job ID>. 6.11.9.4 Caveats and Advice for Receiving X Output. • This option is not available under Windows • If you use the qsub -V option, PBS will handle the DISPLAY variable correctly • If you use the qsub -v DISPLAY option, you will get an error • At most 25 concurrent X applications can run using the sa
157. e value. If a variable=value pair contains any commas, the value must be enclosed in single or double quotes, and the variable=value pair must be enclosed in the kind of quotes not used to enclose the value. For example: qsub -v DISPLAY,myvariable=32 my_job; qsub -v "var1='A,B,C,D'" job.sh; qsub -v a=10,"var2='A,B'",c=20,HOME=/home/zzz job.sh. 6.11.10.3 Caveat for Environment Variables and Shell Functions. Make sure that no exported shell function you want to forward has the same name as an environment variable. The shell function will not be visible in the environment. 6.11.11 Forwarding Exported Shell Functions. You can forward exported shell functions using either qsub -V or qsub -v <function name>. You can also put these functions in your .profile or .login on the execution host(s). If you use -v or -V, make sure that there is no environment variable with the same name as any exported shell functions you want to forward; otherwise, the shell function will not be visible in the environment. 6.11.12 Caveat for Interactive Jobs and File I/O. When an interactive job finishes, staged files and stdout and/or stderr may not have been copied back yet. 6.12 Specifying Which Jobs to Preempt. You can specify which groups of jobs your job is allowed to preempt in order to run. You can specify all the jobs in one or more queues and all jobs
158. ec arguments to mpirun.
PBS_NODEFILE: hostA hostA hostB hostB hostC hostC
Job script:
# mpirun runs MPD daemons on hosts listed in PBS_NODEFILE
# mpirun runs 2 instances of mpitest1 on hostA, 2 instances of mpitest2 on hostB, 2 instances of mpitest3 on hostC
# mpirun takes care of shutting down the MPDs at the end of MPI job run
mpirun -np 2 /tmp/mpitest1 : -np 2 /tmp/mpitest2 : -np 2 /tmp/mpitest3
Run job script:
qsub -l select=3:ncpus=2:mpiprocs=2 job-script
<job-id>
Example 5-25: Run an Intel MPI job with multiple executables on multiple hosts via the -configfile option and PBS_NODEFILE.
PBS_NODEFILE: hostA hostA hostB hostB hostC hostC
Job script:
echo "-np 2 /tmp/mpitest1" >> my_config_file
echo "-np 2 /tmp/mpitest2" >> my_config_file
echo "-np 2 /tmp/mpitest3" >> my_config_file
# mpirun takes care of starting the MPD daemons
# config file says run 2 instances of mpitest1 on hostA, 2 instances of mpitest2 on hostB, 2 instances of mpitest3 on hostC
# mpirun takes care of shutting down the MPD daemons
mpirun -configfile my_config_file
# cleanup
rm -f my_config_file
Run job script:
qsub -l select=3:ncpus=2:mpiprocs=2 job-script
<job-id>
5.2.6.5 Restrictions. The maximum number
159. ecific NUMA nodes on the compute nodes your job uses. PBS allows you to make the equivalent request using select and place statements. How you request resources depends on the number of NUMA nodes you want per compute node, and how the administrator has set up the resource that allows you to request specific compute nodes. 11.5.3.1 Requesting a Single NUMA Node Per Compute Node. You can request the PBScrayseg resource to request one particular NUMA node per compute node. PBS automatically creates a custom string resource called PBScrayseg, and sets the value for each vnode to be the segment ordinal for the associated NUMA node. See "Custom Cray Resources" on page 315 of the PBS Professional Reference Guide. Example 11-10: You want 8 PEs total, using only NUMA node 1 on each compute node. The aprun statement is the following: aprun -sl 1 -n 8. An equivalent resource request for a PBS job is the following: qsub -lselect=8:ncpus=1:PBScrayseg=1. See section 11.3.1, "Built-in and Custom Resources for the Cray", on page 251. 11.5.3.2 Requesting Multiple NUMA Nodes Per Compute Node. If you want to request multiple NUMA nodes per compute node, you have choices. For example, if your aprun statement looks like the following: aprun -sl 0,1 -n 8, you can do any of the following: • You can request separate chunks for each segment: qsub -lselect=4:ncpu
160. ed to match UNIX files that begin with period or Windows files that have the SYSTEM or HIDDEN attributes. • When using the qsub command line on UNIX, you must prevent the shell from expanding wildcards. For some shells, you can enclose the pathnames in double quotes. For some shells, you can use a backslash before the wildcard. • Wildcards can only be used in the source side of a staging specification. This means they can be used in the storage_path specification for stagein, and in the execution_path specification for stageout. • When staging using wildcards, the destination must be a directory. If the destination is not a directory, the result is undefined. So, for example, when staging out all *.out files, you must specify a directory for storage_path. • Wildcards can only be used in the final path component, i.e. the basename. • When wildcards are used during stagein, PBS will not automatically delete staged files at job end. Note that if PBS created the staging and execution directory, that directory and all its contents are deleted at job end. 3.2.6 Examples of File Staging. Example 3-1: Stage out all files from the execution directory to a specific directory. UNIX: -W stageout=*@myworkstation:/user/project1/case1. Windows: -W stageout=*@mypc:E:\project1\case1. Example 3-2: Stage out specific types of result files and disr
161. een Reservations. Leave enough time between reservations for the reservations and jobs in them to clean up. A job consumes resources even while it is in the E or exiting state. This can take longer when large files are being staged. If the job is still running when the reservation ends, it may take up to two minutes to be cleaned up. The reservation itself cannot finish cleaning up until its jobs are cleaned up. This will delay the start time of jobs in the next reservation unless there is enough time between the reservations for cleanup. 7.6.4 Cannot Mix Reservations and mpp*. Do not request any mpp* resources in a reservation. PBS mpp* resources are loosely coupled to Cray resources, and those Cray resources are not completely controlled by PBS. A reservation requesting mppnodes, for example, does not prevent ALPS from running another job on those nodes. If this happens, the PBS job in the reservation is prevented from running, even though those resources are reserved. Mixing reservations and mpp* resources would lead to disappointment. 7.6.5 Reservation Information in the Accounting Log. The PBS server writes an accounting record for each reservation in the job accounting file. The accounting record for a reservation is similar to that for a job. The accounting record for any job belonging to a reservation will include the reservation ID. See "Accounting Log" on page 431 of the PBS Professional Reference Guide. 7.6.6 Reservation Fault Tol
162. egard the scratch and other temporary files after the job terminates. The result files that are interesting for this example end in ".dat". UNIX: -W stageout=*.dat@myworkstation:project3/data. Windows: -W stageout=*.dat@mypc:C:\project\data. Example 3-3: Stage in all files from an application data directory to a subdirectory. UNIX: -W stagein=jobarea@myworkstation:crashtest1/*. Windows: -W stagein=jobarea@mypc:E:\crashtest1\*. Example 3-4: Stage in data from files and directories matching "wing". UNIX: -W stagein=.@myworkstation:848/*wing*. Windows: -W stagein=.@mypc:E:\flowcalc\*wing*. Example 3-5: Stage in .bat and .dat files to jobarea. UNIX: -W stagein=jobarea@myworkstation:/users/me/crash1.?at. Windows: -W stagein=jobarea@myworkstation:C:\me\crash1.?at. 3.2.6.1 Example of Using Job-specific Staging and Execution Directories. In this example, you want the file jay.fem to be delivered to the job-specific staging and execution directory given in PBS_JOBDIR, by being copied from the host submithost. The job script is executed in PBS_JOBDIR, and jay.out is staged out from PBS_JOBDIR to your home directory on the submittal host (i.e. hostname): qsub -Wsandbox=PRIVATE -Wstagein=jay.fem@submithost:jay.fem -Wstageout=jay.out@submithost:jay.out. Chapter 3 Job Input &
163. ent. You may find it helpful to run qsub in the foreground by using the -f option. This can avoid stale ALPS reservations not being released. 11.3 PBS Resources for the Cray. 11.3.1 Built-in and Custom Resources for the Cray. PBS provides built-in and custom resources specifically created for jobs on the Cray. The custom resources are created by PBS to reflect Cray information such as segments or labels. PBS also provides some built-in resources for all platforms that have specific uses on the Cray. 11.3.1.1 Built-in Resources for All Platforms. accelerator: Indicates whether this vnode is associated with an accelerator. Host-level. Can be requested only inside of a select statement. On Cray, this resource exists only when there is at least one associated accelerator. On Cray, this is set to True when there is at least one associated accelerator whose state is UP. On Cray, set to False when all associated accelerators are in state DOWN. Used for requesting accelerators. Format: Boolean. Python type: bool. accelerator_memory: Indicates amount of memory for accelerator(s) associated with this vnode. Host-level. Can be requested only inside of a select statement. On Cray, PBS sets this resource only on vnodes with at least one accelerator whose state is UP. For Cray, PBS sets this resource on the 0th NUMA node (the vnode with PBScrayseg=0), and the resource is
164. entifier, it takes one of the following three forms: queue | @server | queue@server. If you specify queue, the request is for status of that queue at the default server. If you use the @server form, the request is for status of all queues at that server. If you specify a full destination identifier, queue@server, the request is for status of the named queue at the named server. 10.3.1 Viewing Queue Information in Default Format. The -Q option to qstat displays the status of specified queues at the (optionally specified) PBS server. One line of output is generated for each queue queried:
qstat -Q
Queue  Max  Tot  Ena  Str  Que  Run  Hld  Wat  Trn  Ext  Type
workq    0   10  yes  yes    7    1    1    1    0    0  Execution
The columns show the following for each queue: • Queue: Queue name • Max: Maximum number of jobs allowed to run concurrently in the queue • Tot: Total number of jobs in the queue • Ena: Whether the queue is enabled or disabled • Str: Whether the queue is started or stopped • Que: Number of queued jobs • Run: Number of running jobs • Hld: Number of held jobs • Wat: Number of waiting jobs • Trn: Number of jobs being moved (transiting) • Ext: Number of exiting jobs • Type: Type of queue: execution or routing. 10.3.2 Viewing Queue Information in Long Format. Use the long format to see the value for each queue attribute: qstat -Qf Queue: workq queue_ty
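The column list above maps directly onto a small parser. The Python sketch below turns one data row of the default queue listing into a dictionary keyed by the column names; it assumes whitespace-separated fields exactly as in the sample row, and the function name `parse_queue_line` is hypothetical rather than part of any PBS tooling.

```python
# Column names of the default queue listing, in order.
COLUMNS = ("Queue", "Max", "Tot", "Ena", "Str",
           "Que", "Run", "Hld", "Wat", "Trn", "Ext", "Type")
COUNT_COLUMNS = ("Max", "Tot", "Que", "Run", "Hld", "Wat", "Trn", "Ext")
FLAG_COLUMNS = ("Ena", "Str")


def parse_queue_line(line):
    """Parse one whitespace-separated data row into a dict."""
    fields = line.split()
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for name in COUNT_COLUMNS:
        row[name] = int(row[name])          # job counts and limits
    for name in FLAG_COLUMNS:
        row[name] = row[name] == "yes"      # enabled / started flags
    return row


row = parse_queue_line("workq 0 10 yes yes 7 1 1 1 0 0 Execution")
# The per-state counts in this sample add up to the Tot column.
per_state_total = sum(row[c] for c in ("Que", "Run", "Hld", "Wat", "Trn", "Ext"))
```

In the sample row the six per-state counts sum to 10, matching Tot, which is a handy consistency check when scraping this output.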
165. equoia STDIN user1 00:00:00 R workq. Example 10-3: Viewing moved job. • There are three servers with hostnames ServerA, ServerB, and ServerC • User1 submits job 123 to ServerA • After some time, User1 moves the job to ServerB • After more time, the administrator moves the job to QueueC at ServerC. This means: • The qstat command will show QueueC@ServerC for job 123. 10.1.15.3 Job History In Alternate Format. You can use the -H option to the qstat command to see job history for finished or moved jobs in alternate format. This does not display running or queued jobs. Usage: qstat -H: Displays information for finished or moved jobs in alternate format. qstat -H job identifier: Displays information for that job in alternate format, whether or not it is finished or moved. qstat -H destination: Displays information for finished or moved jobs at that destination. Example 10-4: Job history in alternate format:
qstat -H
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
101.S1 user1 workq STDIN 5168 1 1 F 00:00
102.S1 user1 Q1@S2 STDIN 12 M
To see alternate-format status for jobs, job arrays, and subjobs that are finished and moved, use qstat -Ht. To see alternate-format status for job arrays that are finished or moved, use qstat -HJ. The -H option is incompatible with the -a, -i and -r optio
166. erance. If one or more vnodes allocated to an advance reservation or to the soonest occurrence of a standing reservation become unavailable, the reservation's state becomes DG or RESV_DEGRADED. A degraded reservation does not have all the reserved resources to run its jobs. PBS attempts to reconfirm degraded reservations. This means that it looks for alternate available vnodes on which to run the reservation. The reservation's retry_time attribute lists the next time when PBS will try to reconfirm the reservation. If PBS is able to reconfirm a degraded reservation, the reservation's state becomes CO or RESV_CONFIRMED, and the reservation's resv_nodes attribute shows the new vnodes. Job Arrays. 8.1 Advantages of Job Arrays. PBS provides job arrays, which are useful for collections of almost-identical jobs. Each job in a job array is called a "subjob". Subjobs are scheduled and treated just like normal jobs, with the exceptions noted in this chapter. You can group closely related work into a set so that you can submit, query, modify, and display the set as a unit. Job arrays are useful where you want to run the same program over and over on different input files. PBS can process a job array more efficiently than it can the same number of individual normal jobs. Job arrays are suited for
167. ered across different hosts. The aprun statement is the following: aprun -n 8 -N 2. The old resource request using mpp* is the following: qsub -lmppwidth=8,mppnppn=2. The translated select and place is the following: qsub -lselect=4:ncpus=2:mpiprocs=2:vntype=cray_compute -lplace=scatter. Example 11-4: Specifying host. The old resource request using mpp* is the following: qsub -lmppwidth=8,mpphost=examplehost. The translated select and place is the following: qsub -lselect=8:PBScrayhost=examplehost. Example 11-5: Specifying labels. The old resource request using mpp* is the following: -l mppwidth=1,mpplabels="small,red". The translated select and place is the following: -l select=1:PBScraylabel_small=True:PBScraylabel_red=True. 11.3.3 Resource Accounting. Jobs that request only compute nodes are not assigned resources from login nodes. PBS accounting logs do not show any login node resources being used by these jobs. Jobs that request login nodes are assigned resources from login nodes, and those resources appear in the PBS accounting logs for these jobs. PBS performs resource accounting on the login nodes, under the control of their MoMs. Comprehensive System Accounting (CSA) runs on the compute nodes, under the control of the Cray system. 11.4 Rules for Submitting Jobs on the Cray. 11.4.1 Always Specify Node Type. If you want
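The first translation above (8 PEs at 2 per node becoming 4 chunks of 2 CPUs, scattered) follows simple arithmetic, sketched here in Python. This is an illustration of that one example only, not the translation PBS performs internally: the function name `translate_mpp` is hypothetical, and rejecting a width that is not a multiple of the per-node count is an assumption, since the text does not say how that case is handled.

```python
def translate_mpp(mppwidth, mppnppn):
    """Return (select_value, place_value) for an mppwidth/mppnppn request.

    mppwidth is the total number of PEs, mppnppn the number of PEs per node.
    """
    if mppwidth <= 0 or mppnppn <= 0:
        raise ValueError("mppwidth and mppnppn must be positive")
    if mppwidth % mppnppn:
        # Assumption for this sketch: uneven splits are not handled.
        raise ValueError("mppwidth must be a multiple of mppnppn")
    chunks = mppwidth // mppnppn            # one chunk per compute node
    select = (f"{chunks}:ncpus={mppnppn}:mpiprocs={mppnppn}"
              ":vntype=cray_compute")
    return select, "scatter"                # spread chunks across hosts


select, place = translate_mpp(8, 2)
command = f"qsub -lselect={select} -lplace={place}"
```

With the values from the example, `command` reproduces the translated request shown above.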
168. ess than 1 GB, then the amount is rounded up to 1 GB. For example:
qstat -G host1
Req'd Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
43.host1 user1 workq STDIN 4693 1 1 R 00:05
44.host1 user1 workq STDIN 1 1 Q
45.host1 user1 workq STDIN 1 1 1gb Q
10.1.3.2 Display Size in Megawords. The -M option to qstat displays all jobs at the requested or default server using the alternative display, showing all size information in megawords (MW), rather than the default of smallest displayable units. A word is considered to be 8 bytes. For example:
qstat -M host1
Req'd Req'd Elap
Job ID User Queue Jobname Sess NDS TSK Mem Time S Time
43.host1 user1 workq STDIN 4693 1 1 R 00:05
44.host1 user1 workq STDIN 1 1 Q
45.host1 user1 workq STDIN 1 1 25mw Q
10.1.4 Viewing Job Status in Long Format. You can use the qstat command to view all of the information about a job, including values for its attributes and resources, in the long format. Syntax, for simple form and with options: qstat -f; qstat -f [-p] [-J] [-t] [-x] [[job_identifier | destination] ...]. The long format shows the following fields, including job attributes. See "Job Attributes" on page 384 of the PBS Professional Reference Guide for a description of attributes: qstat -f 13 Job Id: 13.host1 Job N
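The two size displays above can be related with a little arithmetic. The Python sketch below converts a byte count to gigabyte and megaword strings. It is an illustration under assumptions: the only facts taken from the text are that a word is 8 bytes and that amounts under 1 GB are shown as 1 GB; rounding every value up, treating "giga" and "mega" as powers of two, and the function names are all assumptions of this sketch.

```python
BYTES_PER_WORD = 8          # stated in the text: a word is 8 bytes
GIGA = 2 ** 30              # assumption: binary multiples
MEGA = 2 ** 20


def ceil_div(a, b):
    """Integer division rounding up, without floating point."""
    return -(-a // b)


def size_in_gb(n_bytes):
    """Gigabyte display; anything under 1 GB is shown as 1gb."""
    return f"{max(1, ceil_div(n_bytes, GIGA))}gb"


def size_in_mw(n_bytes):
    """Megaword display, where one word is BYTES_PER_WORD bytes."""
    words = ceil_div(n_bytes, BYTES_PER_WORD)
    return f"{ceil_div(words, MEGA)}mw"


request = 200 * MEGA        # a hypothetical 200 MB memory request
```

Under these assumptions a 200 MB request displays as `1gb` in gigabytes and `25mw` in megawords, which is consistent with the values shown for job 45 in the two listings, although the text does not state what that job actually requested.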
169. You must supply a <path list> if you attempt to set Shell_Path_List; otherwise, you will get an error. You can specify only one path for any host you name. You can specify only one path that doesn't have a corresponding host name. PBS chooses the path whose host name matches the name of the execution host. If no matching host is found, then PBS chooses the path specified without a host, if one exists. 2.3.3.1.i Specifying Job's Top Shell Under UNIX/Linux: On UNIX/Linux, the job's top shell is the one MoM starts when she starts your job, and the job shell is the shell or interpreter that runs your job script commands. Under UNIX/Linux, you can use any shell, such as csh or sh, by specifying qsub -S <path>. You cannot use Perl or Python as your top shell. Example 2-1: Using bash: qsub -S /bin/bash <script name>. 2.3.3.1.ii Specifying Job's Top Shell Under Windows: On Windows, the job shell is the same as the top shell. Under Windows, you can specify a shell or an interpreter such as Perl or Python, and if your job script is Perl or Python, you must specify the language using an option to qsub; you cannot specify it in the job script. Example 2-2: Running a Python script on Windows: qsub -S "C:\Program Files\PBS Pro\exec\bin\pbs_python.exe" <script name>. 2.3.3.1.iii Caveats for Specifying the Job's Top Shell: If you specify a relative pat
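The host-matching rule described above can be sketched in a few lines of Python. This is only an illustration of the selection logic, not PBS code: it assumes the path list is written as comma-separated `path[@host]` entries, and `choose_shell` is a hypothetical helper name.

```python
def choose_shell(path_list, exec_host):
    """Pick a shell path for exec_host from a 'path[@host],...' string.

    Assumed format: comma-separated entries, each either 'path@host'
    or a bare 'path' (the entry without a host name).
    """
    hostless = None
    for entry in path_list.split(","):
        path, _, host = entry.strip().partition("@")
        if host and host == exec_host:
            # A path whose host name matches the execution host wins.
            return path
        if not host and hostless is None:
            # Remember the single path that has no host name.
            hostless = path
    # No matching host: fall back to the host-less path, if one exists.
    return hostless
```

With the list `/bin/bash@hostA,/bin/sh`, the sketch returns `/bin/bash` for hostA and `/bin/sh` for any other execution host; with no host-less entry and no match it returns `None`.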
170. est and 400 megabytes (MB) of memory in a chunk. We will cover requesting resources in Chapter 4, "Allocating Resources & Placing Jobs", on page 57. The "-j oe" line requests that PBS join the stdout and stderr output streams of the job into a single stream. We will cover merging output in "Merging Output and Error Files" on page 52. The last three lines are the command lines for executing the programs we wish to run. You can specify as many programs, tasks, or job steps as you need. 2.3 Submitting a PBS Job. 2.3.1 Prerequisites for Submitting Jobs: Before you submit any jobs, set your environment appropriately. Follow the instructions in section 2.4, "Setting Up Your Environment", on page 12. 2.3.2 Ways to Submit a PBS Job: You can use the qsub command to submit a normal or interactive job to PBS: • You can call qsub with a job script; see section 2.3.3, "Submitting a Job Using a Script", on page 18 • You can call qsub with an executable and its arguments; see section 2.3.4, "Submitting Jobs by Specifying Executable", on page 22 • You can call qsub and give keyboard input; see section 2.3.5, "Submitting Jobs Using Keyboard Input", on page 23. You can use the xpbs command to submit a normal or interactive job to PBS; see section 14.6, "How to Submit a Job Using xpbs", on page 299: • You can run xpbs and give it a job script
171. etacharacter. When using one of these shells and a PBS command taking subjobs, job arrays, or job array ranges as arguments, the subjob, job array, or job array range must be enclosed in double quotes. 8.6.12.2 No xpbs Command for Job Arrays: xpbs does not support job arrays. 8.7 Job Array Caveats. 8.7.1 Job Arrays Required to be Rerunnable: Job arrays are required to be rerunnable, and are rerunnable by default. 8.7.2 Resources Same for All Subjobs: You cannot combine jobs into an array that have different hardware requirements, i.e. different select statements. 8.7.3 Checkpointing Not Supported for Job Arrays: Checkpointing is not supported for job arrays. On systems that support checkpointing, subjobs are not checkpointed; instead they run to completion. 9 Working with PBS Jobs. 9.1 Current vs. Historical Jobs: PBS Professional can provide job history information, including what the submission parameters were, whether the job started execution, whether execution succeeded, whether staging out of results succeeded, and which resources were used. PBS can keep job history for jobs which have finished execution, were deleted, or were moved to another server. 9.1.1 Definitions: Moved jobs: Jobs which were moved to another server. Finished jobs: Jobs whose execution is done, for any reason: • Job
172. exampleMom ntype = PBS state = free pcpus = 6 resources_available.accelerator = True resources_available.accelerator_memory = @examplehost_8_0 resources_available.accelerator_model = Tesla_x2090 resources_available.arch = XT resources_available.host = examplehost_8 resources_available.mem = 8192000kb resources_available.naccelerators = @examplehost_8_0 resources_available.ncpus = 6 resources_available.PBScrayhost = examplehost resources_available.PBScraynid = 8 resources_available.PBScrayorder = 1 resources_available.PBScrayseg = 1 resources_available.vnode = examplehost_8_1 resources_available.vntype = cray_compute resources_assigned.accelerator_memory = @examplehost_8_0 resources_assigned.mem = 0kb resources_assigned.naccelerators = @examplehost_8_0 resources_assigned.ncpus = 0 resources_assigned.netwins = 0 resources_assigned.vmem = 0kb resv_enable = True sharing = force_exclhost. 11.6.5 How ALPS Request Is Constructed: The reservation request that is sent to the Cray is constructed from the contents of the exec_vnode and Resource_List.select job attributes. If the exec_vnode attribute contains chunks asking for the same ncpus and mem, these are grouped into one section of an ALPS request. Cray requires one CPU per thread. The ALPS req
173. f ProPack or Performance Suite. When you use the PBS-supplied mpiexec, PBS can track resource usage, signal processes, and perform accounting for all job processes. The PBS mpiexec provides the standard mpiexec interface. See your PBS administrator to find out whether your system is configured for the PBS mpiexec. 5.2.16.1 Using SGI MPT with PBS: You can launch an MPI job on a single Altix, or across multiple Altixes. For MPI jobs across multiple Altixes, PBS will manage the multi-host jobs. For example, if you have two Altixes named Alt1 and Alt2, and want to run two applications called mympi1 and mympi2 on them, you can put this in your job script: mpiexec -host Alt1 -n 4 mympi1 : -host Alt2 -n 8 mympi2. PBS will manage and track the job's processes. When the job is finished, PBS will clean up after it. You can run MPI jobs in the placement sets chosen by PBS. 5.2.16.2 Prerequisites: In order to use MPI within a PBS job with Performance Suite, you may need to add the following in your job script before you call MPI: module load mpt. 5.2.16.3 Using Cpusets: PBS will run the MPI tasks in the cpusets it manages. Jobs will share cpusets if the jobs request sharing and the vnodes' sharing attribute is not set to force_excl. Jobs can share the memory on a nodeboard if they have a CPU from that nodeboard. To fit as many small jobs as possible onto vno
174. f you submit a job at 11:15am with a time of "1110", the job will be eligible to run at 11:10am tomorrow. The job's Execution_Time attribute controls deferred execution. You can set it using either of the following: qsub -a 0700 my_job, or the directive #PBS -a 10220700. 6.10 Setting Your Job's Priority: PBS includes a place in each job where you can specify the job's priority. Your administrator may or may not choose to use this priority value when scheduling jobs. Use the -p <priority> option to specify the priority of the job. The priority argument must be an integer between -1024 (lowest priority) and +1023 (highest priority), inclusive. The default is unset, which is equivalent to zero. The Priority job attribute contains the value you specify. Set it via qsub or a directive: qsub -p 120 my_job, or #PBS -p 300. If you need an absolute ordering of your own jobs, see section 6.2, "Using Job Dependencies", on page 146. 6.11 Running Your Job Interactively: PBS provides a special kind of batch job called an interactive-batch job, or interactive job. An interactive job is treated just like a regular batch job, in that it is queued up and has to wait for resources to become available before it can run. However, once it starts, your terminal input and output are connected to the job, similarly to a login session. It appears that you are logged into one
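The roll-over to the next day and the priority bounds can be illustrated with a small Python sketch. It is not part of PBS: `eligible_time` and `valid_priority` are hypothetical helpers, only the bare four-digit hhmm form is handled (longer forms such as 10220700 are not), and treating a time equal to "now" as already past is an assumption.

```python
from datetime import datetime, timedelta

def eligible_time(hhmm, now):
    """Return when a job submitted at `now` with time `hhmm` becomes eligible."""
    hour, minute = int(hhmm[:2]), int(hhmm[2:])
    candidate = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    if candidate <= now:
        # The requested time has already passed today: use tomorrow.
        candidate += timedelta(days=1)
    return candidate

def valid_priority(priority):
    """Check the documented range: -1024 (lowest) to +1023 (highest)."""
    return -1024 <= priority <= 1023

# Submitting at 11:15 with "1110" rolls over to 11:10 on the next day.
submitted = datetime(2024, 1, 1, 11, 15)
when = eligible_time("1110", submitted)
```

The date used for `submitted` is arbitrary; only the time of day matters for the roll-over.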
175. faults to 7. ncpus is set to mppdepth * mppnppn. If mpphost is specified as a submit argument, PBS adds a custom resource called PBScrayhost to the select statement, requesting the same value as for mpphost. The mppnodes resource is translated by PBS into the corresponding vnodes. When a job requests mpplabels, PBS adds a custom resource called PBScraylabel_<label name> to each chunk that requests a vnode from the compute node with that label. For example, if the job requests -l mppwidth=1,mpplabels="small,red", the translated request is -l select=1:PBScraylabel_small=True:PBScraylabel_red=True. The following table summarizes how each mpp* resource is translated into select and place statements. Table 11-1: Mapping mpp* Resources to select and place (columns: mpp* resource; resulting PBS resource; how value of PBS resource is derived). Row: mpparch; arch; arch = mpparch. Row: mppdepth * mppnppn; ncpus; ncpus = mppdepth * mppnppn, where mppdepth defaults to 7 if not specified. Row: mpphost; PBScrayhost; PBScrayhost = mpphost. Row: mpplabels, for example mpplabels="red,small"; PBScraylabel_red=True and PBScraylabel_small=True; PBS creates custom Boolean resources named PBScraylabel_<label>, and sets them to True on associated vnodes. Defaults to 1 if not specified. mpp
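As a rough illustration of the host and label translations described above, the following Python sketch builds the select statement for the two simplest cases. It is not how PBS implements the translation: `translate_mpp` is a hypothetical helper, it assumes the chunk count equals mppwidth when mppnppn is not given (as in the guide's own examples), and it ignores mpparch, mppdepth, mppnppn, and mppnodes.

```python
def translate_mpp(mppwidth, mpphost=None, mpplabels=None):
    """Build a select statement from mppwidth plus optional host and labels."""
    parts = [str(mppwidth)]
    if mpphost:
        # mpphost becomes the custom resource PBScrayhost.
        parts.append(f"PBScrayhost={mpphost}")
    if mpplabels:
        # Each label becomes a Boolean PBScraylabel_<label>=True request.
        for label in mpplabels.split(","):
            label = label.strip()
            if label:
                parts.append(f"PBScraylabel_{label}=True")
    return "select=" + ":".join(parts)

labels_request = translate_mpp(1, mpplabels="small,red")
host_request = translate_mpp(8, mpphost="examplehost")
```

Here `labels_request` is `select=1:PBScraylabel_small=True:PBScraylabel_red=True` and `host_request` is `select=8:PBScrayhost=examplehost`, matching the translated requests quoted in the guide.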
176. fferent for each occurrence: resv_nodes = (vnode_name:...) • A line that specifies the total number of occurrences of the standing reservation: reserve_count = 5 • The index of the soonest occurrence: reserve_index = 1 • The timezone at the site of submission of the reservation is appended to the reservation's Variable_List attribute. For example, in California: Variable_List = <other variables>PBS_TZID=America/Los_Angeles. To get the status of a reservation at a server other than the default server, set the PBS_SERVER environment variable to the name of the server you wish to query, then use the pbs_rstat command. Your PBS commands will treat the new server as the default server, so you may wish to unset this environment variable when you are finished. You can also get information about the reservation's queue by using the qstat command. See "qstat" on page 204 of the PBS Professional Reference Guide. 7.4.1 Examples of Viewing Reservation Status Using pbs_rstat: In our example, we have one advance reservation and one standing reservation. The advance reservation is for today, for two hours, starting at noon. The standing reservation is for every Thursday, for one hour, starting at 3:00 p.m. Today is Monday, April 28th, and the time is 1:00, so the advance reservation is running, and the soonest occurrence of the standing reservation is Thursday, May 1, at 3:00 p.m. Example brief output: pbs_rstat -B Name: R302.south Name: S304
177. fine a filename for the script. Then press the Save button. This will cause a PBS script file to be generated and written to the named file. Pressing the Confirm Submit button at the bottom of the Submit window will submit the PBS job to the selected destination. xpbs will display a small window containing the job identifier returned for this job. Clicking OK on this window will cause it and the Submit window to be removed from your screen. You can alternatively submit the job as an interactive-batch job by clicking the Interactive button at the bottom of the Submit Job window. Doing so will cause an X-terminal window (xterm) to be launched, and within that window a PBS interactive-batch job submitted. The path for the xterm command can be set via the preferences, as discussed above in section 14.4, "Setting xpbs Preferences", on page 297. For further details on usage and restrictions, see section 6.11, "Running Your Job Interactively", on page 165. 14.7 Exiting xpbs: Click on the Close button located in the Menu bar to leave xpbs. If any settings have been changed, xpbs will bring up a dialog box asking for confirmation about saving state information. The settings will be saved in the .xpbsrc configuration file, and will be used the next time you run xpbs, as discussed in the following section. 14.8 The xpbs Configuration File: Upon ex
178. fy -R 160810 -E 170910 BYDAY=TU;BYHOUR=11, the duration is 25 hours, and the offset from the interval start is 10 minutes. Your reservation will run on Tuesday at 11:10, for 25 hours, ending Wednesday at 12:10. The minutes in the offset weren't overridden by anything in the recurrence rule. The values specified for the arguments to the -R and -E options can be used to set the start and end times in a standing reservation, just as they are in an advance reservation. To do this, don't override their elements inside the recurrence rule. If you specify -R 0930 -E 1030 BYDAY=MO,TU, you haven't overridden the hour or minute elements. Your reservation will run Monday and Tuesday, from 9:30 to 10:30. 7.3.3.2 Requirements for Creating Standing Reservations: • You must specify a start and end date. • You must set the submission host's PBS_TZID environment variable. The format for PBS_TZID is a timezone location. Example: America/Los_Angeles, America/Detroit, Europe/Berlin, Asia/Calcutta. See section 2.4.5, "Setting the Submission Host's Time Zone", on page 18. • The recurrence rule must be one unbroken line. • The recurrence rule must be enclosed in double quotes. • Vnodes that have been configured to accept jobs only from a specific queue (vnode-queue restrictions) cannot be used for advance or standing reservations. See your PBS administrator to determine whether some vnodes have been configured to accept jobs only from
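The arithmetic behind the 25-hour duration and the 10-minute offset can be checked with a short Python sketch. It is illustrative only: `duration_and_offset` is a hypothetical helper, and it assumes both values use the six-digit DDhhmm form and fall in the same month.

```python
from datetime import timedelta

def parse_ddhhmm(value):
    """Split a six-digit DDhhmm string into (day, hour, minute)."""
    return int(value[0:2]), int(value[2:4]), int(value[4:6])

def duration_and_offset(start, end):
    """Return (duration, minute offset from the start of the hour)."""
    start_day, start_hour, start_min = parse_ddhhmm(start)
    end_day, end_hour, end_min = parse_ddhhmm(end)
    duration = timedelta(
        days=end_day - start_day,
        hours=end_hour - start_hour,
        minutes=end_min - start_min,
    )
    # The minutes of the start time survive unless the rule overrides them.
    return duration, start_min

duration, offset = duration_and_offset("160810", "170910")
# duration equals timedelta(hours=25) and offset equals 10
```

With BYHOUR=11 in the recurrence rule, the hour is overridden but the 10-minute offset is not, which gives the 11:10 start and, 25 hours later, the 12:10 end on the following day.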
179. g on host1, two ranks of prog2 on host2, and two ranks of prog2 on host3: cat $PBS_NODEFILE: host1 host1 host2 host2 host3 host3. cat job.script: echo prog1 > /tmp/poe.cmd; echo prog1 >> /tmp/poe.cmd; echo prog2 >> /tmp/poe.cmd; echo prog2 >> /tmp/poe.cmd; echo prog2 >> /tmp/poe.cmd; echo prog2 >> /tmp/poe.cmd; poe -cmdfile /tmp/poe.cmd -euilib us; rm /tmp/poe.cmd. qsub -l select=3:ncpus=2:mpiprocs=2 -l place=scatter job.script. 5.2.6 Intel MPI with PBS: PBS provides an interface to Intel MPI's mpirun. If executed inside a PBS job, this allows PBS to track all Intel MPI processes so that PBS can perform accounting and have complete job control. If executed outside of a PBS job, it behaves exactly as if standard Intel MPI's mpirun was used. 5.2.6.1 Using Intel MPI Integrated with PBS: You use the same mpirun command as you would use outside of PBS. When submitting PBS jobs that invoke the PBS-supplied interface to mpirun for Intel MPI, be sure to explicitly specify the actual number of ranks or MPI tasks in the qsub select specification. Otherwise, jobs will fail to run with "too few entries in the machinefile". For an example of this problem, specification of the following: #PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB, followed by mpirun -np 3 /tmp/mytask, results in the following node file: hostA
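One way to catch the "too few entries" problem before submitting is to count how many node file entries a select specification would produce. The Python sketch below is a simplified model, not PBS behavior taken from this guide: `nodefile_entries` is a hypothetical helper, and it assumes each chunk contributes `mpiprocs` entries, with `mpiprocs` defaulting to 1 when it is not given.

```python
def nodefile_entries(select):
    """Estimate the number of node file entries for a select specification."""
    total = 0
    for chunk in select.split("+"):
        parts = chunk.split(":")
        count = 1
        if parts[0].isdigit():
            # A leading integer is the number of copies of this chunk.
            count = int(parts[0])
            parts = parts[1:]
        resources = dict(item.split("=", 1) for item in parts)
        # Assumption: one entry per chunk unless mpiprocs says otherwise.
        mpiprocs = int(resources.get("mpiprocs", 1))
        total += count * mpiprocs
    return total

ok = nodefile_entries("3:ncpus=2:mpiprocs=2")
short = nodefile_entries("1:ncpus=1:host=hostA+1:ncpus=2:host=hostB")
```

Under these assumptions `ok` is 6, matching the six-entry node file shown for the POE example, while `short` is 2, which is fewer than the 3 ranks requested with `mpirun -np 3`.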
180. g job, the exit status is returned before staging finishes. See section 6.8.2, "Caveats for Blocking Jobs", on page 163. 6.2 Using Job Dependencies: PBS allows you to specify dependencies between two or more jobs. Dependencies are useful for a variety of tasks, such as: • Specifying the order in which jobs in a set should execute • Requesting a job run only if an error occurs in another job • Holding jobs until a particular job starts or completes execution. There is no limit on the number of dependencies per job. 6.2.1 Syntax for Job Dependencies: Use the -W depend=dependency_list option to qsub to define dependencies between jobs. The dependency_list has the format type:arg_list[,type:arg_list ...] where, except for the on type, the arg_list is one or more PBS job IDs in the form jobid[:jobid ...]. These are the available dependency types: after:arg_list: This job may start only after all jobs in arg_list have started execution. afterok:arg_list: This job may start only after all jobs in arg_list have terminated with no errors. afternotok:arg_list: This job may start only after all jobs in arg_list have terminated with errors. afterany:arg_list: This job may start after all jobs in arg_list have finished execution, with or without errors. before:arg_list: Jobs in arg_list may start only after specified jobs have begun executio
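The dependency list format lends itself to being assembled programmatically. The following Python sketch builds the argument for the -W option; `depend_option` is a hypothetical helper, the job IDs are made up for illustration, and the on type (which does not take job IDs) is deliberately not handled.

```python
def depend_option(dependencies):
    """Build a 'depend=type:jobid[:jobid...][,type:...]' string.

    `dependencies` is a list of (type, [job IDs]) pairs.
    """
    pieces = []
    for dep_type, job_ids in dependencies:
        # Job IDs within one type are joined with colons.
        pieces.append(":".join([dep_type, *job_ids]))
    # Different dependency types are separated by commas.
    return "depend=" + ",".join(pieces)

single = depend_option([("afterok", ["123.server", "124.server"])])
mixed = depend_option([("afterok", ["123.server"]), ("afterany", ["125.server"])])
```

The resulting string would be passed as `qsub -W <string> <script name>`; for example, `single` is `depend=afterok:123.server:124.server`.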
181. g record and purges any job information from the server's database. 3.2.8 Detailed Description of Job's Lifecycle. 3.2.8.1 Creation of TMPDIR: For each host allocated to the job, PBS creates a job-specific temporary scratch directory for the job. If the temporary scratch directory cannot be created, the job is aborted. 3.2.8.2 Choice of Staging and Execution Directories: If the job's sandbox attribute is set to PRIVATE, PBS creates job-specific staging and execution directories for the job. If the job's sandbox attribute is set to HOME or is unset, PBS uses your home directory for staging and execution. 3.2.8.2.i Job-specific Staging and Execution Directories: If the staging and execution directory cannot be created, the job is aborted. If PBS fails to create a staging and execution directory, see the system administrator. You should not depend on any particular naming scheme for the new directories that PBS creates for staging and execution. 3.2.8.2.ii User's Home Directory as Staging and Execution Directory: You must have a home directory on each execution host. The absence of your home directory is an error and causes the job to be aborted. 3.2.8.3 Setting Environment Variables and Attributes: PBS sets PBS_JOBDIR and the job's jobdir attribute to the pathname of the staging and execution directory. The TMPDIR environment variabl
182. h for the top shell, the full path must be available in your PATH environment variable on the execution host(s). We recommend specifying the full path. 2.3.3.2 Specifying Job Script Shell or Interpreter. 2.3.3.2.i Specifying Job Script Shell or Interpreter Under UNIX/Linux: If you don't specify a shell for the job script, it defaults to /bin/sh. You can use any shell, and you can use an interpreter such as Perl or Python. You specify the shell or interpreter in the first line of your job script. The top shell spawns the specified process, and this process runs the job script. For example, to use /bin/sh to run the script, use the following as the first line in your job script: #!/bin/sh. To use Perl or Python to run your script, use the path to Perl or Python as the first line in your script: #!/usr/bin/perl or #!/usr/bin/python. 2.3.3.2.ii Specifying Job Script Shell or Interpreter Under Windows: Under Windows, the job shell or interpreter is the same as the top shell or interpreter. You can specify the top/job shell or interpreter, but not a separate job shell or interpreter. To use a non-default shell or interpreter, you must specify it using an option to qsub: qsub -S <path to shell or interpreter> <script name>. 2.3.3.3 Examples of Submitting Jobs Using Scripts: Example 2-3: Our job script is named "myjob". We can submit it by t
183. h space resource. When requesting scratch space, include the resource in your chunk request: -l select=<scratch resource name>=<amount of scratch needed>:<rest of chunk specification>. Example 4-6: Your administrator has named the scratch resource "dynscratch". To request 10MB of scratch space in one chunk: -l select=1:ncpus=N:dynscratch=10MB. 4.3.7 Requesting GPUs: Your PBS job can request GPUs. Your administrator can configure PBS to support any of the following: • Job uses non-specific GPUs and exclusive use of a node • Job uses non-specific GPUs and shared use of a node • Job uses specific GPUs and either shared or exclusive use of a node. 4.3.7.1 Binding to GPUs: PBS Professional allocates GPUs, but does not bind jobs to any particular GPU; the application itself, or the CUDA library, is responsible for the actual binding. 4.3.7.2 Requesting Non-specific GPUs and Exclusive Use of Node: If your job needs GPUs, but does not require specific GPUs, and can request exclusive use of GPU nodes, you can request GPUs the same way you request CPUs. Your administrator can set up a resource to represent the GPUs on a node. We recommend that the GPU resource is called ngpus. When requesting GPUs in this manner, your job should request exclusive use of the node to prevent other jobs being scheduled on its GPUs
184. hat meet all the selected criteria will be displayed. Finally, to the right of the Jobs panel are the following command buttons, for operating on selected job(s): detail: provides information about selected job(s); this functionality can also be achieved by double-clicking on a Jobs listbox entry. modify: for modifying attributes of the selected job(s). delete: for deleting the selected job(s). hold: for placing some type of hold on selected job(s). release: for releasing held job(s). signal: for sending signals to selected job(s) that are running. msg: for writing a message into the output streams of selected job(s). move: for moving selected job(s) into some specified destination. order: for exchanging order of two selected jobs in a queue. run: for running selected job(s) (admin only). rerun: for requeueing selected job(s) that are running (admin only). The middle portion of the Jobs Panel has abbreviated column names indicating the information being displayed, as the following table shows. Table 14-3: xpbs Job Column Headings (Heading: Meaning). Job id: Job Identifier. Name: Name assigned to job, or script name. User: User name under which job is running. PEs: Number of Processing Elements (CPUs) requested. CputUse: Amount of CPU time used. WalltUse: Amount of wall-clock time used. S: State of j
185. he index start. If the index end is not a multiple of the stepping factor above the index start, it will not be used as an index value, and the highest index value used will be lower than the index end. For example, if index start is 1, index end is 8, and the stepping factor is 3, the index values are 1, 4, and 7. 8.4.2 Examples of Submitting Job Arrays: Example 8-1: To submit a job array of 10,000 subjobs, with indices 1, 2, 3, ... 10000: qsub -J 1-10000 job.scr returns 1234[].server.domain.com. Example 8-2: To submit a job array of 500 subjobs, with indices 500, 501, 502, ... 1000: qsub -J 500-1000 job.scr returns 1235[].server.domain.com. Example 8-3: To submit a job array with indices 1, 3, 5 ... 999: qsub -J 1-1000:2 job.scr returns 1236[].server.domain.com. 8.4.3 File Staging for Job Arrays: When preparing files to be staged for a job array, plan on naming the files so that they match the index numbers of the subjobs. For example, inputfile3 is meant to be used by the subjob with index value 3. To stage files for job arrays, you use the same mechanism as for normal jobs, but include a variable to specify the subjob index. This variable is named array_index. 8.4.3.1 File Staging Syntax for Job Arrays: You can specify files to be staged in before the job runs and staged out after the job runs. Format: qsub -W stagein=<stagein file list> -W stageou
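The stepping rule for subjob indices behaves like an inclusive range with a stride. A minimal Python sketch, using a hypothetical helper named `subjob_indices`, shows which index values a given start, end, and stepping factor would produce:

```python
def subjob_indices(start, end, step=1):
    """List the subjob index values for index start, index end, and step."""
    # range() excludes its stop value, so add 1 to make the end inclusive.
    return list(range(start, end + 1, step))

stepped = subjob_indices(1, 8, 3)       # the 1, 4, 7 example from the text
odd_only = subjob_indices(1, 1000, 2)   # corresponds to -J 1-1000:2
```

In the first case the index end (8) is not reachable from the start in steps of 3, so the highest index used is 7; in the second, the last index is 999.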
186. he job "JobA": qsub -N JobA -- myprog a b <return>. To use environment variables you define earlier: export INFILE=/tmp/myinfile; export INDATA=/tmp/mydata; qsub -- a.out $INFILE $INDATA. 2.3.5 Submitting Jobs Using Keyboard Input: You can specify that qsub read input from the keyboard. If you run the qsub command with the resource requests on the command line, and then press enter without naming a job file, PBS will read input from the keyboard. (This is often referred to as a "here document".) You can direct qsub to stop reading input and submit the job by typing, on a line by itself, a control-d (UNIX) or control-z, then enter (Windows). Note that, under UNIX, if you enter a control-c while qsub is reading input, qsub will terminate the process and the job will not be submitted. Under Windows, however, often the control-c sequence will, depending on the command prompt used, cause qsub to submit the job to PBS. In such case, a control-break sequence will usually terminate the qsub command. qsub <ret> [directives] [tasks] ctrl-D. 2.3.6 Submitting Jobs Under Windows. 2.3.6.1 Passwords: When running PBS in a password-protected Windows environment, you will need to specify to PBS the password needed in order to run your jobs. There are two methods of doing this: (1) by providing PBS with a password once, to be used for all jobs
187. his. This first line can be omitted if it is acceptable for the job file to be interpreted using the login shell. We recommend that you always specify the shell. 2.2.2.2 Python Job Scripts: PBS allows you to submit jobs using Python scripts under Windows or UNIX/Linux. PBS includes a Python package, allowing Python job scripts to run; you do not need to install Python. To run a Python job script: UNIX/Linux: qsub <script name>. Windows: qsub -S %PBS_EXEC%\bin\pbs_python.exe <script name>. If the path contains any spaces, it must be quoted, for example: qsub -S "%PBS_EXEC%\bin\pbs_python.exe" <python job script>. You can include PBS directives in a Python job script as you would in a UNIX shell script. For example: cat myjob.py: #!/usr/bin/python; #PBS -l select=1:ncpus=3:mem=1gb; #PBS -N HelloJob; print "Hello". Python job scripts can access Win32 APIs, including the following modules: • Win32api • Win32con • Pywintypes. 2.2.2.2.i Debugging Python Job Scripts: You can run Python interactively, outside of PBS, to debug a Python job script. You use the Python interpreter to test parts of your script. Under UNIX/Linux, use the -i option to the pbs_python command, for example: /opt/pbs/default/bin/pbs_python -i <return>. Under Windows, the -i option is not necessary, but can be used. For example, either of the follow
188. his may or may not be enough information for your purposes. Many users will use shell syntax to pass the list of job identifiers directly into qstat for viewing purposes, as shown in the next example (necessarily different between UNIX and Windows). UNIX: qstat -a `qselect -u barry -l ncpus.gt.16` | Req'd Req'd Elap | Job ID User Queue Jobname Sess NDS TSK Mem Time S Time | 121.south barry workq airfoil 32 0:01 H | 133.south barry workq trialx 20 0:01 W | 154.south barry workq airfoil 930 32 1:30 R 0:32. Windows (type the following at the cmd prompt, all on one line): for /F "usebackq" %j in (`qselect -u barry -l ncpus.gt.16`) do qstat -a %j, which lists 121.south, 133.south, and 154.south. Note: This technique of using the output of the qselect command as input to qstat can also be used to supply input to other PBS commands. 10.5.1 Listing Job Identifiers of Finished and Moved Jobs: You can list identifiers of finished and moved jobs in the same way as for queued and running jobs, as long as the job history is still being preserved. The -x option to the qselect command allows you to list job identifiers for all jobs, whether they are running, queued, finished, or moved. The -H option to the qselect command allows you to list job identifiers for finished or moved jobs only. 10.5.2 Listing Jobs by Time Attributes: Y
189. ial state at submission time, or to its altered state if it has been qaltered. All of that job array's subjobs are requeued, which includes those that are currently running, and those that are completed and deleted. If a subjob or range is given, those subjobs are requeued as jobs would be. 8.6.9 Signaling a Job Array: If a job array object, subjob, or job array range is given to qsig, all currently running subjobs within the specified set are sent the signal. 8.6.10 Sending Messages to Job Arrays: The qmsg command is not supported for job arrays. 8.6.11 Getting Log Data on Job Arrays: The tracejob command can be run on job arrays and individual subjobs. When tracejob is run on a job array or a subjob, the same information is displayed as for a job, with additional information for a job array. Note that subjobs do not exist until they are running, so tracejob will not show any information until they are. When tracejob is run on a job array, the information displayed is only that for the job array object, not the subjobs. Job arrays themselves do not produce any MoM log information. Running tracejob on a job array gives information about why a subjob did not start. 8.6.12 Caveats for Using PBS Commands with Job Arrays. 8.6.12.1 Shells and PBS Commands with Job Arrays: Some shells such as csh and tcsh use the square bracket ("[", "]") as a m
190. Run job script: qsub -l select=3:ncpus=2:mpiprocs=2 job.script returns <job-id>. 5.2.12.3 Restrictions: The maximum number of ranks that can be launched under integrated MVAPICH is the number of entries in PBS_NODEFILE. 5.2.13 MVAPICH2 with PBS: PBS provides an mpiexec interface to MVAPICH2's mpiexec. When you use the PBS-supplied mpiexec, PBS can track all MVAPICH2 processes, perform accounting, and have complete job control. Your PBS administrator can integrate MVAPICH2 with PBS so that you can use the PBS-supplied mpirun in place of the MVAPICH2 mpirun in your job scripts. MVAPICH2 allows your jobs to use InfiniBand. 5.2.13.1 Interface to MVAPICH2 mpiexec Command: If executed outside of a PBS job, it behaves exactly as if standard MVAPICH2's mpiexec had been used. If executed inside a PBS job script, all of the options to the PBS interface are the same as MVAPICH2's mpiexec except for the following: -host: The host option is ignored. -machinefile <file>: The file option is ignored. mpdboot: If mpdboot is not called before mpiexec, it is called automatically before mpiexec runs, so that an MPD daemon is started on each host assigned by PBS. 5.2.13.2 MPD Startup and Shutdown: The interface ensures that the MPD daemons are started on each of the hosts listed in PBS_NODEFILE. It also ensures that the MPD daemons are shut down at the end of MPI job execution. PBS Professional 13
191. ied in the process group file. MPI processes spawned on non-PBS hosts are not guaranteed to be under the control of PBS. 5.2.9.1.ii MPD Startup and Shutdown: The script starts MPD daemons on each of the unique hosts listed in $PBS_NODEFILE, using either the rsh or ssh method, based on the value of the environment variable RSHCOMMAND. The default is rsh. The script also takes care of shutting down the MPD daemons at the end of a run. If the MPD daemons are not running, the PBS interface to mpirun will start GM's MPD daemons as you on the allocated PBS hosts. The MPD daemons may have been started already by the administrator or by you. MPD daemons are not started inside a PBS prologue script, since it won't have the path of mpirun that you executed (GM or MX), which would determine the path to the MPD binary. 5.2.9.1.iii Examples: Example 5-26: Run a single-executable MPICH-GM job with 3 processes spread out across the PBS-allocated hosts listed in $PBS_NODEFILE: PBS_NODEFILE: pbs-host1 pbs-host2 pbs-host3. qsub -l select=3:ncpus=1, then [MPICH-GM-HOME]/bin/mpirun -np 3 /path/myprog.x 1200, then ^D, returns <job-id>. If the GM MPD daemons are not running, the PBS interface to mpirun will start them as you on the allocated PBS hosts. The daemons may have been previously started by the administrator or by you. Example 5-27: Run an MPICH-GM job with multip
192. ilogue is executed with its current working directory set to the job's staging and execution directory, and with PBS_JOBDIR and TMPDIR set in its environment. 5.1.7 MPI Environment Variables: NCPUS: PBS sets the NCPUS environment variable in the job's environment on the primary execution host. PBS sets NCPUS to the value of ncpus requested for the first chunk. OMP_NUM_THREADS: PBS sets the OMP_NUM_THREADS environment variable in the job's environment on the primary execution host. PBS sets this variable to the value of ompthreads requested for the first chunk, which defaults to the value of ncpus requested for the first chunk. 5.1.8 Examples of Multiprocessor Jobs: Example 5-12: For a 10-way MPI job with 2gb of memory per MPI task: qsub -l select=10:ncpus=1:mem=2gb. Example 5-13: If you have a cluster of small systems with, for example, two CPUs each, and you wish to submit an MPI job that will run on four separate hosts: qsub -l select=4:ncpus=1 -l place=scatter. In this example, the node file contains one entry for each of the hosts allocated to the job, which is four entries. The variables NCPUS and OMP_NUM_THREADS are set to one. Example 5-14: If you do not care where the four MPI processes are run: qsub -l select=4:ncpus=1 -l place=free. Here, the job runs on two, three, or four hosts, depending on what is available. For this example, the node fi
193. ing MPIs on page 104.

2.1.6 Types of Jobs
PBS allows you to submit standard batch jobs or interactive jobs. The difference is that while the interactive job runs, you have an interactive session running, giving you interactive access to job processes. There is no interactive access to a standard batch job. We cover interactive jobs in section 6.11, "Running Your Job Interactively", on page 165.

2.1.7 Job Input and Output Files
You can tell PBS to copy files or directories from any accessible location to the execution host, and to copy output files and directories from the execution host wherever you want. We describe how to do this in Chapter 3, "Job Input & Output Files", on page 35.

2.2 The PBS Job Script
2.2.1 Overview of a Job Script
A PBS job script consists of:
• An optional shell specification
• PBS directives
• Job tasks (programs or commands)

2.2.2 Types of Job Scripts
PBS allows you to use any of the following for job scripts:
• A Python, Perl, or other script that can run under Windows or UNIX/Linux
• A UNIX shell script that runs under UNIX/Linux
• Windows command or PowerShell batch script under Windows

2.2.2.1 UNIX Shell Scripts
Since the job file can be a shell script, the first line of a shell script job file specifies which shell to use to execute the script. Your login shell is the default, but you can change t
194. ing will work:
    C:\Program Files\PBS Pro\exec\bin\pbs_python.exe <return>
    C:\Program Files\PBS Pro\exec\bin\pbs_python.exe -i <return>
When the Python interpreter runs, it presents you with its own prompt. For example:
    /opt/pbs/default/bin/pbs_python -i <return>
    >> print "hello"
    hello

2.2.2.2.ii Python Windows Caveat
If you have Python natively installed, and you need to use the win32api, make sure that you import pywintypes before win32api, otherwise you will get an error. Do the following:
    cmd> pbs_python
    >> import pywintypes
    >> import win32api

2.2.2.3 Windows Job Scripts
The Windows script can be a .exe or .bat file, or a Python or Perl script.

2.2.2.3.i Requirements for Windows Command Scripts
Under Windows, comments in the job script must be in ASCII characters.
Any .bat files that are to be executed within a PBS job script have to be prefixed with "call", as in:
    @echo off
    call E:\step1.bat
    call E:\step2.bat
Without the "call", only the first .bat file gets executed, and it doesn't return control to the calling interpreter. For example, an old job script that contains:
    @echo off
    E:\step1.bat
    E:\step2.bat
should now be:
    @echo off
    call E:\step1.bat
    call E:\step2.bat

2.2.2.3.ii Windows Advice and Caveats
In Windows, if you use notepad to create a job script, the last line is no
195. inished. These options are the following:
sandbox
    By default, PBS runs the job script in the owner's home directory. If sandbox is set to PRIVATE, PBS creates a job-specific execution directory, and runs the job script there. See section 3.2.2.1, "Setting the Job's Staging and Execution Directory", on page 36.
-k
    Specifies whether and which of stdout and stderr is retained in the job's execution directory. When set, this option overrides -o and -e. See section 3.3.5, "Keeping Output and Error Files on Execution Host", on page 52.
-o
    Specifies destination for stdout. Overridden by -k when -k is set. See section 3.3.2, "Paths for Output and Error Files", on page 50.
-e
    Specifies destination for stderr. Overridden by -k when -k is set. See section 3.3.2, "Paths for Output and Error Files", on page 50.

The following table shows how these options control creation and copying of stdout and stderr:

Table 3-4: How -k, sandbox, -o, and -e Options to qsub Affect stdout and stderr
sandbox | -k | -o, -e | Where stdout, stderr are created | Where stdout, stderr are copied
HOME or unset | unset | unset | PBS_HOME/spool | PBS_O_WORKDIR, which is job submission directory
HOME or unset | unset | <path> | PBS_HOME/spool | Destination specified in -o <path> and/or -e <path>
HOME or <pa
196. inux Job umask
On UNIX/Linux, whenever your job copies or creates a file or directory on the execution host, MoM uses umask to determine the permissions for the file or directory. If you do not specify a value for umask, MoM uses the system default. You can specify a value using the following methods:
• Use qsub -W umask=<value>
• Use #PBS -W umask=<value>
This applies when staging or copying files or directories to the execution host, or writing stdout or stderr on the execution host.
In the following example, we set umask to 022, to have files created with write permission for owner only. The desired permissions are -rw-r--r--.
    qsub -W umask=022 my_job
    #PBS -W umask=022

3.3.6.1 Caveats
This feature does not apply to Windows.

3.3.7 Troubleshooting File Delivery
File delivery is handled by MoM on the execution host. For a description of how file delivery works, see "Setting File Transfer Mechanism" on page 1035 in the PBS Professional Administrator's Guide.
For troubleshooting file delivery, see "Troubleshooting File Transfer" on page 1041 in the PBS Professional Administrator's Guide.

3.3.7.1 Non-delivery of Output
If the output of a job cannot be delivered to you, it is saved in a special directory named PBS_HOME/undelivered and mail is sent to you. The typical causes of non-delivery are:
The destination host is not
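The relationship between the umask value and the resulting file permissions in the example above can be checked with a little shell arithmetic. This is a minimal sketch, not part of PBS: the helper name umask_to_file_mode is hypothetical, and it assumes the usual UNIX convention that regular files are created with a base mode of 666 from which the umask bits are cleared.

```shell
#!/bin/sh
# Hypothetical helper (not a PBS command): show the permission bits a newly
# created regular file receives under a given umask.
# Assumption: files start from base mode 666; bits set in the umask are cleared.
umask_to_file_mode() {
    # $1: umask in octal, for example 022
    printf '%03o\n' $(( 0666 & ~0$1 ))
}

# umask 022 leaves write permission for the owner only.
umask_to_file_mode 022
```

For the guide's value of 022 this prints 644, which corresponds to the -rw-r--r-- permissions named in the example.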
197. ion or route
Server
    Name of server on which queue exists

14.3.4 xpbs Jobs Panel
The Jobs panel is composed of a leading horizontal JOBS bar, a listbox, and a set of command buttons. The JOBS bar lists the queues that are consulted when listing jobs; the bar also contains a minimize/maximize button for displaying or iconizing the Jobs region. The listbox displays information about jobs that are found in the queue(s) selected from the Queues listbox; each listbox entry can be selected as described above for the Hosts panel.
The region just above the Jobs listbox shows a collection of command buttons whose labels describe criteria used for filtering the Jobs listbox contents. The list of jobs can be selected according to the owner of jobs (Owners), job state (Job_State), name of the job (Job_Name), type of hold placed on the job (Hold_Types), the account name associated with the job (Account_Name), checkpoint attribute (Checkpoint), time the job is eligible for queueing/execution (qtime), resources requested by the job (Resource_List), priority attached to the job (Priority), and whether or not the job is rerunnable (Rerunable).
The selection criteria can be modified by clicking on any of the appropriate command buttons to bring up a selection box. The criteria command buttons are accompanied by a Select Jobs button, which when clicked, will update the contents of the Jobs listbox based on the new selection criteria. Note that only jobs t
198. is directory is either your home directory or a job-specific directory created by PBS just for this job. If you use job-specific staging and execution directories, you don't need to have a home directory on each execution host, as long as those hosts are configured properly. In addition, each job gets its own staging and execution directory, so you can more easily avoid filename collisions.
This table lists the differences between using your home directory for staging and execution and using a job-specific staging and execution directory created by PBS.

Table 3-1: Differences Between User's Home and Job-specific Directory for Staging and Execution
Question Regarding Action, Requirement, or Setting | User's Home Directory | Job-specific Directory
Does PBS create a job-specific staging and execution directory? | No | Yes
User's home directory must exist on execution host(s)? | Yes | No
Standard out and standard error automatically deleted when qsub -k option is used? | No | Yes
When are staged-out files deleted? | Successfully staged-out files are deleted; others go to "undelivered" | Only after all are successfully staged out
Staging and execution directory deleted after job finishes? | No | Yes
What is job's sandbox attribute set to? | HOME or not set | PRIVATE

3.2.2 Using Job-specific Staging and Exec
199. isted in $PBS_NODEFILE. Only three hosts are available:
$PBS_NODEFILE:
    pbs-host1
    pbs-host2
    pbs-host3
    pbs-host1
    pbs-host2
    pbs-host3
Job.script:
    # mpirun runs 6 processes, scattered over 3 hosts
    # listed in $PBS_NODEFILE
    mpirun -np 6 /path/myprog.x 1200
Run job script:
    qsub -l select=6:ncpus=1 -l place=scatter job.script
    <job-id>
Example 5-35: Run an MPICH2 job with multiple executables on multiple hosts using $PBS_NODEFILE and mpiexec arguments in mpirun:
$PBS_NODEFILE:
    hostA
    hostA
    hostB
    hostB
    hostC
    hostC
Job script:
    #PBS -l select=3:ncpus=2:mpiprocs=2
    mpirun -np 2 /tmp/mpitest1 : -np 2 /tmp/mpitest2 : -np 2 /tmp/mpitest3
Run job:
    qsub job.script
Example 5-36: Run an MPICH2 job with multiple executables on multiple hosts using mpirun -configfile option and $PBS_NODEFILE:
$PBS_NODEFILE:
    hostA
    hostA
    hostB
    hostB
    hostC
    hostC
Job script:
    #PBS -l select=3:ncpus=2:mpiprocs=2
    echo "-np 2 /tmp/mpitest1" > my.config.file
    echo "-np 2 /tmp/mpitest2" >> my.config.file
    echo "-np 2 /tmp/mpitest3" >> my.config.file
    mpirun -configfile my.config.file
    rm -f my.config.file
Run job:
    qsub job.script

5.2.11.4 Restrictions
The maximum number of ranks that can be launched under integrated MPICH2 is the number of entries in $PBS_NODEFILE.

5.2.12 MVAPICH with PBS
PBS provides an mpirun interface to the MVAPICH mpirun
200. it, the xpbs state may be written to the .xpbsrc file in your home directory. Information saved includes: the selected host(s), queue(s), and job(s); the different jobs listing criteria; the view states (i.e. minimized/maximized) of the Hosts, Queues, Jobs, and INFO regions; and all settings in the Preferences section. In addition, there is a system-wide xpbs configuration file, maintained by the PBS Administrator, which is used in the absence of your personal .xpbsrc file.

14.9 xpbs Preferences
The resources that can be set in the xpbs configuration file, .xpbsrc, are:
serverHosts
    List of server hosts (space-separated) to query by xpbs. A special keyword PBS_DEFAULT_SERVER can be used, which will be used as a placeholder for the value obtained from the /etc/pbs.conf file (UNIX) or [PBS Destination Folder]\pbs.conf file (Windows).
timeoutSecs
    Specify the number of seconds before timing out waiting for a connection to a PBS host.
xtermCmd
    The xterm command to run, driving an interactive PBS session.
labelFont
    Font applied to text appearing in labels.
fixlabelFont
    Font applied to text that labels fixed-width widgets such as listbox labels. This must be a fixed-width font.
textFont
    Font applied to a text widget. Keep this as a fixed-width font.
backgroundColor
    The color applied to background of frames, buttons, entries, scrollbar handles.
foregroundColor
    The color applied to text in any context.
201. it is 72. The nchunk resource cannot be named in a select statement; it can only be specified as a number preceding the colon, as in the above example. When the number is omitted, nchunk is 1.
    Non-consumable. Settable by Manager and Operator; readable by all.
    Format: Integer
    Python type: int
    Default value: 1

11.3.1.2 PBS Resources for the Cray
vntype
    Built-in. This resource represents the type of the vnode. Automatically set by PBS to one of two specific values for Cray vnodes. Has no meaning for non-Cray vnodes.
    Non-consumable.
    Format: String array
    Automatically assigned values for Cray vnodes:
    cray_compute
        This vnode represents part of a compute node.
    cray_login
        This vnode represents a login node.
    Default value: None
    Python type: str
PBScrayhost
    On CLE 2.2, this is set to "default".
    Custom resource created by PBS for the Cray. On CLE 3.0 and higher, used to delineate a Cray system, containing ALPS, login nodes running PBS MoMs, and compute nodes, from a separate Cray system with a separate ALPS.
    Non-consumable. The value of PBScrayhost is set to the value of mpp_host for this system.
    Format: String
    Default: CLE 2.2: "default"; CLE 3.0 and higher: None
PBScraylabel_<label name>
    Custom resource created by PBS for the Cray. Tracks labels applied to compute nodes. For each label on a compute node, PBS creates a custom resource
202. ives
Default qsub place statement
Queue default placement rules
Server default placement rules
Built-in default conversion and placement rules

4.7.3 Caveats and Restrictions for Specifying Placement
The place specification cannot be used without the select specification. In other words, you can only specify placement when you have specified chunks.
A select specification cannot be used with a nodes specification.
A select specification cannot be used with old-style resource requests such as -lncpus, -lmem, -lvmem, -larch, -lhost.
When using place=group=<resource>, the resource must be a string or string array.
Do not mix old and new syntax when requesting placement. See section 4.8, "Backward Compatibility", on page 86 for a description of old syntax.

4.7.4 Examples of Specifying Placement
Unless otherwise specified, the vnodes allocated to the job will be allocated as shared or exclusive based on the setting of the vnode's sharing attribute. Each of the following shows how you would use -l select= and -l place=.
1. A job that will fit in a single host such as an Altix but not in any of the vnodes, packed into the fewest vnodes:
    -l select=1:ncpus=10:mem=20gb
    -l place=pack
In earlier versions, this would have been:
    -lncpus=10,mem=20gb
2. Request four chunks, each with 1 CPU and 4GB of mem
203. ks like an old-style resource request, PBS does not convert it to a chunk request, because Red is defined at the server.

4.8.4.4 Properties are Deprecated
The syntax for requesting properties is deprecated. Your administrator has replaced properties with Booleans.

4.8.4.5 Replace cpp with ncpus
Specifying "cpp" is part of the old syntax, and should be replaced with "ncpus".

4.8.4.6 Environment Variables Set During Conversion
When a node specification is converted into a select statement, the job has the environment variables NCPUS and OMP_NUM_THREADS set to the old value of ncpus in the first piece of the old node specification. This may produce incompatibilities with prior versions when a complex node specification using different values of ncpus and ppn in different pieces is converted.

5 Multiprocessor Jobs
5.1 Submitting Multiprocessor Jobs
Before you read this chapter, please read Chapter 4, "Allocating Resources & Placing Jobs", on page 57.

5.1.1 Assigning the Chunks You Want
PBS assigns chunks to job processes in the order in which the chunks appear in the select statement. PBS takes the first chunk from the primary execution host; this is where the top task of the job runs.
Example 5-1: You want three chunks, where the first has two CPUs and 20 GB of memory, the second has four CPUs and 100 GB of memory, and the third has one CPU and five GB of mem
204. l Reference Guide for format of <states_string>.
selectRes
    List of resource amounts (space-separated) to consult when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Resources: <res_string>". See -l option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <res_string>.
selectExecTime
    The Execution Time attribute to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Queue_Time: <exec_time>". See -a option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <exec_time>.
selectAcctName
    The name of the account that will be checked when limiting the jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Account_Name: <account_name>". See -A option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <account_name>.
selectCheckpoint
    The Checkpoint attribute relationship (including the logical operator) to consult when limiting the list of jobs appearing on the Jobs listbox in the main xpbs window. Specify value as "Checkpoint: <checkpoint_arg>". See -c option in "qselect" on page 192 of the PBS Professional Reference Guide for format of <checkpoint_arg>.
sele
205. l briefly cover the basics of PBS resources. For a thorough discussion, see section "Resources" on page 305 of the PBS Professional Administrator's Guide, especially sections 5.4 and 5.5. For a complete description of each PBS resource, see "Resources" on page 305 of the PBS Professional Reference Guide.
PBS resources represent things such as CPUs, memory, application licenses, switches, scratch space, and time. They can also represent whether or not something is true, for example, whether a machine is dedicated to a particular project.
PBS provides a set of built-in resources, and allows the administrator to define additional custom resources. Custom resources are used for application licenses, scratch space, etc., and are defined by the administrator. Custom resources are used the same way built-in resources are used. PBS supplies the following types of resources:
    Boolean
    duration
    float
    long
    size
    string
    string_array
See "List of Formats" on page 413 of the PBS Professional Reference Guide for a description of each resource type.
See "Built-in Resources" on page 307 of the PBS Professional Reference Guide for a listing of built-in resources.
For some systems, PBS creates specific custom resources; see "Custom Cray Resources" on page 315 of the PBS Professional Reference Guide.
The administrator can specify which resources are available at the server, each queue, and each vnode. Resources defined
206. l to fast:
    -l select=16:ncpus=2:mpiprocs=2:speed=fast
Example 5-10: Request 16 chunks where each chunk has two CPUs, using grouping to ensure that all chunks share the same speed. The resource that identifies the speed is named speed:
    -l select=16:ncpus=2:mpiprocs=2 -l place=group=speed

5.1.4.2 Requesting Storage on NFS Server
One of the vnodes in your complex may act as an NFS server to the rest of the vnodes, so that all vnodes have access to the storage on the NFS server.
Example 5-11: The scratch resource is shared among all the vnodes in the complex, and is requested from a central location, called the nfs_server vnode. To request two vnodes, each with two CPUs to do calculations, and one vnode with 10gb of memory and no MPI processes:
    -l select=2:ncpus=2+1:host=nfs_server:scratch=10gb:ncpus=0
With this request, your job has one MPI process on each chunk containing CPUs, and no MPI processes on the memory-only chunk. The job shows up as having a chunk on the nfs_server host.

5.1.5 File Staging for Multiprocessor Jobs
PBS stages files to and from the primary execution host only.

5.1.6 Prologue and Epilogue
The prologue is run as root on the primary host, with the current working directory set to PBS_HOME/mom_priv, and with PBS_JOBDIR and TMPDIR set in its environment.
PBS runs the epilogue as root on the primary host. The ep
207. le contains four entries. These are either four separate hosts, or three hosts, one of which is repeated once, or two hosts, etc. NCPUS and OMP_NUM_THREADS are set to 1, the number of CPUs allocated from the first chunk.

5.1.9 Submitting SMP Jobs
To submit an SMP job, simply request a single chunk containing all of the required CPUs and memory, and if necessary, specify the hostname. For example:
    qsub -l select=ncpus=8:mem=20gb:host=host1
When the job is run, the node file will contain one entry, the name of the selected execution host. The job will have two environment variables, NCPUS and OMP_NUM_THREADS, set to the number of CPUs allocated.

5.2 Using MPI with PBS
5.2.1 Using an Integrated MPI
Many MPIs are integrated with PBS. PBS provides tools to integrate most of them; a few MPIs supply the integration. When a job is run under an integrated MPI, PBS can track resource usage, signal job processes, and perform accounting for all processes of the job.
When a job is run under an MPI that is not integrated with PBS, PBS is limited to managing the job only on the primary vnode, so resource tracking, job signaling, and accounting happen only for the processes on the primary vnode.
The instructions that follow are for integrated MPIs. Check with your administrator to find out which MPIs are integrated at your site. If an MPI is not integrated with PBS
208. le executables on multiple hosts listed in the process group file procgrp:
Job script:
    qsub -l select=2:ncpus=1
    echo "host1 1 user1 /x/y/a.exe arg1 arg2" > procgrp
    echo "host2 1 user1 /x/x/b.exe arg1 arg2" >> procgrp
    [MPICH-GM-HOME]/bin/mpirun -pg procgrp /path/mypro.x 1200
    rm -f procgrp
    ^D
    <job-id>
When the job runs, mpirun gives the warning message:
    warning: "-pg" is allowed but it is up to user to make sure only PBS hosts are specified; MPI processes spawned are not guaranteed to be under PBS-control
The warning is issued because if any of the hosts listed in procgrp are not under the control of PBS, then the processes on those hosts will not be under the control of PBS.

5.2.9.2 Using MPICH-GM and rsh/ssh with PBS
PBS provides an interface to MPICH-GM's mpirun using rsh/ssh. If executed inside a PBS job, this lets PBS track all MPICH-GM processes started via rsh/ssh so that PBS can perform accounting and have complete job control. If executed outside of a PBS job, it behaves exactly as if standard mpirun had been used.
You use the same mpirun command as you would use outside of PBS.

5.2.9.2.i Options
Inside a PBS job script, all of the options to the PBS interface are the same as mpirun except for the following:
-machinefile <file>
    The file argument contents are ignored and replaced by the contents of PBS_NOD
209. llowing option to qdel to specify a limit on emails sent:
    qdel -Wsuppress_email=<N>
See section 2.5.1.3, "Restricting Number of Job Deletion Emails", on page 29.

9.3.5 Deleting a Job Using xpbs
To delete a job using xpbs, first select the job(s) of interest, then click the delete button.

9.4 Sending Messages to Jobs
To send a message to a job is to write a message string into one or more output files of the job. Typically this is done to leave an informative message in the output of the job. Such messages can be written using the qmsg command.
You can send messages to running jobs only.
The usage syntax of the qmsg command is:
    qmsg [-E] [-O] message_string job_identifier
Example:
    qmsg -O "output file message" 54
    qmsg -O "output file message" "1234[].server"
Job array identifiers must be enclosed in double quotes.
The -E option writes the message into the error file of the specified job(s). The -O option writes the message into the output file of the specified job(s). If neither option is specified, the message will be written to the error file of the job.
The first operand, message_string, is the message to be written. If the string contains blanks, the string must be quoted. If the final character of the string is not a newline, a newline character will be added when written to the job's file. All remaining operands a
210. lly check out the licenses; the application being run inside the job's session does that.

4.3.5.2 Requesting Node-locked Application Licenses
Node-locked application licenses are available at the vnode(s) that are licensed for the application. These are host-level (chunk) resources that are requested inside of a select statement.

4.3.5.2.i Requesting Per-host Node-locked Application Licenses
Per-host node-locked application licenses are typically configured as a Boolean resource that indicates whether or not the required license is available at that host.
When requesting Boolean-valued per-host node-locked licenses, request one per host. Format:
    qsub -l select=<Boolean resource name>=true:<rest of chunk specification>
Example 4-3: The Boolean resource named runsAppA specifies whether this vnode has the necessary license. To request a host with a per-host node-locked license for AppA in one chunk:
    qsub -l select=1:runsAppA=1 <job script>

4.3.5.2.ii Requesting Per-use Node-locked Application Licenses
Per-use node-locked application licenses are typically configured as a consumable numeric resource so that the host(s) that run the application have the number of licenses that can be used at one time.
When requesting numerical per-use node-locked licenses, request the required number of licenses for each host
211. me = Thu Apr 2 12:07:05 2010
    Submit_arguments = -lselect=host=host3 -- ping -n 100 127.0.0.1
    executable = <jsdl-hpcpa:Executable>ping</jsdl-hpcpa:Executable>
    argument_list = <jsdl-hpcpa:Argument>-n</jsdl-hpcpa:Argument><jsdl-hpcpa:Argument>100</jsdl-hpcpa:Argument><jsdl-hpcpa:Argument>127.0.0.1</jsdl-hpcpa:Argument>

10.1.4.1 Path Display under Windows
When you view a job in long format that was submitted from a mapped drive, PBS displays the UNC path for the job's Output_Path, Error_Path attributes, and the value for PBS_O_WORKDIR in the job's Variable_List attribute.
When you view a job in long format that was submitted using UNC paths for output and error files, PBS displays the UNC path for the job's Output_Path and Error_Path attributes.

10.1.5 Listing Jobs by User
The -u option to qstat displays jobs owned by any of a list of user names you specify.
Syntax:
    qstat -u user_name[@host][,user_name[@host],...]
Host names are not required, and may be wild carded on the left end, e.g. "*.mydomain.com". user_name without a "@host" is equivalent to "user_name@*", that is, at any host.
    qstat -u user1
                                                Req'd  Elap
    Job ID    User   Queue  Jobname  Sess NDS TSK Mem  Time  S Time
    16.south  user1  workq  aims14   --   --  1   --   0:01  H --
    18.south  user1  workq  aims14   --   --  1   --   0:01  W --
212. me job session.
• If you experience a problem with X when using qsub -X -I, use the following to create the correct Xauthority file for qsub to use when establishing the X session:
    ssh -X <hostname> server <> exec host(s)

6.11.9.5 X Forwarding Errors
• If the DISPLAY environment variable is pointing to a display number that is correctly formatted but incorrect, submitting an interactive X forwarding job results in the following error message:
    cannot read data from 'xauth list <display number>', errno=<errno>
• If the DISPLAY environment variable is pointing to an incorrectly formatted value, submitting an interactive X forwarding job results in the following error message:
    qsub: Failed to get xauth data (check $DISPLAY variable)
• If the X authority utility (xauth) is not found on the submission host, the following error message is displayed:
    execution of xauth failed: sh: xauth: command not found
• When the execution of the xauth utility results in an error, the error message displayed by the xauth utility is preceded by the following:
    execution of xauth failed:
• When the qsub -X option is used without -I, the following error message is displayed:
    qsub: X11 forwarding possible only for Interactive Jobs

6.11.10 Using Environment Variables
PBS provides your job with environment variables where the job runs. PBS takes some from your submission environment, and creates others. Yo
213. mem | mem | mem = mppmem
mppnodes | Corresponding vnodes | PBS uses vnodes representing requested nodes
mppnppn | mpiprocs | mpiprocs = mppnppn

Table 11-1: Mapping mpp Resources to select and place
mpp Resource | Resulting PBS Resource | How Value of PBS Resource is Derived
mppwidth (mppnppn specified) | nchunk | nchunk = mppwidth / mppnppn; place=scatter; mpiprocs = mppnppn. Example: if mppwidth=8 and mppnppn=2, nchunk=4
mppwidth (mppnppn not specified) | nchunk | nchunk = mppwidth; place=free; mpiprocs not set. Example: if mppwidth=8, nchunk=8

11.3.2.1 Examples of Mapping mpp Resources to select and place
Example 11-1: You want 8 PEs.
The aprun statement is the following:
    aprun -n 8
The old resource request using mpp is the following:
    qsub -l mppwidth=8
The translated select and place is the following:
    qsub -lselect=8:vntype=cray_compute
Example 11-2: You want 8 PEs with only one PE per compute node.
The aprun statement is the following:
    aprun -n 8 -N 1
The old resource request using mpp is the following:
    qsub -lmppwidth=8,mppnppn=1
The translated select and place is the following:
    qsub -lselect=8:ncpus=1:mpiprocs=1:vntype=cray_compute -lplace=scatter
Example 11-3: You want 8 PEs with 2 PEs per compute node. This equates to 4 chunks of 2 ncpus per chunk, scatt
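The mapping rules in Table 11-1 and Examples 11-1 through 11-3 can be sketched as a small shell function. This is illustrative only and is not how PBS itself performs the translation: mpp_to_select is a hypothetical helper, it only prints the resource arguments rather than calling qsub, and setting ncpus equal to mppnppn is an assumption taken from the examples rather than from the table. Behavior when mppwidth is not a multiple of mppnppn is not covered by the text; the sketch simply uses integer division.

```shell
#!/bin/sh
# Hypothetical helper (not a PBS command): print the select/place arguments
# that correspond to an old-style mppwidth / mppnppn request.
#   $1: mppwidth (required)
#   $2: mppnppn  (optional)
mpp_to_select() {
    width=$1
    nppn=${2:-}
    if [ -n "$nppn" ]; then
        # mppnppn specified: nchunk = mppwidth / mppnppn, place=scatter
        nchunk=$(( width / nppn ))
        echo "-lselect=${nchunk}:ncpus=${nppn}:mpiprocs=${nppn}:vntype=cray_compute -lplace=scatter"
    else
        # mppnppn not specified: nchunk = mppwidth, mpiprocs not set
        echo "-lselect=${width}:vntype=cray_compute"
    fi
}

mpp_to_select 8      # request of Example 11-1
mpp_to_select 8 1    # request of Example 11-2
mpp_to_select 8 2    # 4 chunks of 2 ncpus each, as in Example 11-3
```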
214. mple, to request six of the previous chunk:
    -l select=6:ncpus=2:mem=4gb
If you don't specify N, the number of chunks, it is taken to be 1.
To request different chunks, concatenate the chunks using the plus sign ("+"):
    -l select=[number of chunks:]<chunk specification>+[number of chunks:]<chunk specification>
For example, to request two sets of chunks where one set of 6 chunks has 2 CPUs per chunk, and one set of 3 chunks has 8 CPUs per chunk, and both sets have 4GB of memory per chunk:
    -l select=6:ncpus=2:mem=4gb+3:ncpus=8:mem=4GB
No spaces are allowed between chunks.
You must specify all your chunks in a single select statement.
You can request chunk resources using any of the following:
• The qsub -l select=[N:][chunk specification][+[N:]chunk specification] option
• A #PBS -l select=[N:][chunk specification][+[N:]chunk specification] directive

4.3.4 Requesting Boolean Resources
A resource request can specify whether a Boolean resource should be True or False.
Example 4-1: Some vnodes have green=True and some have red=True, and you want to request two vnodes, each with one CPU, all green and no red:
    -l select=2:green=true:red=false:ncpus=1
Example 4-2: This job script snippet has a job-wide request for walltime and a chunk request for CPUs and memory where the Boolean resource HasMyApp is True:
    #PBS -l walltime=1:00:00
    #PBS -l select=ncpus=4:mem=400mb:HasMyApp=true
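The rule that chunks are joined with a plus sign and no spaces, in a single select statement, can be sketched as follows. The function name build_select is hypothetical and not part of PBS; the sketch only assembles and prints the argument string so it can be inspected before being handed to qsub.

```shell
#!/bin/sh
# Hypothetical helper (not a PBS command): join chunk specifications into one
# select statement. Each argument is one chunk, optionally prefixed with "N:".
build_select() {
    spec=""
    for chunk in "$@"; do
        case $chunk in
            *" "*)
                # No spaces are allowed inside or between chunks.
                echo "build_select: chunk contains a space: $chunk" >&2
                return 1
                ;;
        esac
        if [ -z "$spec" ]; then
            spec=$chunk
        else
            spec="${spec}+${chunk}"
        fi
    done
    echo "-l select=${spec}"
}

# Six chunks of 2 CPUs plus three chunks of 8 CPUs, 4GB of memory per chunk.
build_select 6:ncpus=2:mem=4gb 3:ncpus=8:mem=4GB
```

Keeping the joining logic in one place makes it harder to introduce a stray space between chunks, which the text above says is not allowed.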
215. ms:
• The job array object itself. The format is sequence_number[] or sequence_number[].server.domain.com.
  Example: 1234[].server or 1234[]
• A single subjob with index M. The format is sequence_number[M] or sequence_number[M].server.domain.com.
  Example: 1234[M].server or 1234[M]
• A range of subjobs of a job array. The format is sequence_number[start-end[:step]] or sequence_number[start-end[:step]].server.domain.com.
  Example: 1234[X-Y:Z].server or 1234[X-Y:Z]

8.3.4.1 Examples of Using Identifier Syntax
1234[].server.domain.com
    Full job array identifier
1234[]
    Short job array identifier
1234[73]
    Subjob identifier of the 73rd index of job array 1234[]
1234
    Error, if 1234[] is a job array
1234.server.domain.com
    Error, if 1234[].server.domain.com is a job array

8.3.4.2 Shells and Array Identifiers
Since some shells, for example csh and tcsh, read "[" and "]" as shell metacharacters, job array names and subjob names must be enclosed in double quotes for all PBS commands.
Example:
    qdel "1234[5].myhost"
    qdel "1234[].myhost"
Single quotes will work, except where you are using shell variable substitution.

8.3.5 Special Attributes for Job Arrays
Job arrays and subjobs have all of the attributes of a job. In addition, they have the following when appropriate. These attributes are read-only.
Table 8-1: Job Array Attributes
216. ms that the reservation can be made, or rejects the request. Once the reservation is confirmed, PBS creates a queue for the reservation's jobs. Jobs are then submitted to this queue.
When a reservation is confirmed, it means that the reservation will not conflict with currently running jobs, other confirmed reservations, or dedicated time, and that the requested resources are available for the reservation. A reservation request that fails these tests is rejected. All occurrences of a standing reservation must be acceptable in order for the standing reservation to be confirmed.
The pbs_rsub command returns a reservation ID, which is the reservation name. For an advance reservation, this reservation ID has the format:
    R<unique integer>.<server name>
For a standing reservation, this reservation ID refers to the entire series, and has the format:
    S<unique integer>.<server name>
You specify the resources for a reservation using the same syntax as for a job. Jobs in reservations are placed the same way non-reservation jobs are placed in placement sets.
The xpbs GUI cannot be used for creation, querying, or deletion of reservations.
The time for which a reservation is requested is in the time zone at the submission host.
The pbs_rsub command returns a reservation ID string, and the current status of the reservation. For
217. n You must submit jobs that will run before other jobs with a type of on.
beforeok:arg_list  Jobs in arg_list may start only after this job terminates without errors.
beforenotok:arg_list  If this job terminates execution with errors, the jobs in arg_list may begin.
beforeany:arg_list  Jobs in arg_list may start only after specified jobs terminate execution, with or without errors. Requires use of on dependency for jobs that will run before other jobs.
on:count  This job may start only after count dependencies on other jobs have been satisfied. This type is used in conjunction with one of the before types. count is an integer greater than 0.
The depend job attribute controls job dependencies. You can set it using the qsub command line or a PBS directive:
qsub -W depend=...
#PBS depend=...
6.2.2 Job Dependency Examples
Example 6-1: You have three jobs, job1, job2, and job3, and you want job3 to start after job1 and job2 have ended:
qsub job1
16394.jupiter
qsub job2
16395.jupiter
qsub -W depend=afterany:16394:16395 job3
16396.jupiter
Example 6-2: You want job2 to start only if job1 ends with no errors:
qsub job1
16397.jupiter
qsub -W depend=afterok:16397 job2
16396.jupiter
Example 6-3: job1 should run before job2 and job3. To use the beforeany dependency, you must use the on dependency:
qsub -W depend=on:2 job1
16397.jupite
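When dependencies are assembled by a submission script rather than typed by hand, the option string in the examples of entry 217 can be built programmatically. This is a hedged Python sketch: it assumes the colon-separated form depend=type:arg:arg, and the helper names depend_option and qsub_argv are hypothetical. It only builds the argument list and does not call qsub.

```python
def depend_option(dep_type, args):
    """Build the value given to `qsub -W` for one dependency type.

    dep_type is a type such as "afterany", "afterok" or "on"; args are job
    IDs, or the count for "on". The colon-joined layout is an assumption.
    """
    if not args:
        raise ValueError("a dependency needs at least one argument")
    return "depend=" + ":".join([dep_type, *map(str, args)])


def qsub_argv(script, dep_type, args):
    """Return an argument vector suitable for subprocess.run()."""
    return ["qsub", "-W", depend_option(dep_type, args), script]
```

Building an argument vector instead of a single shell string avoids quoting problems when job IDs come from earlier qsub output.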
218. You can specify relative or absolute paths. If you specify only a file name, it is assumed to be relative to your home directory. Do not use variables in the path. The following examples show how you can specify paths:
#PBS -o /u/user1/myOutputFile
#PBS -e /u/user1/myErrorFile
qsub -o myOutputFile my_job
qsub -o /u/user1/myOutputFile my_job
qsub -o myWorkstation:/u/user1/myOutputFile my_job
qsub -e myErrorFile my_job
qsub -e /u/user1/myErrorFile my_job
qsub -e myWorkstation:/u/user1/myErrorFile my_job
3.3.2.3 Specifying Paths from Windows Hosts
If you submit your job from a Windows host, you may end up using special characters such as spaces, backslashes, and colons for specifying pathnames, and you may need drive letter specifications. The following examples are allowed:
qsub -o \temp\my_out job.scr
qsub -e "myhost:e:\Documents and Settings\user\Desktop\output"
The error output of the example job is to be copied onto the e: drive on myhost using the path "\Documents and Settings\user\Desktop\output".
3.3.2.4 Caveats for Paths
Enclose arguments to qsub in quotes if the arguments contain spaces.
3.3.3 Avoiding Creation of stdout and/or stderr
For each job, PBS always creates the job's output and error files. The location where files are created is listed in Table 3-4, "How -k, sandbox, -o and -e Options to qsub Affect stdout an
219. ng is 4096 characters.
2.2.4 Job Tasks
These can be programs or commands. This is where you can specify an application to be run.
2.2.5 Job Script Names
We recommend that you avoid using special characters in job script names. If you must use them, on UNIX/Linux you must escape them using the backslash character.
2.2.5.1 How PBS Parses a Job Script
PBS parses a job script in two parts. First, the qsub command scans the script looking for directives, and stops at the first executable line it finds. This means that if you want qsub to use a directive, it must be above any executable lines. Any directive below the first executable line is ignored. The first executable line is the first line that is not a directive, whose first non-whitespace character is not "#", and is not blank. For information on directives, see section 2.2.3.4, "Using PBS Directives", on page 14. Second, lines in the script are processed by the job shell. PBS pipes the name of the job script file as input to the top shell, and the top shell executes the job shell, which runs the script. You can specify which shell is the top shell; see section 2.3.3.1, "Specifying the Job's Top Shell", on page 18, and, under UNIX/Linux, which shell you want to run the script in the first executable line of the script; see section 2.3.3.2, "Specifying Job Script Shell or
220. niBand is not specified in either the option or the environment variable, US mode is not used for the job.
-euidevice, MP_EUIDEVICE  Ignored by PBS.
-euilib {ip|us}, MP_EUILIB  If set to us, the job runs in User Space mode. If set to any other value, that value is passed to IBM poe. If the command line option -euilib is set, it takes precedence over the MP_EUILIB environment variable.
-hostfile, -hfile, MP_HOSTFILE  Ignored. If this is specified, PBS prints the following:
pbsrun.poe: Warning, -hostfile value replaced by PBS
or
pbsrun.poe: Warning, -hfile value replaced by PBS
If this environment variable is set when a poe job is submitted, PBS prints the following error message:
pbsrun.poe: Warning, MP_HOSTFILE value replaced by PBS
-instances, MP_INSTANCES  The option and the environment variable are treated differently:
-instances  If the option is set, PBS prints a warning:
pbsrun.poe: Warning, -instances cmd line option removed by PBS
MP_INSTANCES  If the environment variable is set, PBS uses it to calculate the number of network windows for the job. The maximum value allowed can be requested by using the string "max" for the environment variable. If the environment variable is set to a value greater than the maximum allowed value, it is replaced with the maximum allowed value. The default maximum value is 4.
-procs, MP_PROCS  This
221. nored.
6.3.2 Using Shrink-to-fit Jobs
If you have jobs that can run for less than the expected time needed and still make useful progress, you can make them shrink-to-fit jobs in order to maximize utilization. You can use shrink-to-fit jobs for the following:
• Jobs that are internally checkpointed. This includes jobs which are part of a larger effort, where a job does as much work as it can before it is killed, and the next job in that effort takes up where the previous job left off.
• Jobs using periodic PBS checkpointing.
• Jobs whose real running time might be much less than the expected time.
• When you have dedicated time for system maintenance, and you want to take advantage of time slots right up until shutdown, you can run speculative shrink-to-fit jobs if you can risk having a job killed before it finishes. Similarly, speculative jobs can take advantage of the time just before a reservation starts.
• Any job where you do not mind running the job as a speculative attempt to finish some work.
6.3.3 Running Time of a Shrink-to-fit Job
6.3.3.1 Setting Running Time Range for Shrink-to-fit Jobs
It is only required that the job request min_walltime to be a shrink-to-fit job. Requesting max_walltime without requesting min_walltime is an error. You can set the job's running time range by requesting min_walltime and max_walltime, for e
222. not appear to exist.
10.1.15.2 Job History In Standard Format
You can use the -x option to the qstat command to see information for finished, moved, queued, and running jobs, in standard format. Usage:
qstat -x  Displays information for queued, running, finished, and moved jobs, in standard format.
qstat -x <job ID>  Displays information for a job, regardless of its state, in standard format.
Example 10-1: Showing finished and moved jobs with queued and running jobs:
qstat -x
Job id  Name  User  Time Use  S  Queue
101.server1  STDIN  user1  00:00:00  F  workq
102.server1  STDIN  user1  00:00:00  M  destq@server2
103.server1  STDIN  user1  00:00:00  R  workq
104.server1  STDIN  user1  00:00:00  Q  workq
To see status for jobs, job arrays, and subjobs that are queued, running, finished, and moved, use qstat -xt. To see status for job arrays that are queued, running, finished, or moved, use qstat -xJ. When information for a moved job is displayed, the destination queue and server are shown as <queue>@<server>.
Example 10-2: qstat -x output for moved job: destination queue is destq, and destination server is server2.
Job id  Name  User  Time Use  S  Queue
101.sequoia  STDIN  user1  00:00:00  F  workq
102.sequoia  STDIN  user1  00:00:00  M  destq@server2
103.s
223. ns
10.1.16 Caveats for Job Information
• MoM periodically polls jobs for usage by the jobs running on her host, collects the results, and reports this to the server. When a job exits, she polls again to get the final tally of usage for that job. For example, MoM polls the running jobs at times T1, T2, T4, T8, T16, T24, and so on. The output shown by a qstat during the window of time between T8 and T16 shows the resource usage up to T8. If the qstat is done at T17, the output shows usage up through T16. If the job ends at T20, the accounting log (and the final log message, and the email to you if "qsub -m e" was used in job submission) contains usage through T20.
• The final report does not include the epilogue. The time required for the epilogue is treated as system overhead.
• The order in which jobs are displayed is undefined.
10.2 Viewing Server Status
To see server information in default format:
qstat -B server_name
To see server information in long format:
qstat -B -f server_name
10.2.1 Viewing Server Information in Default Format
The -B option to qstat displays the status of the specified PBS server. One line of output is generated for each server queried. The three-letter abbreviations correspond to various job limits and counts as follows: Maximum, Total, Queued, Running, Held, Waiting, Transiting, and
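The polling caveat in entry 223 amounts to "qstat reports usage as of the most recent poll". A minimal Python sketch of that rule follows; the poll schedule is the one from the example text, and the function name usage_as_of is hypothetical.

```python
from bisect import bisect_right

# Poll times from the example in the text: T1, T2, T4, T8, T16, T24.
POLL_TIMES = [1, 2, 4, 8, 16, 24]


def usage_as_of(query_time, poll_times=POLL_TIMES):
    """Return the poll time whose usage a qstat at query_time would show.

    Returns None if no poll has happened yet. poll_times must be sorted.
    """
    position = bisect_right(poll_times, query_time)
    if position == 0:
        return None
    return poll_times[position - 1]
```

A query at T12 falls in the window between T8 and T16 and therefore reports usage up to T8, while a query at T17 reports usage through T16, as the text describes.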
224. nses are available. You can do either of the following:
• Display license information for the current host:
qstat -Bf
• Display resources available, including licenses, on all hosts:
qmgr
Qmgr: print node @default
When looking at the server's license_count attribute, use the sum of the Avail_Global and Avail_Local values.
11 Submitting Cray Jobs
11.1 Introduction
You can submit jobs that are designed to run on the Cray, using the PBS select and place syntax.
11.2 PBS Jobs on the Cray
When you submit a job that is designed to run on the Cray, you create a job script that contains the same aprun command as a non-PBS job, but submit the job using the PBS select and place syntax. You can translate the mpp syntax into select and place syntax using the rules described in section 11.3.2, "Automatic Translation of mpp Resource Requests", on page 256. You can submit a PBS job using mpp syntax, but mpp syntax is deprecated. If a job does not request a login node, one is chosen for it. A login node is assigned to each PBS job that runs on the Cray. The job script runs on this login node. Jobs requesting a vntype of cray_compute are expected to have an aprun in the job script to launch the job on the compute nodes. PBS does not verify that the job script contains an aprun statem
225. nt may not be used without the select statement.
4.7.1.1 Specifying Arrangement of Chunks
To place your job's chunks wherever they fit:
-l place=free
To place all of the job's chunks on a single host:
-l place=pack
To place each chunk on its own host:
-l place=scatter
To place each chunk on its own vnode:
-l place=vscatter
4.7.1.1.i Caveats and Restrictions for Arrangement
• For all arrangements except vscatter, chunks cannot be split across hosts, but they can be split across vnodes on the same host. If a job does not request vscatter for its arrangement, any chunk can be broken across vnodes. This means that one chunk could be taken from more than one vnode.
• If the job requests vscatter for its arrangement, no chunk can be larger than a vnode, and no chunk can be split across vnodes. This behavior is different from other values for arrangement, where chunks can be split across vnodes.
4.7.1.2 Specifying Shared or Exclusive Use of Vnodes
Each vnode can be allocated exclusively to one job, or its resources can be shared among jobs. Hosts can also be allocated exclusively to one job, or shared among jobs. How vnodes are allocated to jobs is determined by a combination of the vnode's sharing attribute and the job's resource request. The possible values for the vnode sharing attribute, and how they interact
226. nted every 12 minutes of wall time.
6.4.3 Syntax for Specifying Checkpoint Interval
Use the -c <checkpoint spec> option to qsub to specify the interval, in CPU minutes or in walltime minutes, at which the job will be checkpointed. The checkpoint spec argument is specified as:
c  Job is checkpointed at the interval, measured in CPU time, set on the execution queue in which the job resides.
c=<minutes of CPU time>  Job is checkpointed at intervals of the specified number of minutes of CPU time used by the job. This value must be greater than zero. If the interval specified is less than that set on the execution queue in which the job resides, the queue's interval is used. Format: Integer.
w  Job is checkpointed at the interval, measured in walltime, set on the execution queue in which the job resides.
w=<minutes of walltime>  Checkpointing is to be performed at intervals of the specified number of minutes of walltime used by the job. This value must be greater than zero. If the interval specified is less than that set on the execution queue in which the job resides, the queue's interval is used. Format: Integer.
n  Job is not checkpointed.
s  Job is checkpointed only when the PBS server is shut down.
u  Checkpointing is unspecified, and defaults to the same behavior as s.
The Checkpoint job attribute cont
227. nts database. Under Windows, the default group assigned is determined by what the Windows API NetUserGetLocalGroup and NetUserGetGroup return as first entry. PBS checks the former output (the local groups) and returns the first group it finds. If the former call does not return any value, then it proceeds to the latter call (the Global groups). If PBS does not find any output on the latter call, it uses the default "Everyone". We do not recommend depending on always getting "Users" in this case. Sometimes you may submit a job without the -W group_list option, and get a default group of "None" assigned to your job.
2.5.6 Specifying Accounting String
You can associate an accounting string with your job by setting the value of the Account_Name job attribute. This attribute has no default value. You can set the value of Account_Name at the command line or in a PBS directive:
qsub -A <accounting string>
#PBS Account_Name=<accounting string>
The <accounting string> can be any string of characters; PBS does not attempt to interpret it.
2.5.7 Specifying Server and/or Queue
By default, PBS provides a default server and a default queue, so that jobs submitted without a server or queue specification end up in the default queue at the default server. If your administrator has configured the PBS server with more than one queue, and ha
228. nux, if you do not specify a shell inside the job script, PBS defaults to using /bin/sh. If you specify a different shell inside the job script, the top shell spawns that shell to run the script; see section 2.3.3.2, "Specifying Job Script Shell or Interpreter", on page 19. Under Windows, the job shell is the same as the top shell.
2.1.5 Scratch Space for Jobs
When PBS runs your job, it creates a temporary scratch directory for the job on each execution host. If your administrator has not specified a temporary directory, the root of the temporary directory is /tmp. Your administrator can specify a root for the temporary directory on each execution host using the tmpdir MoM parameter. PBS creates the TMPDIR environment variable, and sets it to the full path to the temporary scratch directory. Under Windows, PBS creates the temporary directory and sets TMP to the value of the Windows TMPDIR environment variable. If your administrator has not specified a temporary directory, PBS creates the temporary directory under either \winnt\temp or \windows\temp. PBS removes the directory when the job is finished. The location of the temporary directory is set by PBS; you should not set TMPDIR. Your job script can access the scratch space. For example:
UNIX: cd $TMPDIR
Windows: cd %TMPDIR%
For scratch space for MPI jobs, see section 5.2.3, "Caveats for Us
229. ob
Table 14-3: xpbs Job Column Headings
Heading  Meaning
Queue  Queue in which job resides
14.3.5 xpbs Info Panel
The Info panel shows the progress of the commands executed by xpbs. Any errors are written to this area. The Info panel also contains a minimize/maximize button for displaying or iconizing the Info panel.
14.3.6 xpbs Keyboard Tips
There are a number of shortcuts and key sequences that can be used to speed up using xpbs. These include:
Tip 1  All buttons which appear to be depressed in the dialog box/subwindow can be activated by pressing the return/enter key.
Tip 2  Pressing the tab key will move the blinking cursor from one text field to another.
Tip 3  To contiguously select more than one entry: left-click, then drag the mouse across multiple entries.
Tip 4  To non-contiguously select more than one entry: hold the Control key and left-click on the desired entries.
14.4 Setting xpbs Preferences
The Preferences button is in the Menu Bar at the top of the main xpbs window. Clicking it will bring up a dialog box that allows you to customize the behavior of xpbs:
1. Define server hosts to query
2. Select wait timeout in seconds
3. Specify xterm command (for interactive jobs)
4. Specify which rsh/ssh command to use
230. obs For jobs that are both MPI and multi-threaded, the number of threads per chunk, for all chunks, is set to the number of threads requested (explicitly or implicitly) in the first chunk, except for MPIs that have been integrated with the PBS TM API. For MPIs that are integrated with the PBS TM interface (LAM MPI and Open MPI), you can specify the number of threads separately for each chunk, by specifying the ompthreads resource separately for each chunk. For most MPIs, the OMP_NUM_THREADS and NCPUS environment variables default to the number of ncpus requested for the first chunk. Should you have a job that is both MPI and multi-threaded, you can request one chunk for each MPI process, or set mpiprocs to the number of MPI processes you want on each chunk. See section 5.1.3, "Specifying Number of MPI Processes Per Chunk", on page 95.
5.5.1 Examples
Example 5-48: To request four chunks, each with one MPI process, two CPUs and two threads:
qsub -l select=4:ncpus=2
or
qsub -l select=4:ncpus=2:ompthreads=2
Example 5-49: To request four chunks, each with two CPUs and four threads:
qsub -l select=4:ncpus=2:ompthreads=4
Example 5-50: To request 16 MPI processes, each with two threads, on machines with two processors:
qsub -l select=16:ncpus=2
Example 5-51: To reque
231. ocgrp are not under the control of PBS, then the processes on those hosts will not be under the control of PBS.
5.2.9.3 Restrictions
The maximum number of ranks that can be launched under integrated MPICH-GM is the number of entries in $PBS_NODEFILE.
5.2.10 MPICH-MX with PBS
5.2.10.1 Using MPICH-MX and MPD with PBS
PBS provides an interface to MPICH-MX's mpirun using MPD. If executed inside a PBS job, this allows for PBS to track all MPICH-MX processes started by the MPD daemons so that PBS can perform accounting and have complete job control. If executed outside of a PBS job, it behaves exactly as if standard MPICH-MX mpirun with MPD was used. You use the same mpirun command as you would use outside of PBS. If the MPD daemons are not already running, the PBS interface will take care of starting them for you.
5.2.10.1.i Options
Inside a PBS job script, all of the options to the PBS interface are the same as mpirun with MPD except for the following:
-m <file>  The file argument contents are ignored and replaced by the contents of $PBS_NODEFILE.
-np  If not specified, the number of entries found in $PBS_NODEFILE is used. The maximum number of ranks that can be launched is the number of entries in $PBS_NODEFILE.
-pg  The use of the -pg option, for having multiple executables on multiple hosts, is allowed, but it is up to you to make sure only P
232. of PBScrayseg for the associated vnode is 0. For the second NUMA node, the segment ordinal is 1, PBScrayseg is 1, and so on. Non-consumable. Format: String. Default: None.
11.3.2 Automatic Translation of mpp Resource Requests
When a PBS job or reservation is submitted using the mpp syntax, PBS translates the mpp resource request into PBS select and place statements. The translation uses the following rules:
• For each chunk on a vnode representing a compute node, the vntype resource is set to cray_compute. Using mpp implies the use of compute nodes.
• If the job requests -lvnode=<value>, the following becomes or is added to the equivalent chunk request: :vnode=<value>
• If the job requests -lhost=<value>, the following becomes or is added to the equivalent chunk request: :host=<value>
Translating mppwidth
When the job requests mppwidth:
• If mppnppn is specified, the following happen:
  • nchunk (number of chunks) is set to mppwidth / mppnppn
  • mpiprocs is set to mppnppn
  • -lplace=scatter is added to the request
• If mppnppn is not specified, the following happen:
  • mppnppn is treated as if it is 1
  • nchunk (number of chunks) is set to mppwidth
  • -lplace=free is added to the request
Translating mppnppn
If mppnppn is not specified, it defaults to 1.
Translating mppdepth
If mppdepth is not specified, it de
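The mppwidth rules in entry 232 can be read as a small function. The Python sketch below is an illustration only, not PBS code: the name translate_mppwidth is hypothetical, the text is read as nchunk = mppwidth / mppnppn, and rejecting a width that does not divide evenly is an assumption, since the text does not say how that case is handled.

```python
def translate_mppwidth(mppwidth, mppnppn=None):
    """Sketch of the mppwidth translation rules described in the text.

    Returns a dict with the derived chunk count and placement. Only the
    values the text mentions for each case are included.
    """
    if mppnppn is not None:
        # Assumption: an uneven division is rejected rather than rounded.
        if mppwidth % mppnppn != 0:
            raise ValueError("mppwidth is not a multiple of mppnppn")
        return {
            "nchunk": mppwidth // mppnppn,
            "mpiprocs": mppnppn,
            "place": "scatter",
        }
    # mppnppn unspecified: treated as 1, so one chunk per unit of width.
    return {"nchunk": mppwidth, "place": "free"}
```

For instance, a request of mppwidth 8 with mppnppn 4 yields two chunks of four MPI processes each, scattered across hosts.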
233. of ranks that can be launched under integrated Intel MPI is the number of entries in $PBS_NODEFILE.
5.2.7 LAM MPI with PBS
LAM MPI can be integrated with PBS on UNIX and Linux so that PBS can track resource usage, signal processes, and perform accounting for all job processes. Your PBS administrator can integrate LAM MPI with PBS.
5.2.7.1 Using LAM 7.x with PBS
You can run jobs under PBS using LAM 7.x without making any changes to your mpirun call.
5.2.7.2 Using LAM 6.5.9 with PBS
Support for LAM 6.5.9 is deprecated. You can run jobs under PBS using LAM 6.5.9.
5.2.7.2.i Caveats for LAM 6.5.9 with PBS
• If you specify the bhost argument, PBS will print a warning saying that the bhost argument is ignored by PBS.
• If you do not specify the where argument, pbs_mpilam will try to run your program on all available CPUs using the C keyword.
5.2.7.3 Example Job Submission Script
The following is a simple PBS job script for use with LAM MPI:
#!/bin/bash
# Job Name
#PBS -N LamSubTest
# Merge output and error files
#PBS -j oe
# Select 2 nodes with 1 CPU each
#PBS -l select=2:ncpus=1
# Export Users Environmental Variables to Execution Host
#PBS -V
# Send email on abort, begin and end
#PBS -m abe
# Specify mail recipient
#PBS -M username@example.com
cd $PBS_O_WORKDIR
date
lamboot -v $PBS_NODEFILE
mpirun -np cat PBS NODEFIL
234. of the available execution machines, and the resources requested by the job are reserved for that job. This is useful for debugging applications or for computational steering. Interactive jobs can use provisioning.
6.11.1 Input and Output for Interactive Jobs
An interactive job comes complete with a pseudotty suitable for running commands that set terminal characteristics. Once the interactive job has started execution, input to and output from the job pass through qsub. You provide all input to your interactive job through the terminal session in which the job runs. For interactive jobs, you can specify PBS directives in a job script. You cannot provide commands to the job by using a job script. For interactive jobs, PBS ignores executable commands in job scripts.
6.11.2 Running Your Interactive Job
To run your job interactively, you can do either of the following:
• Use qsub -I at the command line
• Use #PBS interactive=true in a PBS directive
When your interactive job is running, you can run commands, executables, shell scripts, DOS commands, etc. These commands behave normally; for example, if the path to a command is not in your PATH environment variable, you must provide the full path.
6.11.3 Lifecycle of an Interactive Job
1. You start the interactive job using qsub -I or #PBS interactive=true
2. If there is a script, PBS processe
235. ointed jobs will resume on the original nodeboards.
5.2.16.6 Specifying Array Name
You can specify the name of the array to use via the PBS_MPI_SGIARRAY environment variable.
5.2.16.7 Using CSA
PBS support for CSA on SGI systems is no longer available. The CSA functionality for SGI systems has been removed from PBS.
5.3 Using PVM with PBS
You use the pvmexec command to execute a Parallel Virtual Machine (PVM) program. PVM is not integrated with PBS; PBS is limited to monitoring, controlling, and accounting for job processes only on the primary vnode.
5.3.1 Arguments to pvmexec Command
The pvmexec command expects a host file argument for the list of hosts on which to spawn the parallel job.
5.3.2 Using PVM Daemons
To start the PVM daemons on the hosts listed in $PBS_NODEFILE:
1. Start the PVM console on the first host in the list
2. Print the hosts to the standard output file named jobname.o<PBS job ID>:
echo conf | pvm $PBS_NODEFILE
To quit the PVM console but leave the PVM daemons running:
quit
To stop the PVM daemons, restart the PVM console, and quit:
echo halt | pvm
5.3.3 Submitting a PVM Job
To submit a PVM job to PBS, use the following:
qsub <job script>
5.3.4 Examples
Example 5-41: To submit a PVM job to PBS, use the following:
qsub your_pvm_job
Here is an example script for your_pvm_job:
#PBS -N pvmjob
236. or setting a hold.
6.5.3 Holding a Job Before Execution
Normally, PBS runs your job as soon as an appropriate slot opens up. However, you can tell PBS that the job is ineligible to run and should remain queued. Use the -h option to qsub to apply a user hold to the job when you submit it. PBS accepts the job and places it in the held state. The job remains held and ineligible to run until the hold is released. The Hold_Types job attribute controls the job's holding behavior; set it via qsub or a directive:
qsub -h my_job
#PBS -h
6.5.4 Holding a Job During Execution
6.5.4.1 Checkpointing and Requeueing the Job
If your job is checkpointable, you can stop its execution by holding it. In this case, the following happens:
• The job is checkpointed
• The resources assigned to the job are released
• The job is put back in the execution queue in the Held state
See section 6.4.1, "Prerequisites for Checkpointing", on page 154. To hold your job, use the qhold command:
qsub -h my_job
6.5.4.2 Setting a Running Job's Hold Type
If your job is not checkpointable, qhold merely sets the job's Hold_Types attribute. This has no effect unless the job is requeued with the qrerun command. In that case, the job remains queued and ineligible to run until you release the hold.
6.5.5 Releasing a Job
You can release one or more holds on a job by u
237. or were moved to another server. See section 9.1, "Current vs. Historical Jobs", on page 213 and section 10.1.15, "Viewing Information for Finished and Moved Jobs", on page 237.
1.1.10.2 Reservation Fault Tolerance
PBS attempts to reconfirm reservations for which associated vnodes have become unavailable. See section 7.6.6, "Reservation Fault Tolerance", on page 189.
1.1.11 New Features in Recent Releases
1.1.11.1 Path to Binaries (10.0)
The path to the PBS binaries may have changed for your system. If the old path was not one of /opt/pbs, /usr/pbs, or /usr/local/pbs, you may need to add /opt/pbs/default/bin to your PATH environment variable.
1.1.11.2 Job-Specific Staging and Execution Directories (9.2)
PBS can now provide a staging and execution directory for each job. Jobs have new attributes sandbox and jobdir, the MoM has a new parameter jobdir_root, and there is a new environment variable called PBS_JOBDIR. If the job's sandbox attribute is set to PRIVATE, PBS creates a job-specific staging and execution directory. If the job's sandbox attribute is unset or is set to HOME, PBS uses the job submitter's home directory for staging and execution, which is how previous versions of PBS behaved. See section 3.2, "Input/Output File Staging", on page 35.
1.1.11.3 Standing Reservations (9.2)
PBS now provides a facility for making stan
238. orm, the request is for status of all jobs at that server. If you specify a full destination identifier, queue@server, the request is for status of all jobs in the named queue at the named server.
10.1.2 Viewing Basic Job Status
You can use the qstat command to view basic job status in the default format. Syntax for simple form and with options:
qstat
qstat [-p] [-J] [-t] [-x] [job_identifier | destination]
The default display shows the following information:
• The job identifier assigned by PBS
• The job name given by the submitter
• The job owner
• The CPU time used
• The job state; see "Job States" on page 421 of the PBS Professional Reference Guide
• The queue in which the job resides
The following example illustrates the default display of qstat:
qstat
Job id  Name  User  Time Use  S  Queue
16.south  aims14  user1  0  H  workq
18.south  aims14  user1  0  W  workq
26.south  airfoil  barry  00:21:03  R  workq
27.south  airfoil  barry  21:09:12  R  workq
28.south  myjob  user1  0  Q  workq
29.south  tns3d  susan  0  Q  workq
30.south  airfoil  barry  0  Q  workq
31.south  seq_35_3  donald  0  Q  workq
10.1.3 Viewing Job Status in Alternate Format
You can use the qstat command to view more detail than the basic job information, in the alternate format. Syntax for simple form and with options:
qstat -a
qstat -a [-w] [-H] [-i] [-r] [-G] [-M] [-J
239. ort a time to see that state. See "Reservation States" on page 429 of the PBS Professional Reference Guide. To view the status of a reservation, use the pbs_rstat command. It will display the status of all reservations at the PBS server. For a standing reservation, the pbs_rstat command will display the status of the soonest occurrence. Duration is shown in seconds. The pbs_rstat command will not display a custom resource which has been created to be invisible. See section 4.3.8, "Caveats and Restrictions on Requesting Resources", on page 67. This command has three options:
Table 7-1: Options to pbs_rstat Command
Option  Meaning  Description
B  Brief  Lists only the names of the reservations
S  Short  Lists in table format the name, queue name, owner, state, and start, duration and end times of each reservation
F  Full  Lists the name and all non-default-value attributes for each reservation
<none>  Default  Default is S option
The full listing for a standing reservation is identical to the listing for an advance reservation, with the following additions:
• A line that specifies the recurrence rule: reserve_rrule = FREQ=WEEKLY;BYDAY=MO;COUNT=5
• An entry for the vnodes reserved for the soonest occurrence of the standing reservation. This entry also appears for an advance reservation, but will be di
240. ory lselect 1 ncpus 2 mem 20gb ncpus 4 mem 100gb mem 5gb 5 1 1 1 Specifying Primary Execution Host The job s primary execution host is the host that supplies the vnode to satisfy the first chunk requested by the job 5 1 1 2 Request Most Specific Chunks First Chunk requests are interpreted from left to right The more specific the chunk the earlier it should be in the order For example if you require a specific host for chunk A but chunk B is not host specific request Chunk A first 5 1 2 The Job s Node File For each job PBS creates a job specific host file or node file which is a text file containing the name s of the host s containing the vnode s allocated to that job The file is created by the MoM on the primary execution host and is available only on that host PBS Professional 13 0 Beta User s Guide UG 93 Chapter 5 Multiprocessor Jobs 5 1 2 1 Node File Format and Contents The node file contains a list of host names one per line The name of the host is the value in resources_available host of the allocated vnode s The order in which hosts appear in the PBS node file is the order in which chunks are specified in the selection directive The node file contains one line per MPI process with the name of the host on which that process should execute The number of MPI processes for a job and the contents of the node file are controlled by the value of the resource mpiprocs mpiprocs is the n
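The node file layout described under Node File Format and Contents can be sketched in Python. This is only an illustration, not PBS code: the function name node_file_lines and the (host, mpiprocs) input format are hypothetical, and the sketch assumes each chunk contributes one line per MPI process, in the order in which the chunks were requested.

```python
def node_file_lines(chunks):
    """Return the host names a node file would list, one entry per MPI process.

    `chunks` is a hypothetical list of (host, mpiprocs) pairs in request order.
    """
    lines = []
    for host, mpiprocs in chunks:
        # One line per MPI process that should run on this host.
        lines.extend([host] * mpiprocs)
    return lines


if __name__ == "__main__":
    # Two processes on hostA followed by one on hostB.
    print("\n".join(node_file_lines([("hostA", 2), ("hostB", 1)])))
```

A chunk with mpiprocs set to 0 contributes no lines in this sketch; how PBS itself treats that case is not covered by the text above.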
241. ory taken from anywhere l select 4 ncpus 1 mem 4GB l place free Allocate 4 chunks each with 1 CPU and 2GB of memory from between UG 82 PBS Professional 13 0 Beta User s Guide Allocating Resources & Placing Jobs Chapter 4 one and four vnodes which have an arch of linux l select 4 ncpus 1 mem 2GB arch linux l place free 4 Allocate four chunks on 1 to 4 vnodes where each vnode must have 1 CPU 3GB of memory and node locked dyna license available for each chunk l select 4 dyna 1 ncpus 1 mem 3GB l place free 5 Allocate four chunks on 1 to 4 vnodes and 4 floating dyna licenses This assumes dyna is specified as a server dynamic resource l dyna 4 l select 4 ncpus 1 mem 3GB l place free 6 This selects exactly 4 vnodes where the arch is linux and each vnode will be on a separate host Each vnode will have 1 CPU and 2GB of memory allocated to the job lselect 4 mem 2GB ncpus 1 arch linux lplace scatter 7 This will allocate 3 chunks each with 1 CPU and 10GB of memory This will also reserve 100mb of scratch space if scratch is to be accounted Scratch is assumed to be on PBS Professional 13 0 Beta User s Guide UG 83 Chapter 4 Allocating Resources & Placing Jobs a file system common to all hosts The value of place depends on the default which is place free l scratch 100mb l select 3 ncpus 1 mem 10GB This will allocate 2
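The select examples above have lost their punctuation in this listing. As a hedged illustration of how such a request is assembled, the Python sketch below builds a select string using the usual PBS separators: a colon between fields of a chunk, an equals sign between a resource and its value, and a plus sign between chunk groups. The helper name build_select and its input format are hypothetical.

```python
def build_select(chunks):
    """Build a select specification from (count, resources) pairs.

    `chunks` is a hypothetical list such as [(4, {"ncpus": 1, "mem": "2GB"})].
    """
    groups = []
    for count, resources in chunks:
        fields = [str(count)]
        # Resources are emitted in the order given by the caller.
        fields += [f"{name}={value}" for name, value in resources.items()]
        groups.append(":".join(fields))
    return "+".join(groups)


if __name__ == "__main__":
    # Four chunks, each with 1 CPU and 2GB of memory, on linux vnodes.
    print(build_select([(4, {"ncpus": 1, "mem": "2GB", "arch": "linux"})]))
```

The resulting string is what would follow a select request on the qsub command line; placement is requested separately.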
242. ou can use the qselect command to list queued running finished and moved jobs job arrays and subjobs according to their time attributes The t option to the qselect command allows you to specify how you want to select based on time attributes You can also use the t option twice to bracket a time period Example 10 5 Select jobs with end time between noon and 3PM qselect te gt 09251200 te lt 09251500 Example 10 6 Select finished and moved jobs with start time between noon and 3PM qselect x s MF ts gt 09251200 ts lt 09251500 Example 10 7 Select all jobs with creation time between noon and 3PM qselect x tc gt 09251200 tc lt 09251500 Example 10 8 Select all jobs including finished and moved jobs with qtime of 2 30PM default relation is eq qselect x tq09251430 UG 246 PBS Professional 13 0 Beta User s Guide Checking Job & System Status Chapter 10 10 5 3 Selecting Jobs Using xpbs The xpbs command provides a graphical means of specifying job selection criteria offering the flexibility of the qselect command in a point and click interface Above the JOBS panel in the main xpbs display is the Other Criteria button Clicking it will bring up a menu that lets you choose and select any job selection criteria you wish The example below shows a user clicking on the Other Criteria button then selecting Job States to reveal that all job states are currently selected Clicking on any of these job states woul
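The time values in Examples 10 5 through 10 8 appear to follow a month, day, hour, minute layout (noon on September 25 is written 09251200). Under that assumption, a small Python helper can produce the stamps used to bracket a time period. The function name pbs_time_stamp is hypothetical, and the sketch does not build the qselect options themselves.

```python
from datetime import datetime


def pbs_time_stamp(moment):
    """Format a datetime as MMDDhhmm, the layout seen in the examples above."""
    return moment.strftime("%m%d%H%M")


if __name__ == "__main__":
    start = datetime(2014, 9, 25, 12, 0)   # noon
    end = datetime(2014, 9, 25, 15, 0)     # 3PM
    print(pbs_time_stamp(start), pbs_time_stamp(end))
```

The year is arbitrary here because the stamps in the examples carry no year.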
243. ource also called a server level or queue level resource is a resource that is available to the entire job at the server or queue A job wide resource is available to be consumed or matched at the server or queue if you set the server or queue resources_available <resource name> attribute to the available or matching value For example you can define a custom resource called FloatingLicenses and set the server s resources_available FloatingLicenses attribute to the number of available floating licenses Examples of job wide resources are shared scratch space licenses or walltime A job can request a job wide resource for the entire job but not for individual chunks 4 3 Requesting Resources Your job can request resources that apply to the entire job or resources that apply to job chunks For example if your entire job needs an application license your job can request one job wide license However if one job process needs two CPUs and another needs 8 CPUs your job can request two chunks one with two CPUs and one with eight CPUs Your job cannot request the same resource in a job wide request and a chunk level request PBS Professional 13 0 Beta User s Guide UG 59 Chapter 4 Allocating Resources & Placing Jobs PBS supplies resources such as walltime that can be used only as job wide resources and other resources such as ncpus and mem that can be used only as chunk resources A resource is either job wide or chunk
244. ources PBS translates only the following mpp resources into select and place syntax mppwidth mppdepth mppnppn mppmem mpparch mpphost mpplabels mppnodes 11 7 5 2 mpp Resources Deprecated The mpp syntax is deprecated See Deprecations and Removals on page 12 in the PBS Professional Administrator s Guide 11 7 6 Do Not Mix mpp and select place Jobs cannot use both lmpp syntax and lselect lplace syntax 11 7 7 Do Not Request PBScrayorder Do not use PBScrayorder in a resource request 11 7 8 Do Not Suspend Jobs Do not attempt to use qsig s suspend on the Cray Attempting to suspend a job on the Cray will cause errors UG 274 PBS Professional 13 0 Beta User s Guide Submitting Cray Jobs Chapter 11 11 7 9 Request Fewer Chunks The more chunks in each translated job request the longer the scheduling cycle takes Jobs that request a value for mppnppn or ncpus effectively direct PBS to use the size of mppnppn or ncpus as the value for ncpus for each chunk thus dividing the number of chunks by mppnppn or ncpus PBS Professional 13 0 Beta User s Guide UG 275 Chapter 11 Submitting Cray Jobs If you are on a homogeneous system we recommend that chunks use the value for ncpus for a vnode or for a compute node Example 11 17 Comparison of larger vs smaller chunk size and the effect on scheduling time Submit job with chunk size 1 and 8544 chunks qsub lmppwidth 8544 job Job s Resource_List Resour
245. ources Ahead of Time Chapter 7 A job submitted to a standing reservation without a restriction on when it can run will be run if possible during the soonest occurrence In order to submit a job to a specific occurrence use the a <start time> option to the qsub command setting the start time to the time of the occurrence that you want You can also use a cron job to submit a job at a specific time See qsub on page 219 of the PBS Professional Reference Guide and the cron 8 man page 7 5 2 Converting a Job into a Reservation Job The pbs_rsub command can be used to convert a normal job into a reservation job that will run as soon as possible PBS creates a reservation queue and a reservation and moves the job into the queue Other jobs can also be moved into that queue via qmove or submitted to that queue via qsub The reservation is called an ASAP reservation The format for converting a normal job into a reservation job is pbs_rsub l walltime time W qmove job_identifier Example pbs_rsub W qmove 54 pbs_rsub W qmove 1234 server The R and E options to pbs_rsub are disabled when using the W qmove option For more information see pbs_rsub on page 83 of the PBS Professional Reference Guide A job s default walltime is 5 years Therefore an ASAP reservation s start time can be in 5 years if all the jobs in the system have the default walltime You cannot use the pbs_rsub command or any other comman
246. ources and AOE required to run a job You request an AOE for a job if that job requires that AOE You request provisioning for a job or reservation using the same syntax You can request an AOE for the entire job reservation l aoe <AOE> Example l aoe suse The <AOE> form cannot be used with select You can request an AOE for a single chunk job reservation l select <chunk request> aoe <AOE> Example l select 1 ncpus 2 aoe rhel You can request the same AOE for each chunk of a job reservation l select <chunk request> aoe <AOE> <chunk request> aoe <AOE> Example l select 1 ncpus 1 aoe suse 2 ncpus 2 aoe suse 12 4 2 Commands and Provisioning If you try to use PBS commands on a job that is in the provisioning substate the commands behave differently The provisioning of vnodes is not affected by the commands if provisioning has already started it will continue The following table lists the affected commands Table 12 2 Effect of Commands on Jobs in Provisioning Substate Command Behavior While in Provisioning Substate qdel Without force Job is not deleted With force Job is deleted qsig s suspend Job is not suspended PBS Professional 13 0 Beta User s Guide UG 283 Chapter 12 Using Provisioning Table 12 2 Effect of Commands on Jobs in Provisioning Substate Command Behavior While in Provisioning Su
247. ources or alter resource requests e The qsub command both via command line and in PBS directives e The pbs_rsub command via command line only e The qalter command via command line only UG 60 PBS Professional 13 0 Beta User s Guide Allocating Resources & Placing Jobs Chapter 4 4 3 2 Requesting Job wide Resources Your job can request resources that apply to the entire job in job wide resource requests A job wide resource is designed to be used by the entire job and is available at the server or a queue but not at the host level Job wide resources are used for requesting floating licenses or other resources not tied to specific vnodes such as cput and walltime Job wide resources are requested outside of a selection statement in this form l <resource name> value <resource name> value A resource request outside of a selection statement means that the resource request comes after l but not after lselect In other words you cannot request a job wide resource in chunks For example to request one hour of walltime for a job l walltime 1 00 00 You can request job wide resources using any of the following e The qsub l <resource name> <value> option You can request multiple resources using either format l <resource> <value> <resource> <value> l <resource> <value> l <resource> <value> e One or more PBS l l
248. ow The resources specified in the Resource List section will be job wide resources only In order to specify chunks or job placement use a script To run an array job use a script You will not be able to query individual subjobs or the whole job array using xpbs Type the script into the File entry box Do not click the Load button Instead use the Submit button Finally review the optional settings to see if any should apply to this job PBS Professional 13 0 Beta User s Guide UG 301 Chapter 14 Using the xpbs GUI For example e Use one of the buttons in the Output region to merge output and error files Use Stdout File Name to define standard output file and to redirect output e Use the Environment Variables to Export subwindow to have current environment variables exported to the job Use the Job Name field in the OPTIONS subwindow to give the job a name Use the Notify email address and one of the buttons in the OPTIONS subwindow to have PBS send you mail when the job terminates Now that the script is built you have four options of what to do next e Reset options to default e Save the script to a file e Submit the job as a batch job e Submit the job as an interactive batch job Reset clears all the information from the submit job dialog box allowing you to create a job from a fresh start Use the FILE field in the upper left corner to de
249. ow they are manipulated are described in the following sections A listbox can be multi selectable a number of entries can be selected highlighted using a mouse click or single selectable one entry can be highlighted at a time For a multi selectable listbox the following operations are allowed e left click to select highlight an entry e shift left click to contiguously select more than one entry e control left click to select multiple non contiguous entries e click the Select All Deselect All button to select all entries or deselect all entries at once e double clicking an entry usually activates some action that uses the selected entry as a parameter An entry widget is brought into focus with a left click To manipulate this widget simply type in the text value Use of arrow keys and mouse selection of text for deletion overwrite copying and pasting with sole use of mouse buttons are permitted This widget has a scrollbar for horizontally scanning a long text entry string A matrix of entry boxes is usually shown as several rows of entry widgets where a number of entries called fields can be found per row The matrix is accompanied by up down arrow buttons for paging through the rows of data and each group of fields gets one scrollbar for horizontally scanning long entry strings Moving from field to field can be done using the <Tab> move forward <Cntrl-f> move forward or <Cntrl-b> move ba
250. ox displays information about queues managed by the server host s selected from the Hosts panel each listbox entry can be selected as described above for the Hosts panel To the right of the Queues Panel area are buttons for actions that can be performed on selected queue s detail provides information about selected queue s This functionality can also be achieved by double clicking on a Queue listbox entry stop for stopping the selected queue s admin only start for starting the selected queue s admin only disable for disabling the selected queue s admin only enable for enabling the selected queue s admin only The middle portion of the Queues Panel has abbreviated column names indicating the information being displayed as the following table shows Table 14 2 xpbs Queue Column Headings Heading Meaning Max Maximum number of jobs permitted Tot Count of jobs currently enqueued in any state Ena Is queue enabled yes or no Str Is queue started yes or no Que Count of jobs in the Queued state Run Count of jobs in the Running state Hld Count of jobs in the Held state UG 294 PBS Professional 13 0 Beta User s Guide Using the xpbs GUI Chapter 14 Table 14 2 xpbs Queue Column Headings Heading Meaning Wat Count of jobs in the Waiting state Trn Count of jobs in the Transiting state Ext Count of jobs in the Exiting state Type Type of queue execut
251. page 44 UG 8 PBS Professional 13 0 Beta User s Guide Submitting a PBS Job Chapter 2 2 1 2 Where and How Your PBS Job Runs Your PBS jobs run on hosts that the administrator has designated to PBS as execution hosts The PBS scheduler chooses one or more execution hosts that have the resources that your job requires PBS runs your jobs under your user account This means that your login and logout files are executed for each job and some of your environment goes with the job It s important to make sure that your login and logout files don t interfere with your jobs see section 2 4 2 Setting Up Your UNIX Linux Environment on page 13 2 1 3 The Job Identifier After you submit a job PBS returns a job identifier Format for a job sequence_number servername Format for a job array sequence_number servername domain You ll need the job identifier for any actions involving the job such as checking job status modifying the job tracking the job or deleting the job The largest possible job ID is the 7 digit number 9 999 999 After this has been reached job IDs start again at zero 2 1 4 Your Job s Shell Script s When PBS runs your job PBS starts the top shell that you specify for the job The top shell defaults to your login shell on the execution host but you can set another using the job s Shell_Path_List attribute See section 2 3 3 1 Specifying the Job s Top Shell on page 18 Under UNIX Li
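The job ID wraparound described under The Job Identifier can be written as a short Python sketch. The constant and function names are hypothetical, and the sketch models only the sequence number, not the server name that follows it in a full job identifier.

```python
MAX_SEQUENCE_NUMBER = 9_999_999  # largest job ID stated in the text


def next_sequence_number(current):
    """Return the sequence number that follows `current`, wrapping to zero."""
    if current >= MAX_SEQUENCE_NUMBER:
        return 0
    return current + 1


if __name__ == "__main__":
    print(next_sequence_number(152))        # ordinary increment
    print(next_sequence_number(9_999_999))  # wraps around to zero
```

Because IDs are reused after the wrap, a job ID alone identifies a job only within the period before the counter comes around again.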
252. pe Execution total_jobs 10 state_count Transit 0 Queued 7 Held 1 Waiting 1 Running 1 Exiting 0 resources_assigned ncpus 1 hasnodes False enabled True started True 10 3 3 Displaying Queue Limits in Alternate Format The q option to qstat displays any limits set on the requested or default queues Since PBS is shipped with no queue limits set any visible limits will be site specific The limits are listed in the format shown below qstat q server south Queue Memory CPU Time Walltime Node Run Que Lm State 10 3 4 Caveats for the qstat Command When you use the f option to qstat to display attributes of jobs queues or servers attributes that are unset may not be displayed If you do not see an attribute it is unset 10 4 Viewing Job & System Status with xpbs The main display of xpbs shows a brief listing of all selected servers all queues on those servers and any jobs in those queues that match the selection criteria discussed below Servers are listed in the HOST panel near the top of the display UG 244 PBS Professional 13 0 Beta User s Guide Checking Job & System Status Chapter 10 To view detailed information about a given server i e similar to that produced by qstat B select the server in question then click the Detail button Likewise for details on a given queue i e similar to that produced by qstat fQ select the queue in question then click its co
253. performs installation and or setup Provisioned Vnode A vnode which through the process of provisioning has an OS or application that was installed or which has had a script run on it 12 2 How Provisioning Works Provisioning can be performed only on vnodes that have provisioning enabled shown in the vnode s provision_enable attribute Provisioning can be the following e Directly installing an OS or application e Running a script which may perform setup or installation Each vnode is individually configured for provisioning with a list of available AOEs in the vnode s resources_available aoe attribute Each vnode s current_aoe attribute shows that vnode s current AOE The scheduler queries each vnode s aoe resource and current_aoe attribute in order to determine which vnodes to provision for each job Provisioning can be used for interactive jobs PBS Professional 13 0 Beta User s Guide UG 279 Chapter 12 Using Provisioning A job s walltime clock starts when provisioning for the job has finished 12 2 1 Causing Vnodes To Be Provisioned An AOE can be requested for a job or a reservation When a job requests an AOE that means that the job will be run on vnodes running that AOE When a reservation requests an AOE that means that the reservation reserves vnodes that have that AOE available The AOE is instantiated on reserved vnodes only when a job requesting that AOE runs When the scheduler runs each job that
254. ple qsub option option <ret> PBS <directive> jobscript sh arg1 <^d> 152 examplehost PBS Professional 13 0 Beta User s Guide UG 21 Chapter 2 Submitting a PBS Job If you need to pass arguments to a job you can do any of the following Pipe a shell command to qsub For example to directly pass myinfile and mydata as the input to a out type the following or make them into a shell script echo a out myinfile mydata qsub l select For example echo jobscript sh a arg1 b arg2 qsub l select For example to use an environment variable to pass myinfile as the input to a out type the following or make them into a shell script export INFILE tmp myinfile export INDATA tmp mydata echo a out $INFILE $INDATA qsub Use qsub <executable> <arguments to executable> See section 2 3 4 Submitting Jobs by Specifying Executable on page 22 2 3 4 Submitting Jobs by Specifying Executable You can run a PBS job by specifying an executable and its arguments instead of a job script When you specify only the executable with any options and arguments PBS starts a shell for you To submit a job from the command line the format is the following qsub options executable arguments to executable <return> For example to run myprog with the arguments a and b qsub myprog a b <return> To run myprog with the arguments a and b naming t
255. ple request four chunks using place pack Only one host is used and you can have each chunk request the HPS The HPS resource is a Boolean called hps qsub l select 4 ncpus 2 hps true l place pack UG 106 PBS Professional 13 0 Beta User s Guide Multiprocessor Jobs Chapter 5 If your PBS administrator has configured a host level integer resource instead of a Boolean resource make sure that you request the correct value for this resource see your PBS administrator 5 2 5 4 Restrictions on poe Jobs e Outside of PBS you can run poe but you will see this warning pbsrun poe Warning not running under PBS e Inside PBS you cannot run poe jobs without arguments Attempting to do this will give the following error pbsrun poe Error interactive program name entry not supported under PBS poe exits with a value of 1 e Some environment variables and options to poe behave differently under PBS These differences are described in the next section e The maximum number of ranks that can be launched is the number of entries in PBS_NODEFILE 5 2 5 5 poe Options and Environment Variables The usage for poe is poe program program_options poe options When submitting jobs to poe you can set environment variables instead of using options to poe The equivalent environment variable is listed with its poe option All options and environment variables except the following are passed to poe devtype MP_DEVTYPE If Infi
256. qsub l select ngpus <value> <rest of chunk specification> lplace excl Example 4 7 To submit the job named my_gpu_job requesting one node with two GPUs and one CPU and exclusive use of the node qsub lselect 1 ncpus 1 ngpus 2 lplace excl my_gpu_job It is up to the application or CUDA to bind the GPUs to the application processes 4 3 7 3 Requesting Non specific GPUs and Shared Use of Node Your administrator can configure PBS to allow your job to use non specific GPUs on a node while sharing GPU nodes In this case your administrator puts each GPU in its own vnode Your administrator can configure a resource to represent GPUs We recommend that the GPU resource is called ngpus Your administrator can configure each GPU vnode so it has a resource containing the device number of the GPU We recommend that this resource is called gpu_id Example 4 8 To submit the job named my_gpu_job requesting two GPUs and one CPU and shared use of the node qsub lselect 1 ncpus 1 ngpus 2 lplace shared my_gpu_job When a job is submitted requesting any GPU the PBS scheduler looks for a vnode with an available GPU and assigns that vnode to the job Since there is a one to one correspondence between GPUs and vnodes the job can determine the gpu_id of that vnode Finally the application can use the appropriate CUDA call to bind the process to the allocated GPU 4 3 7 4 Requesting Specific GPUs Your job can request one or
257. r qsub W depend beforeany 16397 job2 16398 jupiter qsub W depend beforeany 16397 job3 16399 jupiter PBS Professional 13 0 Beta User s Guide UG 147 Chapter 6 Controlling How Your Job Runs 6 2 3 Job Array Dependencies Job dependencies are supported e Between jobs and jobs e Between job arrays and job arrays e Between job arrays and jobs e Between jobs and job arrays Job dependencies are not supported for subjobs or ranges of subjobs 6 2 4 Using xpbs for Job Dependencies You can use xpbs to specify job dependencies In the Submit Job window in the other options section far left center of window click on one of the three dependency buttons after depend before depend or concurrency Any of these launches a Dependency window where you can set up dependencies 6 2 5 Caveats and Advice for Job Dependencies 6 2 5 1 Correct Exit Status Required Under UNIX Linux make sure that job exit status is captured correctly see section 6 1 Using Job Exit Status on page 145 6 2 5 2 Permission Required for Dependencies To use the before types you must have permission to alter the jobs in arg_list Otherwise the dependency is rejected and the new job is aborted 6 2 5 3 Warning About Job History Enabling job history changes the behavior of dependent jobs If a job j1 depends on a finished job j2 for which PBS is maintaining history PBS puts j1 into the held state If job j1 depends
258. r later using standard MPICH e The job is submitted with qsub lnodes 5 lmem 10GB e The master process of this job tries to use more than 2GB The job is killed where in < 7 0 the master process could use 10GB before being killed 10GB is now a job wide limit divided up into a 2GB limit per chunk 4 8 4 2 Do Not Mix Old and New Styles Do not mix old style resource or node specifications lresource value or lnodes with select and place statements lselect or lplace Do not use both in the command line Do not use both in the job script Do not use one in a job script and the other on the command line This will result in an error PBS Professional 13 0 Beta User s Guide UG 91 Chapter 4 Allocating Resources & Placing Jobs 4 8 4 3 Resource Request Conversion Dependent on Where Resources are Defined A job s resource request is converted from old style to new according to various rules one of which is that the conversion is dependent upon where resources are defined For example The boolean resource Red is defined on the server and the boolean resource Blue is defined at the host level A job requests qsub l Blue true This looks like an old style resource request and PBS checks to see where Blue is defined Since Blue is defined at the host level the request is converted into l select 1 Blue true However if a job requests qsub l Red true while this loo
259. ractive job on a Cray must run on a login node See section 11 4 8 Specify Host for Interactive Jobs on page 262 e You cannot use the CLS command in an interactive job It will not clear the screen e After the scheduler has started the interactive job SIGINT Ctrl C is ignored 6 11 8 Errors and Logging e If PBS cannot open a remote interactive shell to run an interactive job PBS prints the following error message qsub failed to run remote interactive shell e If IPC on the remote host cannot be connected PBS prints the following message Couldn t connect to host <hostname> e If PBS is successful in connecting to the IPC at the execution host but fails to execute the remote shell PBS prints the following error message Couldn t execute remote shell at host <hostname> 6 11 9 Receiving X Output from Interactive Jobs You can receive X output from an interactive job 6 11 9 1 How to Receive X Output To receive X output use qsub X I For example qsub I X <return> xterm <return> Control is returned here when your X process terminates You can background the process here if you want to UG 168 PBS Professional 13 0 Beta User s Guide Controlling How Your Job Runs Chapter 6 6 11 9 1 i Receiving X Output on Non submission Host You can view your X output on a host that is not the job submission host For example you submit a job from SubHost and want to see th
260. rces than are available in one placement set the following happens e The job s comment is set to the following Not Running can t fit in the largest placement set and can t span psets e The following message is printed to the scheduler s log Can t fit in the largest placement set and can t span placement sets 11 8 3 All Requested mppnodes Not Found If mppnodes are requested but there are no vnodes that match the requested mppnodes i e 0 of the mppnodes list is found the job or reservation is rejected with the following message The following error was encountered No matching vnodes for the given mppnodes <mppnodes> A log message is printed to the server log at event class 0x0004 translate mpp ERROR could not find matching vnodes for the given mppnodes <mppnodes as input> 11 8 4 Some Requested mppnodes Not Found If mppnodes are requested and only some of the mppnodes are found to match the vnodes then the job or reservation is accepted but the following is printed in the server log at event class 0x0004 translate mpp could not find matching vnodes for these given mppnodes <comma separated list of mppnodes> PBS Professional 13 0 Beta User s Guide UG 277 Chapter 11 Submitting Cray Jobs The job may or may not run depending on whether the vnodes that were matched up to the requested mppnodes have enough resources for the job 11 8 5 Bad mppnodes Range
261. rder command to change the order of two jobs within or across queues To order two jobs is to exchange the jobs positions in the queue or queues in which the jobs reside If job1 is at position 3 in queue A and job2 is at position 4 in queue B qordering them will result in job1 being in position 4 in queue B and job2 being in position 3 in queue A No attribute of the job such as Priority is changed The impact of changing the order within the queue s is dependent on local job scheduling policy contact your systems administrator for details Usage of the qorder command is qorder job_identifier1 job_identifier2 Job array identifiers must be enclosed in double quotes Both operands are job_identifiers which specify the jobs to be exchanged PBS Professional 13 0 Beta User s Guide UG 221 Chapter 9 Working with PBS Jobs qstat u bob Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time S Time 54 south bob workq twinkie 1 0 20 Q 63 south bob workq airfoil 1 0 13 Q qorder 54 63 qstat u bob Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time S Time 63 south bob workq airfoil 1 0 13 Q 54 south bob workq twinkie 1 0 20 Q To change the order of two jobs using xpbs select the two jobs and then click the order button 9 6 1 Restrictions e The two jobs must be located at the same server and both jobs must be owned by you However a PBS Manager or Oper
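The exchange of positions that qorder performs can be pictured with a toy model in Python. This is a sketch of the described behavior only, not how the PBS server stores queues: the dictionary of lists and the function name swap_jobs are hypothetical.

```python
def swap_jobs(queues, job_a, job_b):
    """Exchange the positions of two jobs, within one queue or across queues.

    `queues` is a hypothetical mapping of queue name to an ordered job list.
    """
    def locate(job):
        for name, jobs in queues.items():
            if job in jobs:
                return name, jobs.index(job)
        raise KeyError(job)

    queue_a, index_a = locate(job_a)
    queue_b, index_b = locate(job_b)
    # Each job takes over the other's slot; nothing else about the jobs changes.
    queues[queue_a][index_a] = job_b
    queues[queue_b][index_b] = job_a


if __name__ == "__main__":
    queues = {"A": ["a1", "a2", "job1"], "B": ["b1", "b2", "b3", "job2"]}
    swap_jobs(queues, "job1", "job2")
    print(queues)  # job2 is now third in A, job1 is now fourth in B
```

As in the text, job1 starts at position 3 in queue A and ends at position 4 in queue B, while job2 moves the other way.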
262. re job_identifiers which specify the jobs to receive the message string For example qmsg E hello to my error e file 55 qmsg O hello to my output o file 55 qmsg this too will go to my error e file 55 To send a message to a job using xpbs first select the job s of interest then click the msg button Doing so will launch the Send Message to Job dialog box From this window you may enter the message you wish to send and indicate whether it should be written to the standard output or the standard error file of the job Click the Send Message button to complete the process 9 5 Sending Signals to Jobs You can use the qsig command to send a signal to your job The signal is sent to all of the job s processes Usage syntax of the qsig command is qsig s signal job_identifier Job array job identifiers must be enclosed in double quotes If the s option is not specified SIGTERM is sent If the s option is specified it declares which signal is sent to the job The signal argument is either a signal name e g SIGKILL the signal name without the SIG prefix e g KILL or an unsigned signal number e g 9 The signal name SIGNULL is allowed the server will send the signal 0 to the job which will have no effect Not all signal names will be recognized by qsig If it doesn t recognize the signal name try issuing the signal number instead The request to signal a batch job will be rejected if e You are
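The three accepted forms of the signal argument (full name, name without the SIG prefix, or number), plus the SIGTERM default and the SIGNULL case, can be illustrated with Python's standard signal module. This is a sketch of the rules as stated above, not the logic qsig actually uses; the function name normalize_signal is hypothetical, and the set of recognized names is whatever the local platform provides.

```python
import signal


def normalize_signal(arg=None):
    """Map a qsig-style signal argument to a signal number.

    None models an omitted s option, for which the text says SIGTERM is sent.
    """
    if arg is None:
        return int(signal.SIGTERM)
    text = arg.strip()
    if text.isdigit():
        return int(text)                 # unsigned signal number, e.g. "9"
    name = text.upper()
    if not name.startswith("SIG"):
        name = "SIG" + name              # allow "KILL" for "SIGKILL"
    if name == "SIGNULL":
        return 0                         # signal 0 has no effect on the job
    return int(signal.Signals[name])     # KeyError if the name is unknown


if __name__ == "__main__":
    for form in ("SIGKILL", "KILL", "9", "SIGNULL", None):
        print(form, normalize_signal(form))
```

An unrecognized name raises KeyError here, which mirrors the advice to fall back to the signal number when a name is not recognized.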
263. requests an AOE it either finds the vnodes that satisfy the job s requirements or provisions the required vnodes For example if SLES is available on a set of vnodes that otherwise suit your job you can request SLES for your job and regardless of the OS running on those vnodes before your job starts SLES will be running at the time the job begins execution 12 2 2 Using an AOE When you request an AOE for a job the requested AOE must be one of the AOEs that has been configured at your site For example if the AOEs available on vnodes are rhel and sles you can request only those you cannot request suse You can request a reservation for vnodes that have a specific AOE available This way jobs needing that AOE can be submitted to that reservation This means that jobs needing that AOE are guaranteed to be running on vnodes that have that AOE available Each reservation can have at most one AOE specified for it Any jobs that run in that reservation must not request a different AOE from the one requested for the reservation That is the job can run in the reservation if it either requests no AOE or requests the same AOE as the reservation 12 2 3 Job Substates and Provisioning When a job is in the process of provisioning its substate is provisioning This is the description of the substate provisioning The job is waiting for vnode s to be provisioned with its requested AOE Integer value is 77 See Job
264. rious error messages 5 2 5 6 iv Job Submission Format on POE Do not submit InfiniBand jobs in which the select statement specifies only a number for example export PBS_GET_ IBWINS 1 qsub koe m 1 select 1 V jobname Instead use the equivalent request which specifies a resource export PBS_GET_IBWINS 1 qsub koe m 1 select 1 ncpus 1 V jobname 5 2 5 6 v Environment Variables under POE Do not set the PBS_O_HOST environment variable If you do so using the qsub command with the V option will fail 5 2 5 7 Useful Information 5 2 5 7 i IBM Documentation For more information on using IBM s Parallel Operating Environment see IBM Parallel Environment for AIX SL Hitchhiker s Guide PBS Professional 13 0 Beta User s Guide UG 109 Chapter 5 Multiprocessor Jobs 5 2 5 7 ii Sources for Sample Code When installing the ppe poe fileset there are three directories containing sample code that may be of interest from How installing the POE fileset alters your system e Directory containing sample code for running User Space POE jobs without LoadLev eler usr lpp ppe poe samples swtbl e Directory containing sample code for running User Space jobs without LoadLeveler using the network table API usr lpp ppe poe samples ntbl e Directory that contains the sample code for running User Space jobs on InfiniBand inter connects without LoadLeveler using the network resource table API usr lpp ppe poe
265. rocesses on one chunk exceed one of these limits, but the processes on the other are under the chunk limit, the job can continue to run, as long as the total used for both chunks is less than the host limit.
4.5.5 Examples of Memory Limits
Your administrator may choose to enforce memory limits. If this is the case, the memory used by the entire job cannot exceed the amount in Resource_List.mem, and the memory used at any host cannot exceed the sum of the chunks on that host. For the following examples, assume the queue has these settings:
resources_default.mem=200mb
default_chunk.mem=100mb
Example 4-12: A job requesting -l select=2:ncpus=1:mem=345mb uses 345mb from each of two vnodes and has a job-wide limit of 690mb (2 * 345). The job's Resource_List.mem shows 690mb.
Example 4-13: A job requesting -l select=2:ncpus=2 takes 100mb (via default_chunk) from each vnode and has a job-wide limit of 200mb (2 * 100mb). The job's Resource_List.mem shows 200mb.
Example 4-14: A job requesting -l ncpus=2 takes 200mb (inherited from resources_default and used to create the select specification) from one vnode and has a job-wide limit of 200mb. The job's Resource_List.mem shows 200mb.
Example 4-15: A job requesting -lnodes=2 inherits 200mb from resources_default.mem, which becomes the job-wide limit. The memory is taken from the two vnodes, half (100mb) from each. The generated select specification is 2:ncpus=1:mem=100mb. The job's Resour
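The arithmetic in Examples 4-12 and 4-13 can be sketched as follows. This is a simplified illustration, not PBS code: the function names are hypothetical, only the kb, mb and gb suffixes are handled, and the only default applied is a per-chunk memory default like the default_chunk.mem value assumed in the examples.

```python
import re

# Size suffixes expressed in kilobytes (a simplification for this sketch).
_UNITS_KB = {"kb": 1, "mb": 1024, "gb": 1024 ** 2}


def parse_size_kb(text: str) -> int:
    """Convert a size such as '345mb' to kilobytes."""
    match = re.fullmatch(r"(\d+)(kb|mb|gb)", text.lower())
    if match is None:
        raise ValueError(f"unsupported size: {text!r}")
    return int(match.group(1)) * _UNITS_KB[match.group(2)]


def job_wide_mem_mb(select: str, default_chunk_mem: str = "100mb") -> int:
    """Sum per-chunk memory over a select string, using the chunk default when mem is absent."""
    total_kb = 0
    for chunk in select.split("+"):
        parts = chunk.split(":")
        count = 1
        if parts[0].isdigit():  # leading chunk count, e.g. the "2" in "2:ncpus=1"
            count = int(parts[0])
            parts = parts[1:]
        resources = dict(part.split("=", 1) for part in parts)
        mem = resources.get("mem", default_chunk_mem)
        total_kb += count * parse_size_kb(mem)
    return total_kb // 1024


print(job_wide_mem_mb("2:ncpus=1:mem=345mb"))  # Example 4-12
print(job_wide_mem_mb("2:ncpus=2"))            # Example 4-13
```

The two calls correspond to the select strings in Examples 4-12 and 4-13, whose job-wide limits the guide gives as 690mb and 200mb.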
266. rols the job's checkpoint interval. You can set it using the qsub command line or a PBS directive. Use qsub to specify that the job should use the execution queue's checkpoint interval:
qsub -c c my_job
Use a directive to checkpoint the job every 10 minutes of CPU time:
#PBS -c c=10
6.4.4 Using Checkpointing for Preempting or Holding Jobs
Your site may need to preempt jobs while they are running, or you may want to be able to place a hold on your job while it runs. To allow either of these, make your job checkpointable. This means that:
• You should not mark it as non-checkpointable (do not use qsub -c n)
• Your application must be checkpointable, or there is a third-party checkpointing application
• Your administrator must supply a checkpoint script to be run by the MoM where the job runs
You can use application-level checkpointing, when your job is preempted or you place a hold on it, to save the partial results. When your checkpointed job is restarted, your job script can find that the job was checkpointed and can start from the checkpoint file instead of starting from scratch. If you try to hold a running job that is not checkpointable (either it is marked as non-checkpointable, or the script is missing or returns failure), the job continues to run with its Hold_Types attribute set to h. See section 6.5, "Holding and Releasing Jobs", on page 156.
Chapter 6 Controlling How Your Job Run
267. rresponding Detail button. The same applies for jobs as well (i.e. qstat -f). You can view detailed information on any displayed job by selecting it and then clicking on the Detail button. Note that the list of jobs displayed depends on the Selection Criteria currently selected. This is discussed in the xpbs portion of the next section.
10.5 Selecting a List of Jobs
The qselect command provides a method to list the job identifiers of those jobs, job arrays, or subjobs which meet a list of selection criteria. Jobs are selected from those owned by a single server. The qselect command writes to standard output a list of zero or more job identifiers which meet the criteria specified by the options. Each option acts as a filter restricting the number of jobs which might be listed. With no options, the qselect command lists all jobs at the server which you are authorized to list (query status of). The -u option may be used to limit the selection to jobs owned by you or other specified users. For a description of the qselect command, see qselect on page 192 of the PBS Professional Reference Guide. For example, say you want to list all jobs owned by user barry that requested more than 16 CPUs. You could use the following qselect command syntax:
qselect -u barry -l ncpus.gt.16
121.south
133.south
154.south
Notice that what is returned is the job identifiers of jobs that match the selection criteria. T
268. rror Code
Error: Recurrence rule missing valid COUNT or UNTIL parameter. Code: 15134. Message: pbs_rsub error: Undefined iCalendar syntax. A valid COUNT or UNTIL is required.
Error: Problem with the start and/or end time of the reservation, such as: given start time is earlier than current date and time; missing start time or end time; end time is earlier than start time. Code: 15086. Message: pbs_rsub: Bad time specification(s).
Error: Reservation duration exceeds 24 hours and the recurrence frequency, FREQ, is set to DAILY. Code: 15129. Message: pbs_rsub error: DAILY recurrence duration cannot exceed 24 hours.
Error: Reservation duration exceeds 7 days and the frequency FREQ is set to WEEKLY. Code: 15128. Message: pbs_rsub error: WEEKLY recurrence duration cannot exceed 1 week.
Error: Reservation duration exceeds 1 hour and the frequency FREQ is set to HOURLY, or the BY-rule is set to BYHOUR and occurs every hour, such as BYHOUR=9,10. Code: 15130. Message: pbs_rsub error: HOURLY recurrence duration cannot exceed 1 hour.
Error: The PBS_TZID environment variable is not set correctly at the submission host (rejection at submission host). Code: None. Message: pbs_rsub error: a valid PBS_TZID timezone environment variable is required.
Error: The PBS_TZID environment variable is not set correctly at the submission host (rejection at server). Code: 15135. Message: Unrecognized PBS_TZID environment variable.
7.6.3 Time Required Betw
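The duration limits in the table above (HOURLY at most 1 hour, DAILY at most 24 hours, WEEKLY at most 7 days) can be sketched as a simple pre-submission check. This is only an illustration of the rule; the function name is hypothetical, it does not cover the BYHOUR case, and it is not how pbs_rsub or the server performs the check.

```python
from datetime import timedelta
from typing import Optional

# Maximum reservation duration per recurrence frequency, paired with the
# server log error code listed in the table for that violation.
_LIMITS = {
    "HOURLY": (timedelta(hours=1), 15130),
    "DAILY": (timedelta(hours=24), 15129),
    "WEEKLY": (timedelta(days=7), 15128),
}


def recurrence_duration_error(freq: str, duration: timedelta) -> Optional[int]:
    """Return the table's error code if the duration is too long for FREQ, else None."""
    limit = _LIMITS.get(freq.upper())
    if limit is not None and duration > limit[0]:
        return limit[1]
    return None
```

A caller could run this before building the recurrence rule, so that an over-long reservation is caught locally rather than rejected at submission.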
269. s
6.4.5 Caveats and Restrictions for Checkpointing
• Checkpointing is not supported for job arrays.
• If you do not specify qsub -c checkpoint-spec, it is unspecified, and defaults to the same behavior as s.
• PBS limits the number of times it tries to run a job to 21, and tracks this count in the job's run_count attribute. If your job is checkpointed and requeued enough times, it will be held.
6.5 Holding and Releasing Jobs
You can place a hold on your job to do the following:
• A queued job remains queued until you release the hold; see section 6.5.3, "Holding a Job Before Execution", on page 157
• A running job stops running but can resume where it left off; see section 6.5.4.1, "Checkpointing and Requeueing the Job", on page 158
• A running job continues to run but is held if it is requeued; see section 6.5.4.2, "Setting a Running Job's Hold Type", on page 158
You hold a job using the qhold command; see qhold on page 152 of the PBS Professional Reference Guide. You can release a held queued job to make it eligible to be scheduled to run, and you can release a hold on a running job. You release a hold on your job using the qrls command; see qrls on page 187 of the PBS Professional Reference Guide.
6.5.1 Types of Holds
There are three types of holds: user, operator, and system. You can place a user hold upon any job that you own. An Operator can place a user or operator hold on any job
270. s=1:PBScrayseg=0+4:ncpus=1:PBScrayseg=1
• If you know about the underlying hardware, the PBS resource request can take advantage of that. On a homogeneous system with 2 NUMA nodes per compute node and 4 PEs per NUMA node, you can use the following PBS resource request:
qsub -lselect=8:ncpus=1 -lplace=pack
• If the administrator has set up a resource that allows you to request NUMA node combinations, called for example segmentcombo, you request a value for the resource that is the list of vnodes you want. The equivalent select statement which uses this resource is the following:
qsub -lselect=8:ncpus=1:segmentcombo=01 jobscript
11.5.3.3 Caveat When Using Combination or Number Resources
You must use the same resource string values as the ones set up by the administrator. "012" is not the same as "102" or "201".
11.5.4 Requesting Groups of Login Nodes
If you want to use groups of esLogin nodes and internal login nodes, your administrator can set the vntype resource on these nodes to a special value, for example cray_compile. To submit a job requesting any combination of esLogin nodes and internal login nodes, you specify the special value for the vntype resource in your select statement. For example:
qsub -lselect=4:ncpus=1:vntype=cray_compile job
11.5.5 Using Internal Login Nodes Only
Compiling, preprocessing, and postprocessing jobs can run on internal login nodes. Internal login nodes have a vntype value of cray_login
271. s@altairjp.co.jp; Korea: +82 70 4050 9200, support@altair.co.kr; Scandinavia: +46 (0)46 460 2828, support@altair.se; UK: +44 (0)1926 468 600, pbssupport@uk.altair.com. This document is proprietary information of Altair Engineering, Inc.
Contents
1 New Features
1.1 New Features
1.2 Deprecations
1.3 Backward Compatibility
2 Submitting a PBS Job
2.1 Introduction to the PBS Job
2.2 The PBS Job Script
2.3 Submitting a PBS Job
2.4 Job Submission Recommendations and Advice
2.5 Job Submission Options
3 Job Input & Output Files
3.1 Introduction to Job File I/O in PBS
3.2 Input/Output File Staging
3.3 Managing Output and Error Files
4 Allocating Resources & Placing Jobs
4.1 What is a Vnode?
4.2 PBS Resources
4.3 Requesting Resources
4.4 How Resources are Allocated to Jobs
4.5 Limits on Resource Usage
4.6 Viewing Resources
272. s any directives in the script. 3. The scheduler runs the job. 4. Output is connected to the submission window. 5. You run commands, executables, shell scripts, etc. interactively. 6. The job is terminated.
6.11.3.1 Terminating Interactive Jobs
When you run an interactive job, the qsub command does not terminate when the job is submitted. qsub remains running until one of the following:
• You qdel the job
• Someone else deletes the job
• You exit the shell
• The job is aborted
• You interrupt qsub with a SIGINT (the control-C key) before the scheduler starts the job. Once the scheduler starts the job, SIGINT is ignored.
Under UNIX/Linux, if you interrupt qsub before the job starts, qsub queries whether you want it to exit. If you respond "yes", qsub exits and the job is aborted. Under Windows, if you interrupt the job before it starts, the job is deleted and the following messages are printed:
qsub: wait for job <job ID> interrupted by signal 2
<job ID> is being deleted
6.11.4 Interactive Jobs and Exit Codes
Under Windows, if you specify an exit code when you exit the interactive session, via exit <exit code>, that exit code is used as the job's exit code. This exit code is visible in the output of the tracejob command. Under UNIX/Linux, you cannot provide an exit code for the interactive session.
6.11
273. s configured those queues to accept jobs from you, you can submit your job to a non-default queue.
• If you will submit jobs mainly to one non-default server, set the PBS_SERVER environment variable to the name of your preferred server. Once this environment variable is set to your preferred server, you don't need to specify that server when you submit a job to it.
• If you will submit jobs mostly to the default server, and just want to submit this one to a specific queue at a non-default server: use qsub -q <queue name>@<server name>, or use #PBS -q <queue name>@<server name>
• If you will submit jobs mostly to the default server, and just want to submit this one to the default queue at a non-default server: use qsub -q @<server name>, or use #PBS -q @<server name>
• You can submit your job to a non-default queue at the default server, or the server given in the PBS_SERVER environment variable if it is defined: use qsub -q <queue name>, or use #PBS -q <queue name>
If the PBS server has no default queue and you submit a job without specifying a queue, the qsub command will complain. PBS or your administrator may move your job from one queue to another. You can see which queue has your job using qstat [job ID]. The job's Queue attribute contains the name of the queue where the job resides. Examples:
qsub -q queue my_job
qsub -q @server my_job
#PBS -q queueName
qsub -q queueName@ser
274. s listed in procgrp are not under the control of PBS, then the processes on those hosts will not be under the control of PBS.
5.2.10.3 Restrictions
The maximum number of ranks that can be launched under integrated MPICH-MX is the number of entries in $PBS_NODEFILE.
5.2.11 MPICH2 with PBS
PBS provides an interface to MPICH2's mpirun. If executed inside a PBS job, this allows PBS to track all MPICH2 processes so that PBS can perform accounting and have complete job control. If executed outside of a PBS job, it behaves exactly as if standard MPICH2's mpirun had been used. You use the same mpirun command as you would use outside of PBS. When submitting PBS jobs under the PBS interface to MPICH2's mpirun, be sure to explicitly specify the actual number of ranks or MPI tasks in the qsub select specification. Otherwise, jobs will fail to run with "too few entries in the machinefile". For instance, the following erroneous specification:
#PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB
mpirun -np 3 /tmp/mytask
results in this $PBS_NODEFILE listing:
hostA
hostB
which conflicts with the -np 3 specification in mpirun, as only two MPD daemons are started. The correct way is to specify either of the following:
#PBS -l select=1:ncpus=1:host=hostA+2:ncpus=1:host=hostB
#PBS -l select=1:ncpus=1:host=hostA+1:ncpus=2:host=hostB:mpiprocs=2
275. s resource request are added. PBS applies job-wide default resources defined in the following places, in this order:
• Via qsub: The server's default_qsub_arguments attribute can include any requestable job-wide resources.
• Via the queue: Each queue's resources_default attribute defines each queue-level job-wide resource default in resources_default.<resource>.
• Via the server: The server's resources_default attribute defines each server-level job-wide resource default in resources_default.<resource>.
4.4.1.2 Applying Per-chunk Default Resources
For each chunk in the job's selection statement, first qsub defaults are applied, then queue chunk defaults are applied, then server chunk defaults are applied. If the chunk request does not include a resource listed in the defaults, the default is added. PBS applies default chunk resources in the following order:
• Via qsub: The server's default_qsub_arguments attribute can include any requestable chunk resources.
• Via the queue: Each queue's default_chunk attribute defines each queue-level chunk resource default in default_chunk.<resource>.
• Via the server: The server's default_chunk attribute defines each server-level chunk resource default in default_chunk.<resource>.
Example 4-10: Applying chunk defaults: if the queue in which the job is enqueued ha
276. s the following defaults defined:
default_chunk.ncpus=1
default_chunk.mem=2gb
A job submitted with this selection statement:
select=2:ncpus=4+1:mem=9gb
has this specification after the default_chunk elements are applied:
select=2:ncpus=4:mem=2gb+1:ncpus=1:mem=9gb
In this example, mem=2gb and ncpus=1 are inherited from default_chunk.
4.4.1.3 Caveat for Moving Jobs From One Queue to Another
If the job is moved from the current queue to a new queue, any default resources in the job's resource list that were contributed by the current queue are removed. This includes a select specification and place directive generated by the rules for conversion from the old syntax. If a job's resource is unset (undefined) and there exists a default value at the new queue or server, that default value is applied to the job's resource list. If either select or place is missing from the job's new resource list, it will be automatically generated, using any newly inherited default values.
Given the following set of queue and server default values:
Server: resources_default.ncpus=1
Queue QA: resources_default.ncpus=2, default_chunk.mem=2gb
Queue QB: default_chunk.mem=1gb, no default for ncpus
The following examples illustrate the equivalent select specification for jobs submitted into queue QA and then moved to (or submitted directly to) queue Q
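The chunk-default rule illustrated in Example 4-10 (a default is added to a chunk only when the chunk does not already request that resource) can be sketched as below. The function name is hypothetical and the sketch takes a single dictionary of defaults; it does not model the qsub, queue, and server ordering or anything else PBS does internally.

```python
def apply_chunk_defaults(select: str, defaults: dict) -> str:
    """Add each default resource to every chunk that does not already request it."""
    result = []
    for chunk in select.split("+"):
        parts = chunk.split(":")
        prefix = []
        if parts[0].isdigit():  # leading chunk count, e.g. the "2" in "2:ncpus=4"
            prefix = [parts[0]]
            parts = parts[1:]
        requested = dict(part.split("=", 1) for part in parts)
        # Resources named in the defaults come first; explicit requests win.
        merged = {name: requested.get(name, value) for name, value in defaults.items()}
        # Keep any explicitly requested resources that have no default.
        merged.update({k: v for k, v in requested.items() if k not in merged})
        result.append(":".join(prefix + [f"{k}={v}" for k, v in merged.items()]))
    return "+".join(result)


print(apply_chunk_defaults("2:ncpus=4+1:mem=9gb", {"ncpus": "1", "mem": "2gb"}))
```

The call uses the selection statement and default_chunk values from Example 4-10.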
277. s which finished execution successfully and exited
• Jobs terminated by PBS while running
• Jobs whose execution failed because of system or network failure
• Jobs which were deleted before they could start execution
9.1.2 Job History Information
PBS can keep all job attribute information, including the following:
• Submission parameters
• Whether the job started execution
• Whether execution succeeded
• Whether staging out of results succeeded
• Which resources were used
PBS keeps job history for the following jobs:
• Jobs that have finished execution
• Jobs that were deleted
• Jobs that were moved to another server
The job history for finished and moved jobs is preserved and available for the specified duration. After the duration has expired, PBS deletes the job history information and it is no longer available. The state of a finished job is F, and the state of a moved job is M. See Job States on page 421 of the PBS Professional Reference Guide. Subjobs are not considered finished jobs until the parent array job is finished, which happens when all of its subjobs have terminated execution.
9.1.2.1 Working With Moved Jobs
You can use the following commands with moved jobs. They will function as they do with normal jobs: qalter, qhold, qmove, qmsg, qorder, qrerun, qrls, qrun, qsig
9.1.2.2 PBS Commands and Finished Jobs
The
278. shared by other vnodes on the compute node. For example, on vnodeA_2_0: resources_available.accelerator_memory=4196mb. On vnodeA_2_1: resources_available.accelerator_memory=@vnodeA_2_0. Consumable. Format: size. Python type: pbs.size
accelerator_model
Indicates model of accelerator(s) associated with this vnode. Host-level. On Cray, PBS sets this resource only on vnodes with at least one accelerator whose state is UP. Can be requested only inside of a select statement. Non-consumable. Format: String. Python type: str
naccelerators
Indicates number of accelerators on the host. Host-level. On Cray, PBS sets this resource only on vnodes whose hosts have at least one accelerator whose state is UP. PBS sets this resource to the number of accelerators whose state is UP. For Cray, PBS sets this resource on the 0th NUMA node (the vnode with PBScrayseg=0), and the resource is shared by other vnodes on the compute node. For example, on vnodeA_2_0: resources_available.naccelerators=1. On vnodeA_2_1: resources_available.naccelerators=@vnodeA_2_0. Can be requested only inside of a select statement. Consumable. Format: Long. Python type: int
nchunk
This is the number of chunks requested between plus symbols in a select statement. For example, if the select statement is -lselect=4:ncpus=2+12:ncpus=8, the value of nchunk for the first part is 4, and for the second part
279. sing the qrls command. The usage syntax of the qrls command is the following:
qrls [-h hold_list] job_identifier
For job arrays, the job_identifier must be enclosed in double quotes. If you try to release a hold on a job which is not held, the qrls command is ignored. If you use the qrls command to release a hold on a job that had been previously running and was checkpointed, the hold is released, the job is returned to the queued (Q) state, and the job becomes eligible to be scheduled to run when resources become available. The qrls command does not run the job; it simply releases the hold and makes the job eligible to be run the next time the scheduler selects it.
6.5.6 Caveats and Restrictions for Holding and Releasing Jobs
• The qhold command can be used on job arrays, but not on subjobs or ranges of subjobs.
• On job arrays, the qhold command can be applied only in the Q, B, or W states. This puts the job array in the H (held) state. If any subjobs are running, they will run to completion.
• Job arrays cannot be moved in the H state if any subjobs are running.
• Checkpointing is not supported for job arrays. Even on systems that support checkpointing, no subjobs will be checkpointed; they will run to completion.
• To hold a running job and stop its execution, the job must be checkpointable. See section 6.4.1
280. solute path in execution_path.
3.2.5.3 Required Permissions
You must have read permission for any files or directories that you will stage in, and write permission for any files or directories that you will stage out.
3.2.5.4 Warning About Ampersand
You cannot use the ampersand ("&") in any staging path. Staging will fail.
3.2.5.5 Interactive Jobs and File I/O
When an interactive job finishes, staged files may not have been copied back yet.
3.2.5.6 Copying Directories Into and Out Of the Staging and Execution Directory
You can stage directories into and out of the staging and execution directory the same way you stage files. The storage_path and execution_path for both stagein and stageout can be a directory. If you stage in or stage out a directory, PBS copies that directory along with all of its files and subdirectories. At the end of the job, the directory, including all files and subdirectories, is deleted. This can create a problem if multiple jobs are using the same directory.
3.2.5.7 Wildcards In File Staging
You can use wildcards when staging files and directories, according to the following rules:
• The asterisk "*" matches one or more characters.
• The question mark "?" matches a single character.
• All other characters match only themselves.
• Wildcards inside of quote marks are expanded.
• Wildcards cannot be us
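The first three wildcard rules listed above (asterisk matches one or more characters, question mark matches exactly one, everything else matches itself) can be sketched with a small translation to a regular expression. This is an illustration of the stated rules only: the function names are hypothetical, the remaining rules in the list are not modelled, and whether a wildcard may match a path separator is not covered by the text, so the sketch makes no distinction there.

```python
import re


def staging_pattern_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a staging wildcard pattern into a compiled regular expression."""
    pieces = []
    for char in pattern:
        if char == "*":
            pieces.append(".+")   # one or more characters, per the stated rule
        elif char == "?":
            pieces.append(".")    # exactly one character
        else:
            pieces.append(re.escape(char))  # every other character matches itself
    return re.compile("".join(pieces))


def matches(pattern: str, name: str) -> bool:
    """Return True if the whole name matches the wildcard pattern."""
    return staging_pattern_to_regex(pattern).fullmatch(name) is not None
```

Note that this differs from the common shell convention, where an asterisk can also match an empty string; the sketch follows the wording of the rule as given.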
281. sources allocated to or used by your job:
qstat -f
Look at the following job attributes:
Resource_List.<resource name>: The amount of the resource that has been allocated to the job, including defaults.
resources_used.<resource name>: The amount of the resource used by the job.
4.6.2.1 Resources Shown in Job's Resource_List Attribute
When your job requests a job-wide resource or any of certain built-in host-level resources, the value requested is stored in the job's Resource_List attribute, as Resource_List.<resource name>=<value>. When you request a built-in host-level resource inside multiple chunks, the value in Resource_List is the sum over all of the chunks for that resource. For a list of the resources that can appear in Resource_List, see section 5.9.2, "Resources Requested by Job", on page 318 of the PBS Professional Administrator's Guide. If your administrator has defined default values for any of those resources, and your job has inherited any defaults, those defaults control the value shown in the Resource_List attribute.
4.7 Specifying Job Placement
You can specify how your job should be placed on vnodes. You can choose to place each chunk on a different host, or a different vnode, or your job can use chunks that are all on one host. You can specify that all of the job's chunks should share
282. st two chunks, each with eight CPUs and eight MPI tasks, and four threads:
qsub -l select=2:ncpus=8:mpiprocs=8:ompthreads=4
Example 5-52: For the following:
qsub -l select=4:ncpus=2
This request is satisfied by four CPUs from VnodeA, two from VnodeB, and two from VnodeC, so the following is written to PBS_NODEFILE:
VnodeA
VnodeA
VnodeB
VnodeC
The OpenMP environment variables are set, for the four PBS tasks corresponding to the four MPI processes, as follows:
• For PBS task 1 on VnodeA: OMP_NUM_THREADS=2 NCPUS=2
• For PBS task 2 on VnodeA: OMP_NUM_THREADS=2 NCPUS=2
• For PBS task 3 on VnodeB: OMP_NUM_THREADS=2 NCPUS=2
• For PBS task 4 on VnodeC: OMP_NUM_THREADS=2 NCPUS=2
Example 5-53: For the following:
qsub -l select=3:ncpus=2:mpiprocs=2:ompthreads=1
This is satisfied by two CPUs from each of three vnodes (VnodeA, VnodeB, and VnodeC), so the following is written to PBS_NODEFILE:
VnodeA
VnodeA
VnodeB
VnodeB
VnodeC
VnodeC
The OpenMP environment variables are set, for the six PBS tasks corresponding to the six MPI processes, as follows:
• For PBS task 1 on VnodeA: OMP_NUM_THREADS=1 NCPUS=1
• For PBS task 2 on VnodeA: OMP_NUM_THREADS=1 NCPUS=1
• For PBS task 3 on VnodeB: OMP_NUM_THREADS=1 NCPUS=1
• For PBS task 4 on VnodeB: OMP_NUM_THREADS=1 NCPUS=1
• For PBS tasks 5 and 6 on VnodeC: OMP_NUM
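The node file contents and OMP_NUM_THREADS values in Examples 5-52 and 5-53 can be reproduced with a short sketch. The function and its input format are hypothetical; it assumes that a chunk without mpiprocs contributes one entry when it has CPUs, and that ompthreads falls back to the chunk's ncpus, which is consistent with the two examples but is stated here as an assumption. NCPUS is deliberately left out.

```python
from typing import List, Optional, Tuple

# One placed chunk: (vnode name, ncpus, mpiprocs or None, ompthreads or None).
Chunk = Tuple[str, int, Optional[int], Optional[int]]


def nodefile_and_threads(chunks: List[Chunk]) -> Tuple[List[str], List[int]]:
    """Return the node file lines and the per-task OMP_NUM_THREADS values."""
    lines: List[str] = []
    threads: List[int] = []
    for vnode, ncpus, mpiprocs, ompthreads in chunks:
        # Assumed default: one MPI process per chunk that has at least one CPU.
        procs = mpiprocs if mpiprocs is not None else (1 if ncpus > 0 else 0)
        # Assumed default: thread count equals the chunk's ncpus.
        per_task = ompthreads if ompthreads is not None else ncpus
        for _ in range(procs):
            lines.append(vnode)
            threads.append(per_task)
    return lines, threads


# Example 5-52: four chunks of ncpus=2, two of them placed on VnodeA.
example_52 = [("VnodeA", 2, None, None), ("VnodeA", 2, None, None),
              ("VnodeB", 2, None, None), ("VnodeC", 2, None, None)]
# Example 5-53: three chunks of ncpus=2:mpiprocs=2:ompthreads=1.
example_53 = [(name, 2, 2, 1) for name in ("VnodeA", "VnodeB", "VnodeC")]
```

Passing example_52 or example_53 to nodefile_and_threads gives the vnode sequence and thread counts to compare against the listings in the two examples.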
283. stat s Req d Elap Job ID User Queue Jobname Sess NDS TSK Mem Time S Time 51 52 57 south userl workq aimsl4 1 0 01H Job held by userl on Wed Aug 22 13 06 11 2004 south userl workq aimsl4 1 0 01W Waiting on user requested start time south barry workq airfoil 930 1 0 13 R 0 01 Job run on host south started Thu Aug 23 at 10 56 south userl workq my job 1 0 10Q Not Running No available resources on nodes south susan workq solver 2 0 200 10 1 10 Showing State of Job Job Array or Subjob The t option to qstat will show the state of a job a job array object and all non X sub jobs The J option to qstat will show only the state of job arrays The combination of J and t options to qstat will show only the state of subjobs UG 234 PBS Professional 13 0 Beta User s Guide Checking Job amp System Status Chapter 10 For example qstat t Job ID Name User Time Use S Queue 44 hostl STDIN user1 0 B workq 44 1 hostl STDIN userl 00 00 00 R workq 44 2 hostl STDIN userl 0 Q workgq 44 3 hostl STDIN userl 0 Q workgq qstat J Job ID Name User Time Use S Queue 44 hostl STDIN userl 0 B workq qstat Jt Job ID Name User Time Use S Queue 44 1 hostl STDIN userl 00 00 00 R workq 44 2 hostl STDIN userl 0 Q workgq 44 3 hostl STDIN userl 0 Q workq 10 1 11 Printing Job Array Percentage Completed The p option to qstat
284. stl userl workq foojob 1 1 128mb 00 10 Q Jul 8 hostl userl workq foojob 1 1 128mb 00 10 Q 2010 ll hostl userl workq foojob 1 1 128mb 00 10 Q gt 5yrs 13 hostl userl workq foojob 1 1 128mb 00 10 Q If the start time for a job cannot be estimated the start time behaves as if it is unset For qstat T the start time appears as a question mark forqstat f the start time appears as a time in the past 10 1 13 1 Why Does Estimated Start Time Change The estimated start time for your job may change for the following reasons e Changes to the system such as vnodes going down or the administrator offlining vnodes e Ahigher priority job coming into the system or a shift in priority of the existing jobs 10 1 14 Viewing Job Status in Wide Format The w qstat option displays job status in wide format The total width of the display is extended from 80 characters to 120 characters The Job ID column can be up to 30 characters wide while the Username Queue and Jobname column can be up to 15 characters wide The SessID column can be up to eight characters wide and the NDS column can be up to four characters wide Note You can use this option only with the a n or s qstat options UG 236 PBS Professional 13 0 Beta User s Guide Checking Job amp System Status Chapter 10 10 1 15 Viewing Information for Finished and Moved Jobs You can view information for finished and moved jobs in the same way as for queued
285. t lt stageout file list gt You can use these as options to qsub or as directives in the job script For both stagein and stageout the file list has the form execution_path array_index storage_hostname storage_path array_index Y The name execution_path lt index number gt is the name of the file in the job s staging and exe cution directory on the execution host The execution_path can be relative to the job s staging and execution directory or it can be an absolute path The character separates the execution specification from the storage specification The name storage_path lt index number gt is the file name on the host specified by storage hostname For stagein this is the location where the input files come from For stageout this is where the output files end up when the job is done You must specify a storage hostname The name can be absolute or it can be relative to your home direc tory on the remote machine For stagein the direction of travel is from storage_path to execution_path For stageout the direction of travel is from execution_path to storage_path When staging more than one set of filenames separate the filenames with a comma and enclose the entire list in double quotes 8 4 3 2 Job Array Staging Syntax on Windows In Windows the stagein and stageout string must be contained in double quotes when using array_index PBS Professional 13 0 Beta User s Guide UG 197 Chapter
286. t automatically newline-terminated. Be sure to add one explicitly, otherwise the PBS job will get the following error message: "More?" when the Windows command interpreter tries to execute that last line. Drive mapping commands are typically put in the job script. Do not use xcopy inside a job script. Use copy, robocopy, or pbs_rcp instead. The xcopy command sometimes expects input from the user. Because of this, it must be assigned an input handle. Since PBS does not create the job process with an input handle assigned, xcopy can fail or behave abnormally if used inside a PBS job script. PBS jobs submitted from cygwin execute under the native cmd environment, and not under cygwin.
2.2.3 Setting Job Characteristics
2.2.3.1 Job Attributes
PBS represents the characteristics of a job as attributes. For example, the name of a job is an attribute of that job, stored in the value of the job's Job_Name attribute. Some job attributes can be set by you, some can be set only by administrators, and some are set only by PBS. For a complete list of PBS job attributes, see Job Attributes on page 384 of the PBS Professional Reference Guide. Job attributes are case-insensitive.
2.2.3.2 Job Resources
PBS represents the things that a job might use as resources. For example, the number of CPUs and the amount of memory on an execution host are resources. PBS comes with
287. t by typing:
qsub StageScript
It will run in the staging and execution directory created by PBS. See section 3.2, "Input/Output File Staging", on page 35.
8.4.4 Filenames for Standard Output and Standard Error
The name for stdout for a subjob defaults to <job name>.o<sequence number>.<index>, and the name for stderr for a subjob defaults to <job name>.e<sequence number>.<index>.
Example 8-8: The job is named "fixgamma" and the sequence number is 1234. The subjob with index 7 is 1234[7].<server name>. For this subjob, stdout and stderr are named fixgamma.o1234.7 and fixgamma.e1234.7.
8.4.5 Job Array Dependencies
Job dependencies are supported for the following relationships:
• Between job arrays and job arrays
• Between job arrays and jobs
• Between jobs and job arrays
8.4.5.1 Caveats for Job Array Dependencies
Job dependencies are not supported for subjobs or ranges of subjobs.
8.4.6 Job Array Exit Status
The exit status of a job array is determined by the status of each of the completed subjobs. It is only available when all valid subjobs have completed. The individual exit status of a completed subjob is passed to the epilogue, and is available in the "E" accounting log record of that subjob.
Table 8-3: Job Array Exit Status
Exit Status | Meaning
0 | All subjobs of the job
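As an illustration of the default naming rule for subjob output files, the sketch below builds both names from the job name, sequence number, and subjob index. It assumes the separators are dots, as in fixgamma.o1234.7; the function name default_subjob_output_names is hypothetical and is not a PBS interface.

```python
def default_subjob_output_names(job_name: str, sequence_number: int, index: int):
    """Build the default stdout and stderr file names for one subjob.

    Assumes the layout <job name>.o<sequence number>.<index> for stdout
    and the same layout with 'e' for stderr.
    """
    stdout_name = f"{job_name}.o{sequence_number}.{index}"
    stderr_name = f"{job_name}.e{sequence_number}.{index}"
    return stdout_name, stderr_name


if __name__ == "__main__":
    # Values taken from the fixgamma example in the text.
    print(default_subjob_output_names("fixgamma", 1234, 7))
```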
288. t job by using the qalter command. Any changes take effect after the current scheduling cycle. Changes affect only queued jobs; running jobs are unaffected unless they are rerun.
6.3.4.1.i Making Non-shrink-to-fit Jobs into Shrink-to-fit Jobs
You can convert a normal non-shrink-to-fit job into a shrink-to-fit job using the qalter command to set values for min_walltime and max_walltime. Any changes take effect after the current scheduling cycle. Changes affect only queued jobs; running jobs are unaffected unless they are rerun.
6.3.4.1.ii Making Shrink-to-fit Jobs into Non-shrink-to-fit Jobs
To make a shrink-to-fit job into a normal non-shrink-to-fit job, use the qalter command to do the following:
• Set the job's walltime to the value for max_walltime
• Unset min_walltime
• Unset max_walltime
6.3.5 Viewing Running Time for a Job
6.3.5.1 Viewing min_walltime and max_walltime
You can use qstat -f to view the values of min_walltime and max_walltime. For example:
qsub -lmin_walltime=01:00:15,max_walltime=03:30:00 job.sh
<job-id>
qstat -f <job-id>
Resource_List.min_walltime = 01:00:15
Resource_List.max_walltime = 03:30:00
You can use tracejob to display max_walltime and min_walltime as part of the job's resource list. For example:
12/16/2011 14:28:55 A user=pbsadmin group=Users project=_pbs_project_default Resour
289. t list Specifying Job Group ID on page 31 W pwd password Per job Password Method on page 24 W run_count lt value gt Controlling Number of Times Job is Re run on page 162 W sandbox lt value gt Staging and Execution Directory User s Home vs Job specific on page 35 W stagein list Input Output File Staging on page 35 W stageout list Input Output File Staging on page 35 W umask nnn Changing UNIX Linux Job umask on page 53 X Receiving X Output from Interactive Jobs on page 168 Z Suppressing Printing Job Identifier to stdout on page 34 2 5 1 Specifying Email Notification For each job PBS can send mail to designated recipients when that job or subjob reaches spe cific points in its lifecycle There are points in the life of the job where PBS always sends email and there are points where you can choose to receive email see the table below for a list Table 2 2 Points in Job Reservation Lifecycle when PBS Sends Mail Point in Lifecycle Always Sent or Optional Job cannot be routed Optional Job is deleted by job owner Optional depends on qdel Wsuppress email Job is deleted by someone other than job owner PBS Professional 13 0 Beta User s Guide Always UG 27 Chapter 2 Submitting a PBS Job Table 2 2 Points in Job Reservation Lifecycle when PBS Sends Mail Point in Lifecycle Al
290. t names See section 2 5 3 Specify ing a Job s Project on page 30 1 1 5 2 Support for Accelerators on Cray You can request accelerators for Cray jobs See section 11 5 11 Requesting Accelerators on page 267 1 1 5 3 Support for X Forwarding for Interactive Jobs You can receive X output from interactive jobs See section 6 11 9 Receiving X Output from Interactive Jobs on page 168 UG 2 PBS Professional 13 0 Beta User s Guide New Features Chapter 1 1 1 6 New Features in PBS Professional 11 1 1 1 6 1 Support for Interlagos Hardware You can request Interlagos hardware for your jobs See section 11 5 10 Requesting Interla gos Hardware on page 267 1 1 7 New Features in PBS Professional 11 0 1 1 7 1 Improved Cray Integration PBS is more tightly integrated with Cray systems You can use the PBS select and place lan guage when submitting Cray jobs See section Submitting Cray Jobs on page 251 1 1 7 2 Enhanced Job Placement PBS allows job submitters to scatter chunks by vnode in addition to scattering by host PBS also allows job submitters to reserve entire hosts via a job s placement request See section 4 7 Specifying Job Placement on page 77 1 1 8 New Features in PBS Professional 10 4 1 1 8 1 Estimated Job Start Times PBS can estimate the start time and vnodes for jobs See section 10 1 13 Viewing Estimated Start Times For Jobs on page 236 1 1 8 2
291. <resource name>=<value> directives.
4.3.3 Requesting Resources in Chunks
A chunk specifies the value of each resource in a set of resources which are to be allocated as a unit to a job. It is the smallest set of resources to be allocated to a job. All of a chunk is taken from a single host. One chunk may be broken across vnodes, but all participating vnodes must be from the same host.
Your job can request chunk resources, which are resources that apply to the host-level parts of the job. Host-level resources can only be requested as part of a chunk. Server or queue resources cannot be requested as part of a chunk. A chunk resource is used by the part of the job running on that chunk, and is available at the host level. Chunks are used for requesting host-related resources such as CPUs, memory, and architecture.
Chunk resources are requested inside a select statement. A select statement has this form:
-l select=[N:]chunk[+[N:]chunk ...]
Now we'll explain the details. A single chunk is requested using this form:
-l select=<resource name>=<value>[:<resource name>=<value> ...]
For example, one chunk might have 2 CPUs and 4GB of memory:
-l select=ncpus=2:mem=4gb
To request multiples of a chunk, prefix the chunk specification by the number of chunks:
-l select=[number of chunks]<chunk specification>
For exa
292. tanding reservation ends any running jobs in that reservation are killed Any jobs still queued for that reservation are kept in the queued state They are allowed to run in future occurrences When the last occurrence of a standing reservation ends all jobs remaining in the reservation are deleted whether queued or running A job in a reservation cannot be preempted A job in a reservation runs with the normal job environment variables see section 6 11 10 c Using Environment Variables on page 170 7 5 4 1 Caveats for How Reservations Treat Jobs If you submit a job to a reservation and the job s walltime fits within the reservation period but the time between when you submit the job and when the reservation ends is less than the job s walltime PBS will start the job and then kill it if it is still running when the reservation ends 7 5 5 Who Can Use Your Reservation By default the reservation accepts jobs only from the user who created the reservation and accepts jobs submitted from any group or host You can specify a list of users and groups whose jobs will and will not be accepted by the reservation by setting the reservation s Authorized Users and Authorized Groups attributes using the U auth_user_list UG 186 PBS Professional 13 0 Beta User s Guide Reserving Resources Ahead of Time Chapter 7 and G auth_group_list options to pbs_rsub You can specify the hosts from which jobs can and cannot be submit
293. tdir In that directory we can run it by typing qsub StageScript It will run in homedir our home directory which is why the line cd homedir work is in the script Example 8 7 In this example we have the same script as before but we will run it in a stag ing and execution directory created by PBS StageScript takes two input files dataX and extrax and makes an output file newdatax as well as echoing which iteration it ison The dataX and extrax files will be staged from inputs to the staging and UG 200 PBS Professional 13 0 Beta User s Guide Job Arrays Chapter 8 execution directory then newdataxX will be staged from the staging and execution directory to outputs bin sh PBS N StagingExample PBS J 1 2 PBS W stagein data array index host1 homedir inputs data array_ index extra array_index host1 homedir inputs extra array_index PBS W stageout newdata array_index host1 homedir outputs newdata array_index echo Main script index PBS ARRAY INDEX cat data PBS ARRAY INDEX extra PBS ARRAY INDEX gt gt newdata PBS ARRAY INDEX Local path execution directory created by PBS we don t know the name Remote host data storage host hostl Remote path for inputs original data files dataX and extrax homedir inputs Remote path for results output of computation newdatax homedir outputs StageScript resides in homedir testdir In that directory we can run i
294. ted by setting the reservation s Authorized_Hosts attribute using the H auth_host_list option to pbs_rsub The administrator can also specify which users and groups can and cannot submit jobs to a reservation and the list of hosts from which jobs can and cannot be submitted For more information see pbs rsub on page 83 of the PBS Professional Reference Guide and Reservation Attributes on page 351 of the PBS Professional Reference Guide 7 6 Reservation Caveats and Errors 7 6 1 Time Zone Must be Correct The environment variable PBS_TZID must be set at the submission host The time for which a reservation is requested is the time defined at the submission host See section 2 4 5 Set ting the Submission Host s Time Zone on page 18 7 6 2 Reservation Errors The following table describes the error messages that apply to reservations Table 7 2 Reservation Errors Server ous Log Description of Error Error Message Error Code Invalid syntax when specifying a standing 15133 pbs _rsub error Unde reservation fined iCalendar syn tax Recurrence rule has both a COUNT and an 15134 pbs_rsub error Unde UNTIL parameter fined iCalendar syn tax COUNT or UNTIL is required PBS Professional 13 0 Beta User s Guide UG 187 Chapter 7 Reserving Resources Ahead of Time Table 7 2 Reservation Errors Server e Log Description of Error Error Message E
295. th unset Job submitter s home Not copied left in submit unset gt directory ter s home directory HOME or lt path lt path Job submitter s home Not copied left in submit unset gt gt directory ter s home directory PRIVATE unset unset Job specific execution PBS_O_WORKDIR directory created by which is job submission PBS directory PRIVATE unset lt path Job specific execution Destination specified in o gt directory created by lt path gt and or e PBS lt path gt PRIVATE lt path unset Job specific execution Not copied left in job spe gt directory created by cific execution directory PBS PRIVATE lt path lt path Job specific execution Not copied left in job spe gt gt directory created by cific execution directory PBS e You can specify a path for stdout and or stderr see section 3 3 2 Paths for Output and Error Files on page 50 e You can merge stdout and stderr see section 3 3 4 Merging Output and Error PBS Professional 13 0 Beta User s Guide UG 49 Chapter 3 Job Input amp Output Files Files on page 52 e You can prevent creation of stdout and or stderr see section 3 3 3 Avoiding Cre ation of stdout and or stderr on page 51 e You can choose whether to retain stdout and or stderr on the execution host see section 3 3 5 Keeping Output and Error Files on Execution Host on page 52 3 3 2 Paths for Output and Error Files 3 3 2 1 Default P
296. that request particular resources by listing them in the preempt_targets resource Syntax 1 preempt_targets queue lt queue name gt queue lt queue name gt Resource_List lt resource gt lt value gt Resource_List lt resource gt lt value gt For example to specify that your job can preempt jobs in the queue named QueueA and or jobs that requested arch linux l preempt _targets queue QueueA Resource List arch linux UG 172 PBS Professional 13 0 Beta User s Guide Reserving Resources Ahead of Time 7 1 Glossary Advance reservation A reservation for a set of resources for a specified time The reservation is only available to a specific user or group of users Standing reservation An advance reservation which recurs at specified times For example you can reserve 8 CPUs and 10GB every Wednesday and Thursday from 5pm to 8pm for the next three months Occurrence of a standing reservation An instance of the standing reservation An occurrence of a standing reservation behaves like an advance reservation with the following exceptions e while a job can be submitted to a specific advance reservation it can only be submitted to the standing reservation as a whole not to a specific occurrence You can only specify when the job is eligible to run See qsub on page 219 of the PBS Professional Reference Guide e when an advance reservation ends it and all of its jobs running or queued are deleted
297. that signal completion of the job The Track Job button will flash red on the xpbs main display and if you then click it xpbs will display a list of all completed jobs that you were previously tracking Selecting one of those jobs will launch a window containing the stan dard output and standard error files associated with the job To enable xpbs job tracking click on the Track Job button at the top center of the main xpbs display Doing so will bring up the Track Job dialog box shown below Periodically check completion of jobs for user Location of Job Output Files local delete o remote every 5 A mins 7 Jobs Found Completed start reset tracking stop tracking help From this window you can name the users whose jobs you wish to monitor You also need to specify where you expect the output files to be either local or remote e g will the files be retained on the server host or did you request them to be delivered to another host Next UG 248 PBS Professional 13 0 Beta User s Guide Checking Job amp System Status Chapter 10 click the start reset tracking button and then the close window button Note that you can dis able job tracking at any time by clicking the Track Job button on the main xpbs display and then clicking the stop tracking button The Track Job feature is not available on Windows 10 7 Checking License Availability You can check to see where lice
298. the options to the pbs_rsub command, see "pbs_rsub" on page 83 of the PBS Professional Reference Guide.
7.3.2 Creating Advance Reservations
You create an advance reservation using the pbs_rsub command. PBS must be able to calculate the start and end times of the reservation, so you must specify two of the following three options:
-D  Duration
-E  End time
-R  Start time
7.3.2.1 Setting Time Zone for Advance Reservations
If you need the time zone for your advance reservation to be UTC, set this when you create the reservation:
TZ=UTC pbs_rsub -R ...
7.3.2.2 Examples of Creating Advance Reservations
The following example shows the creation of an advance reservation asking for 1 vnode, 30 minutes of wall-clock time, and a start time of 11:30. Since an end time is not specified, PBS will calculate the end time based on the reservation start time and duration.
pbs_rsub -R 1130 -D 00:30:00
PBS returns the reservation ID:
R226.south UNCONFIRMED
The following example shows an advance reservation for 2 CPUs from 8:00 p.m. to 10:00 p.m.:
pbs_rsub -R 2000.00 -E 2200.00 -l select=1:ncpus=2
PBS returns the reservation ID:
R332.south UNCONFIRMED
7.3.3 Creating Standing Reservations
You create standing reservations using the pbs_rsub command. You must specify a start and end date when creating a standing reservation. The recurring natur
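The "two of three" rule for advance reservations (start time, end time, duration) amounts to simple date arithmetic. The sketch below is a hedged illustration of that arithmetic only; it does not call PBS, the function name resolve_reservation_window is hypothetical, and the calendar date used in the example is arbitrary.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def resolve_reservation_window(
    start: Optional[datetime] = None,
    end: Optional[datetime] = None,
    duration: Optional[timedelta] = None,
) -> Tuple[datetime, datetime, timedelta]:
    """Derive the missing value from exactly two of start, end, duration."""
    given = sum(value is not None for value in (start, end, duration))
    if given != 2:
        raise ValueError("specify exactly two of start, end, duration")
    if start is None:
        start = end - duration
    elif end is None:
        end = start + duration
    else:
        duration = end - start
    return start, end, duration


if __name__ == "__main__":
    # Start at 11:30 with a 30-minute duration, as in the first example above.
    window = resolve_reservation_window(
        start=datetime(2024, 1, 1, 11, 30), duration=timedelta(minutes=30)
    )
    print(window[1])  # the derived end time
```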
299. ting records for job arrays and subjobs are the same as for jobs. When a job array has been moved from one server to another, the subjob accounting records are split between the two servers. Subjobs do not have "Q" records.
8.3.9 Prologues and Epilogues
If defined, prologues and epilogues run at the beginning and end of each subjob, but not for the array object.
8.3.10 The "Rerunnable" Flag and Job Arrays
Job arrays are required to be rerunnable. PBS will not accept a job array that is marked as not rerunnable. You can submit a job array without specifying whether it is rerunnable, and PBS will automatically mark it as rerunnable.
8.4 Submitting a Job Array
8.4.1 Job Array Submission Syntax
You submit a job array through a single command. You specify subjob indices at submission. You can specify any of the following:
• A contiguous range, e.g. 1 through 100
• A range with a stepping factor, e.g. every second entry in 1 through 100 (1, 3, 5, ... 99)
Syntax for submitting a job array:
qsub -J <index start>-<index end>[:stepping factor]
where
index start is the lowest index number in the range
index end is the highest index number in the range
stepping factor is the optional difference between index numbers
The index start and end must be whole numbers, and the stepping factor must be a positive integer. The index end must be greater than t
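To make the range and stepping-factor rules concrete, here is a minimal Python sketch that expands a start, end, and optional step into the list of subjob indices. It only mirrors the constraints stated above (whole-number bounds, positive step, end greater than start); the function name subjob_indices is hypothetical and is not a PBS interface.

```python
def subjob_indices(start: int, end: int, step: int = 1) -> list:
    """Expand an index range with an optional stepping factor."""
    if start < 0 or end < 0:
        raise ValueError("index start and end must be whole numbers")
    if step < 1:
        raise ValueError("stepping factor must be a positive integer")
    if end <= start:
        raise ValueError("index end must be greater than index start")
    # range() excludes its stop value, so add 1 to include the end index.
    return list(range(start, end + 1, step))


if __name__ == "__main__":
    odd = subjob_indices(1, 100, 2)
    print(odd[:3], odd[-1], len(odd))
```

With a stepping factor of 2 over 1 through 100, the last index generated is 99, matching the "1, 3, 5, ... 99" example in the text.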
300. to run your job and sends your job to the selected execution host s Licenses are obtained On each execution host PBS creates a job specific staging and execution directory PBS sets PBS_JOBDIR and the job s jobdir attribute to the path of the job s staging and execution directory On each execution host allocated to the job PBS creates a job specific temporary direc tory PBS sets the TMPDIR environment variable to the pathname of the temporary directory If any errors occur during directory creation or the setting of variables the job is requeued Input files or directories are copied to the primary execution host If needed cpusets are created If it exists the prologue runs on the primary execution host with its current working directory set to PBS_HOME mom_priv and with PBS_JOBDIR and TMPDIR set in its environment The job runs under your login If it exists the epilogue runs on the primary execution host with its current working directory set to the path of the job s staging and execution directory and with PBS_JOBDIR and TMPDIR set in its environment Output files or directories are copied to specified locations Temporary files and directories are cleaned up Licenses are returned to pool Any cpusets are deleted For more detail about the lifecycle of a job see section 3 2 7 Summary of the Job s Lifecy cle on page 44 and section 3 2 8 Detailed Description of Job s Lifecycle on
301. tput and or error files remain on exe cution host User settable per job via qsub k or through a PBS directive If the Keep_Files attribute is set to o and or e output and or error files remain in the staging and execu tion directory and the job s sandbox attribute is set to PRI VATE standard out and or error files are removed when the staging directory is removed at job end along with its con tents jobdir attribute Set to pathname of staging and execution directory on pri mary execution host Read only viewable viaqstat f PBS_JOBDIR environ ment variable Set to pathname of staging and execution directory on pri mary execution host Added to environments of job script process job tasks and prologue and epilogue TMPDIR environment variable Location of job specific scratch directory UG 38 PBS Professional 13 0 Beta User s Guide Job Input amp Output Files Chapter 3 3 2 4 Specifying Files To Be Staged In or Staged Out You can specify files to be staged in before the job runs and staged out after the job runs by setting the job s stagein and stageout attributes You can use options to qsub or directives in the job script qsub Wstagein lt stagein file list gt Wstageout lt stageout file list gt PBS stagein lt file list gt PBS stageout lt file list gt The file list has the following form execution_path hostname storage_pathf for both stagein and stageout
302. tually running on a login node which requested a vntype of cray_login do appear in the login node s vnode s jobs attribute 11 6 4 1 Caveats When Listing Jobs Jobs that requested a vntype of cray_compute that were launched from an internal login node are not listed in the jobs attribute of the internal login node PBS Professional 13 0 Beta User s Guide UG 269 Chapter 11 Submitting Cray Jobs 11 6 4 2 Example Output Example of pbsnodes av output for segments 0 and 1 on the same compute node examplehost_8 0 Mom exampleMom ntype PBS state free pcpus 6 resources available accelerator True resources available accelerator_memory 4096mb resources available accelerator model Tesla_x2090 resources available arch XT resources available host examplehost_8 resources available mem 8192000kb resources available naccelerators 1 resources available ncpus 6 resources available PBScrayhost examplehost resources available PBScraynid 8 resources available PBScrayorder 1 resources available PBScrayseg 0 resources available vnode examplehost_8 0 resources available vntype cray_compute resources _assigned accelerator_memory 0kb Okb resources _assigned mem 0kb resources _assigned mem resources _assigned naccelerators 0 resources _assigned ncpus 0 resources _assigned netwins 0 resources _assigned vmem 0kb resv_enable True sharing force _exclhost examplehost_8 1 Mom
303. u can create environment variables for your job. The environment variables created by PBS begin with "PBS_". The environment variables that PBS takes from your submission (originating) environment begin with "PBS_O_".
For example, here are a few of the environment variables that accompany a job, with typical values:
PBS_O_HOME=/u/user1
PBS_O_LOGNAME=user1
PBS_O_PATH=/usr/bin:/usr/local/bin:/bin
PBS_O_SHELL=/bin/tcsh
PBS_O_HOST=host1
PBS_O_WORKDIR=/u/user1
PBS_JOBID=16386.server1
For a complete list of PBS environment variables, see "PBS Environment Variables" on page 471 of the PBS Professional Reference Guide.
6.11.10.1 Exporting All Environment Variables
The -V option declares that all environment variables in the qsub command's environment are to be exported to the batch job.
qsub -V my_job
#PBS -V
6.11.10.2 Exporting Specific Environment Variables
The -v <variable list> option to qsub allows you to specify additional environment variables to be exported to the job. variable_list names environment variables from the qsub command environment which are made available to the job when it executes. These variables and their values are passed to the job. These variables are added to those already automatically exported.
Format: comma-separated list of strings in the form:
-v variable
or
-v variabl
304. u can request is 1KB. If you request 400 bytes, you get 1KB. If you request 1400 bytes, you get 2KB.
4.3.8.7 Maximum Length of Job Submission Command Line
The maximum length of a command line in PBS is 4095 characters. When you submit a job using the command line, your select and place statements, and the rest of your command line, must fit within 4095 characters.
4.3.8.8 Only One select Statement Per Job
You can include at most one select statement per job submission.
4.3.8.9 The software Resource is Job-wide
The built-in resource "software" is not a vnode-level resource. See "Built-in Resources" on page 307 of the PBS Professional Reference Guide.
4.3.8.10 Do Not Mix Old and New Syntax
Do not mix old and new syntax when requesting resources. See section 4.8, "Backward Compatibility", on page 86 for a description of old syntax.
4.4 How Resources are Allocated to Jobs
Resources are allocated to your job when the job explicitly requests them, and when PBS applies defaults.
Jobs explicitly request resources either at the vnode level in chunks defined in a selection statement, or in job-wide resource requests. We will cover requesting resources in section 4.3.3, "Requesting Resources in Chunks", on page 61 and section 4.3.2, "Requesting Job-wide Resources", on page 61.
The administrator can set default resources at the server and at queues, so that a job that does not request a resource at submission time ends
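The memory rounding stated at the start of this entry (400 bytes yields 1KB, 1400 bytes yields 2KB) is a round-up to whole kilobytes. The sketch below illustrates that arithmetic under the assumption that 1KB means 1024 bytes; the text does not state the unit size, and both figures come out the same with 1000. The function name granted_kb is hypothetical.

```python
BYTES_PER_KB = 1024  # assumption: the text does not say whether 1KB is 1024 or 1000 bytes


def granted_kb(requested_bytes: int) -> int:
    """Round a positive byte request up to a whole number of kilobytes."""
    if requested_bytes <= 0:
        raise ValueError("request must be a positive number of bytes")
    # Ceiling division without floating point.
    return -(-requested_bytes // BYTES_PER_KB)


if __name__ == "__main__":
    for request in (400, 1400):
        print(request, "bytes ->", granted_kb(request), "KB")
```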
305. The -Wpwd option to the qsub command is supported only on Windows and all supported Linux platforms on x86 and x86_64.
2.4 Job Submission Recommendations and Advice
2.4.1 Trapping Signals in Script
You can trap signals in your job script. For example, you can trap preemption and suspension signals. If you want to trap the signal in your job script, the signal may need to be trapped by all of the job's shells, depending on the signal. The signal TERM is useful, because it is ignored by shells, but you can trap it and do useful things such as write out status.
Example 2-6: Ignore the listed signals:
trap "" 1 2 3 15
Example 2-7: Call the function "goodbye" for the listed signals:
trap goodbye 1 2 3 15
2.5 Job Submission Options
The table below lists the options to the qsub command, and points to an explanation of each:
Table 2-1: Options to the qsub Command
Option | Function and Page Reference
-A <account_string> | "Specifying Accounting String" on page 32
-a <date_time> | "Deferring Execution" on page 164
-C <DPREFIX> | "Changing the Directive Prefix" on page 15
-c <interval> | "Using Checkpointing" on page 154
-e <path> | "Paths for Output and Error Files" on page 50
Holding and Releasing Jobs
306. uest is constructed using the following rules Table 11 2 How Cray Elements Are Derived From exec_vnode Cray Element Terms exec_vnode Term Processing Element PE mpiprocs Requested number of PEs com pute node in this section of job request width Number of threads per PE depth Total mpiprocs on vnodes representing compute node involved in this section of job request total assigned ncpus on vnodes representing a compute node total mpiprocs on vnodes repre senting a compute node Memory per PE mem total memory in chunk request total mpiprocs in chunk Number of PEs per compute node nppn Sum of mpiprocs on vnodes representing a com pute node Number of PEs per segment npps Not used Number of segments per node Not used nspn NUMA node segments Not used 11 6 6 Viewing Accelerator Information There is no aprun interface for requesting accelerator memory or model so this information is not translated into Cray elements To see this information look in the MoM logs for the job s login node UG 272 PBS Professional 13 0 Beta User s Guide Submitting Cray Jobs Chapter 11 11 7 Caveats and Advice 11 7 1 Use select and place Instead of mpp It is recommended to use select and place instead of mpp resources The mpp resources are deprecated 11 7 2 Using Combination or Number Resources When requesting a resource that is set up
307. uest will be rejected.
Resources are modified by using the -l option, either in chunks inside of selection statements, or in job-wide modifications using resource_name=value pairs. The selection statement is of the form:
-l select=[N:]chunk[+[N:]chunk ...]
where N specifies how many of that chunk, and a chunk is of the form:
resource_name=value[:resource_name=value ...]
Job-wide resource_name=value modifications are of the form:
-l resource_name=value[,resource_name=value ...]
Placement of jobs on vnodes is changed using the place statement:
-l place=modifier[:modifier]
where modifier is any combination of group, excl, exclhost, and/or one of free|pack|scatter|vscatter.
The usage syntax for qalter is:
qalter job-resources job-list
The following examples illustrate how to use the qalter command. First we list all the jobs of a particular user. Then we modify two attributes as shown (increasing the wall-clock time from 20 to 25 minutes, and changing the job name from "airfoil" to "engine"):
qstat -u barry
                                              Req'd   Elap
Job ID   User  Queue Jobname Sess NDS TSK Mem Time  S Time
51.south barry workq airfoil 930  --  1   --  0:16  R 0:01
54.south barry workq airfoil --   --  1   --  0:20  Q --
qalter -l walltime=25:00 -N engine 54
qstat -a 54
                                              Req'd   Elap
Job ID   User  Queue Jobname Sess NDS TSK Mem Time  S Time
54.south barry workq engine  --   --  1   --  0:25
308. umber of MPI processes per chunk, and defaults to 1 where the chunk contains CPUs, 0 otherwise.
For each chunk requesting mpiprocs=M, the name of the host from which that chunk is allocated is written in the node file M times. Therefore the number of lines in the node file is the sum of requested mpiprocs for all chunks requested by the job.
Example 5-2: Two MPI processes run on HostA and one MPI process runs on HostB. The node file looks like this:
HostA
HostA
HostB
5.1.2.2 Name and Location of Node File
The file is created by the MoM on the primary execution host, in PBS_HOME/aux/JOB_ID, where JOB_ID is the job identifier for that job. The full path and name for the node file is set in the job's environment, in the environment variable PBS_NODEFILE.
5.1.2.3 Node File for Old-style Requests
For jobs which request resources using the old -lnodes=nodespec format, the host for each vnode allocated to the job is listed N times, where N is the number of MPI ranks on the vnode. The number of MPI ranks is specified via the ppn resource.
Example 5-3: Request four vnodes, each with two MPI processes, where each process has three threads, and each thread has a CPU:
qsub -lnodes=4:ncpus=3:ppn=2
This results in each of the four hosts being written twice, in the order in which the vnodes are assigned to the job.
5.1.2.4 Using and Modifying the Node File
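The rule that each chunk's host appears in the node file once per MPI process can be sketched as follows. This is a plain illustration of the counting rule, not how the MoM actually writes the file; the function name node_file_lines and the (host, mpiprocs) pair representation are assumptions made for the example.

```python
def node_file_lines(chunks):
    """Return node file contents for a list of (host, mpiprocs) chunks.

    Each chunk's host name is repeated mpiprocs times, so the total
    number of lines equals the sum of mpiprocs over all chunks.
    """
    lines = []
    for host, mpiprocs in chunks:
        lines.extend([host] * mpiprocs)
    return lines


if __name__ == "__main__":
    # Two MPI processes on HostA and one on HostB, as in the example above.
    print("\n".join(node_file_lines([("HostA", 2), ("HostB", 1)])))
```

A job script would normally read the resulting file through the path held in PBS_NODEFILE rather than compute it; the sketch is only meant to show why the line count equals the total number of MPI processes.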
309. umerical resource that is unset on a host is treated as if it were zero and an unset string cannot satisfy a request An unset Boolean resource is treated as if it were set to False An unset resource at the server or queue is treated as if it were infinite 4 3 8 5 Caveat for Invisible or Unrequestable Resources Your administrator may define custom resources which restricted so that they are invisible or are visible but unrequestable Custom resources which were created to be invisible or unre questable cannot be requested or altered The following is a list of the commands normally used to view or request resources or modify resource requests and their limitations for restricted resources pbsnodes Job submitters cannot view restricted host level custom resources pbs_rstat Job submitters cannot view restricted reservation resources pbs_rsub Job submitters cannot request restricted custom resources for reservations gqalter Job submitters cannot alter a restricted resource qmgr Job submitters cannot print or list a restricted resource qselect Job submitters cannot specify restricted resources via 1 Resource List qsub Job submitters cannot request a restricted resource qstat Job submitters cannot view a restricted resource UG 68 PBS Professional 13 0 Beta User s Guide Allocating Resources amp Placing Jobs Chapter 4 4 3 8 6 Warning About Requesting Tiny Amounts of Memory The smallest unit of memory yo
310. up being allocated the default value for that resource. We will cover default resources in section 4.4.1, "Applying Default Resources", on page 70.
The administrator can also specify default arguments for qsub so that jobs automatically request certain resources. Resource values explicitly requested by your job override any qsub defaults. See "qsub" on page 219 of the PBS Professional Reference Guide.
4.4.1 Applying Default Resources
PBS applies resource defaults only where the job has not explicitly requested a value for a resource. Job-wide and per-chunk resources are applied, with the following order of precedence, via the following:
1. Resources that are explicitly requested via -l <resource>=<value> and -l select=<chunk>
2. Default qsub arguments
3. The queue's default_chunk.<resource>
4. The server's default_chunk.<resource>
5. The queue's resources_default.<resource>
6. The server's resources_default.<resource>
7. The queue's resources_max.<resource>
8. The server's resources_max.<resource>
4.4.1.1 Applying Job-wide Default Resources
The explicit job-wide resource request is checked first against default qsub arguments, then against queue resource defaults, then against server resource defaults. Any default job-wide resources not already in the job
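The order of precedence listed above behaves like a first-match lookup over an ordered list of sources. The sketch below is a deliberate simplification: it treats job-wide and per-chunk resources alike, which PBS does not, and the function name resolve_resource, the source labels, and the sample values are all hypothetical.

```python
def resolve_resource(name, sources):
    """Return (source label, value) from the first source that sets `name`.

    `sources` is a list of (label, dict) pairs, highest precedence first.
    Returns None when no source supplies the resource.
    """
    for label, values in sources:
        if name in values:
            return label, values[name]
    return None


if __name__ == "__main__":
    # Hypothetical settings, ordered as in the precedence list above.
    sources = [
        ("explicit request", {"ncpus": 4}),
        ("default qsub arguments", {"mem": "2gb"}),
        ("queue default_chunk", {"ncpus": 1, "mem": "1gb"}),
        ("server default_chunk", {}),
        ("queue resources_default", {"walltime": "01:00:00"}),
        ("server resources_default", {"walltime": "02:00:00"}),
        ("queue resources_max", {}),
        ("server resources_max", {}),
    ]
    for resource in ("ncpus", "mem", "walltime"):
        print(resource, resolve_resource(resource, sources))
```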
311. ur administrator may define placement sets for your site. A placement set is a group of vnodes that share a value for a resource. By default, placement sets attempt to group vnodes that are "close to" each other. If your job doesn't request a specific placement, and placement sets are defined, your job may automatically run in a placement set. See "Placement Sets" on page 222 in the PBS Professional Administrator's Guide.
If your job requests grouping by a resource, using place=group=resource, the chunks are placed as requested and placement sets are ignored. If your job requests grouping but no group contains the required number of vnodes, grouping is ignored.
4.7.2 How the Job Gets its Place Statement
If the administrator has defined default values for arrangement, sharing, and grouping, each job inherits these unless it explicitly requests at least one. That means that if your job requests arrangement, but not sharing or grouping, it will not inherit values for sharing or grouping. For example, the administrator sets a default of place=pack:exclhost:group=host. Job A requests place=free, but doesn't specify sharing or grouping, so Job A does not inherit sharing or grouping. Job B does not request any placement, so it inherits all three.
The place statement can be specified, in order of precedence, via:
    1. Explicit placement request in qalter
    2. Explicit placement request in qsub
    3. Explicit placement request in PBS job script direct
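The all-or-nothing inheritance rule for place statements can be sketched in a few lines of Python. This is only an illustration of the rule as described in section 4.7.2; the function name effective_place is hypothetical, and place statements are treated as opaque strings.

```python
from typing import Optional


def effective_place(job_place: Optional[str], default_place: Optional[str]) -> Optional[str]:
    """Apply the inheritance rule: a job that requests any part of a
    place statement inherits nothing from the administrator's default."""
    if job_place:
        return job_place
    return default_place


site_default = "pack:exclhost:group=host"
job_a = effective_place("free", site_default)  # Job A asks only for arrangement
job_b = effective_place(None, site_default)    # Job B asks for no placement
```

Job A ends up with place=free and no sharing or grouping, while Job B inherits all three default values.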
312. us P mpiprocs P
Example: -lnodes=4:ppn=2 is converted into -lselect=4:ncpus=2:mpiprocs=2
If -lncpus=Z is specified and no spec contains ncpus=X and no spec is cpp=X:
    Every chunk will include ncpus=W, where W is Z divided by the total number of chunks. (Note: W must be an integer; Z must be evenly divisible by the number of chunks.)
If property is a suffix:
    All chunks will include property=true
If excl is a suffix:
    The placement directive will be -lplace=scatter:excl
If shared is a suffix:
    The placement directive will be -lplace=scatter:shared
If neither excl nor shared is a suffix:
    The placement directive will be -lplace=scatter
Example:
    -lnodes=3:green:ncpus=2:ppn=2+2:red
is converted to:
    -l select=3:green=true:ncpus=4:mpiprocs=2+2:red=true:ncpus=1 -l place=scatter
4.8.3.3 Examples of Converting Old Syntax to New
Request CPUs and memory on a single host using old syntax:
    -l ncpus=5,mem=10gb
is converted into the equivalent:
    -l select=1:ncpus=5:mem=10gb -l place=pack
Request CPUs and memory on a named host along with custom resources including a floating license using old syntax:
    -l ncpus=1,mem=5mb,host=sunny,opti=1,arch=solaris
is converted to the equivalent:
    -l select=1:ncpus=1:mem=5gb:host=sunny:arch=solaris -l place=pack -l opti=1
Request one host with a certain property using old synt
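Two of the conversion rules above are simple enough to sketch in Python. The sketch below covers only the plain nodes=N:ppn=P case and the rule that divides -lncpus=Z evenly across chunks; it is an illustration under those assumptions, not the converter PBS uses, and both function names are hypothetical.

```python
def convert_nodes_ppn(nodes: int, ppn: int) -> str:
    """Convert the old -lnodes=N:ppn=P form into a select statement."""
    return f"-lselect={nodes}:ncpus={ppn}:mpiprocs={ppn}"


def ncpus_per_chunk(total_ncpus: int, nchunks: int) -> int:
    """Compute W = Z / number of chunks, requiring an integer result."""
    if nchunks <= 0:
        raise ValueError("number of chunks must be positive")
    if total_ncpus % nchunks != 0:
        raise ValueError("ncpus must be evenly divisible by the number of chunks")
    return total_ncpus // nchunks


converted = convert_nodes_ppn(4, 2)
per_chunk = ncpus_per_chunk(8, 4)
```

The first call reproduces the documented example, -lnodes=4:ppn=2 becoming -lselect=4:ncpus=2:mpiprocs=2. Suffixes such as property names, excl, and shared are outside the scope of this sketch.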
313. use the vnode.
If a host is to be allocated exclusively to one job, all of the host must be used: if any vnode from a host has its sharing attribute set to either default_exclhost or force_exclhost, all vnodes on that host must have the same value for the sharing attribute.
To see the value for a vnode's sharing attribute, you can do either of the following:
    * Use qmgr:
          Qmgr: list node <vnode name> sharing
    * Use pbsnodes:
          pbsnodes -av
4.7.1.3 Grouping on a Resource
You can specify that all of the chunks for your job should run on vnodes that have the same value for a selected resource. To group your job's chunks this way, use the following format:
    -l place=group=<resource>
where resource is a string or string array. The value of the resource can be one or more strings at each vnode, but there must be one string that is the same for each vnode. For example, if the resource is router, the value can be "r1i0,r1" at one vnode and "r1i1,r1" at another vnode, and these vnodes can be grouped because they share the string "r1".
Using the method of grouping on a resource, you cannot specify what the value of the resource should be, only that all vnodes have the same value. If you need the resource to have a specific value, specify that value in the description of the chunks.
4.7.1.3.i Grouping vs. Placement Sets
Yo
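The condition that vnodes "share a string" for the grouping resource amounts to a set intersection. A minimal Python sketch of that check follows, assuming string-array values are comma-separated as in the router example; the function name and the vnode names are hypothetical.

```python
from typing import Dict, Set


def common_group_values(vnode_values: Dict[str, str]) -> Set[str]:
    """Return the strings that every vnode has in its value for the
    grouping resource; the vnodes can be grouped if this is non-empty."""
    value_sets = [set(value.split(",")) for value in vnode_values.values()]
    if not value_sets:
        return set()
    return set.intersection(*value_sets)


routers = {"vnodeA": "r1i0,r1", "vnodeB": "r1i1,r1"}
shared = common_group_values(routers)
```

Here the two vnodes can be grouped because the intersection contains r1. If the result were empty, the vnodes would have no value in common for that resource.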
314. ution Directories
3.2.2.1 Setting the Job's Staging and Execution Directory
The job's sandbox attribute controls whether PBS creates a unique job-specific staging and execution directory for this job. If the job's sandbox attribute is set to PRIVATE, PBS creates a unique staging and execution directory for the job. If sandbox is unset, or is set to HOME, PBS uses your home directory as the job's staging and execution directory. By default, the sandbox attribute is not set.
You can set the sandbox attribute via qsub, or through a PBS directive. For example:
    qsub -Wsandbox=PRIVATE
The job's sandbox attribute cannot be altered while the job is executing.
Table 3-2: Effect of Job's sandbox Attribute on Location of Staging and Execution Directory
    Job's sandbox attribute    Effect
    not set                    Job's staging and execution directory is your home directory
    HOME                       Job's staging and execution directory is your home directory
    PRIVATE                    Job's staging and execution directory is a job-specific directory created by PBS. If the qsub -k option is used, output and error files are retained on the primary execution host in the staging and execution directory. This directory is removed, along with all of its contents, when the job finishes.
3.2.2.2 The Job's jobdir Attribute and the PBS_JOBDIR Environment Variable
The job's jo
315. vation does not appear in the output from pbs_rstat, that means that the reservation was denied.
To ensure that you receive mail about your reservations, set the reservation's Mail_Users attribute via the -M <email address> option to pbs_rsub. By default, you will get email when the reservation is terminated or confirmed. If you want to receive email about events other than those, set the reservation's Mail_Points attribute via the -m <mail events> option. For more information, see "pbs_rsub" on page 83 of the PBS Professional Reference Guide and "Reservation Attributes" on page 351 of the PBS Professional Reference Guide.
7.3.4 Deleting Reservations
You can delete an advance or standing reservation by using the pbs_rdel command. For a standing reservation, you can only delete the entire reservation, including all occurrences. When you delete a reservation, all of the jobs that have been submitted to the reservation are also deleted. A reservation can be deleted by its owner or by a PBS Operator or Manager. For example, to delete S304.south:
    pbs_rdel S304.south
or
    pbs_rdel S304
7.4 Viewing the Status of a Reservation
The following table shows the list of possible states for a reservation. The states that you will usually see are CO, UN, BD, and RN, although a reservation usually remains unconfirmed for too sh
316. verName my_job
    qsub -q queueName@serverName.domain.com my_job
2.5.7.1 Using or Avoiding Dedicated Time
Dedicated time is one or more specific time periods defined by the administrator. These are not repeating time periods. Each one is individually defined.
During dedicated time, the only jobs PBS starts are those in special dedicated time queues. PBS schedules non-dedicated jobs so that they will not run over into dedicated time. Jobs in dedicated time queues are also scheduled so that they will not run over into non-dedicated time. PBS will attempt to backfill around the dedicated/non-dedicated time borders.
PBS uses walltime to schedule within and around dedicated time. If a job is submitted without a walltime to a non-dedicated-time queue, it will not be started until all dedicated time periods are over. If a job is submitted to a dedicated time queue without a walltime, it will never run.
To submit a job to be run during dedicated time, use the -q <queue name> option to qsub and give the name of the dedicated time queue you wish to use as the queue name. Queues are created by the administrator; see your administrator for queue name(s).
2.5.8 Suppressing Printing Job Identifier to stdout
To suppress printing the job identifier to standard output, use the -z option to qsub. You can use it at the command line or in a PBS directive:
    qsub -z my_job
317. ways Sent or Optional
    Job is aborted by PBS                  Optional
    Job begins execution                   Optional
    Job ends execution                     Optional
    Stagein fails                          Always
    All file stageout attempts fail        Always
    Reservation is confirmed or denied     Always
PBS always sends you mail when your job or subjob is deleted. For job arrays, PBS sends one email per subjob. You can restrict the number of job-related emails PBS sends when you delete jobs or subjobs; see section 2.5.1.3, "Restricting Number of Job Deletion Emails", on page 29.
2.5.1.1 Specifying Job Lifecycle Email Points
The set of points where PBS sends mail is specified in the Mail_Points job attribute. When you set this option for a job array, PBS sets the option for each subjob, and sends mail for each subjob. You can set the Mail_Points attribute using the following methods:
    * The -m <mail points> option to qsub
    * The #PBS -WMail_Points=<mail points> PBS directive
The mail points argument is a string which consists of either the single character "n", or one or more of the characters "a", "b", and "e".
    a    Send mail when job or subjob is aborted by batch system
    b    Send mail when job or subjob begins execution. Example: "Begun execution"
    e    Send mail when job or subjob ends execution
    n    Do not send mail
Example 2-8: PBS sends mail when the job is aborted or ends:
    qsub -m ae my_job
    #PBS -m ae
2
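The rule for a well-formed mail points string (either the single character n, or one or more of a, b, and e) can be checked before a job is submitted. The Python sketch below encodes only that stated rule; the function name is hypothetical, and whether PBS itself accepts repeated characters such as "aa" is not covered by the text, so the sketch simply allows them.

```python
def is_valid_mail_points(mail_points: str) -> bool:
    """Check a mail points string: either exactly "n", or a non-empty
    string made up only of the characters a, b, and e."""
    if mail_points == "n":
        return True
    return len(mail_points) > 0 and set(mail_points) <= set("abe")


abort_or_end = is_valid_mail_points("ae")  # matches Example 2-8
mixed_with_n = is_valid_mail_points("an")  # "n" cannot be combined with others
```

A wrapper script could run this check on the argument it intends to pass to qsub -m and refuse to submit when it returns False.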
318. which causes PBS_NODEFILE to contain hostA, hostB, hostB, and this is consistent with mpirun -np 3.
5.2.11.1 Options
If executed inside a PBS job script, all of the options to the PBS interface are the same as MPICH2's mpirun except for the following:
    -host <host>
        For specifying the execution host to run on. Ignored.
    -machinefile <file>
        The file argument contents are ignored and replaced by the contents of PBS_NODEFILE.
    -localonly <x>
        For specifying the <x> number of processes to run locally. Not supported. You are advised instead to use the equivalent arguments: -np <x> -localonly
    -np
        If you do not specify a -np option, then no default value is provided by the PBS interface to MPICH2. It is up to the standard mpirun to decide what the reasonable default value should be, which is usually 1. The maximum number of ranks that can be launched is the number of entries in PBS_NODEFILE.
5.2.11.2 MPD Startup and Shutdown
The interface ensures that the MPD daemons are started on each of the hosts listed in PBS_NODEFILE. It also ensures that the MPD daemons are shut down at the end of MPI job execution.
5.2.11.3 Examples
Example 5-34: Run a single-executable MPICH2 job with six processes spread out across the PBS-allocated hosts l
319. whose name is a concatenation of PBScraylabel_ and the name of the label. PBS sets the value of the resource to True on all vnodes representing the compute node.
    Format: PBScraylabel_<label name>
    For example, if the label name is Blue, the name of this resource is PBScraylabel_Blue.
    Format: Boolean
    Default: None
PBScraynid
    Custom resource created by PBS for the Cray. Used to track the node ID of the associated compute node. All vnodes representing a particular compute node share a value for PBScraynid. The value of PBScraynid is set to the value of node_id for this compute node. Non-consumable.
    Format: String
    Default: None
PBScrayorder
    Custom resource created by PBS for the Cray. Used to track the order in which compute nodes are listed in the Cray inventory. All vnodes associated with a particular compute node share a value for PBScrayorder. Non-consumable. Vnodes for the first compute node listed are assigned a value of 1 for PBScrayorder. The vnodes for each subsequent compute node listed are assigned a value one greater than the previous value. Do not use this resource in a resource request.
    Format: Integer
    Default: None
PBScrayseg
    Custom resource created by PBS for the Cray. Tracks the segment ordinal of the associated NUMA node. For the first NUMA node of a compute host, the segment ordinal is 0, and the value
320. x
11.4.8 Specify Host for Interactive Jobs
Interactive jobs on a Cray must run on a login node. When you run an interactive job, specify the login node as the host for the job. You can do so using a PBS directive or the command line. For example:
    qsub -l select=host=<name of login node> -I job.sh
    #PBS -l select=host=<name of login node>
See section 6.11, "Running Your Job Interactively", on page 165.
11.5 Techniques for Submitting Cray Jobs
11.5.1 Specifying Number of PEs per NUMA Node
The Cray aprun -S option allows you to specify the number of PEs per NUMA node for your job. PBS allows you to make the equivalent request using select and place statements. PBS jobs on the Cray should scatter chunks across vnodes. To calculate the select and place requirements, do the following:
    * Set nchunk equal to n (the width) divided by S (the number of PEs per NUMA node):
          nchunk = n / S
    * Set ncpus equal to S (the number of PEs per NUMA node):
          ncpus = S
    * Set mpiprocs equal to S (same as ncpus):
          mpiprocs = S
Example 11-8: You want a total of 6 PEs with 2 PEs per NUMA node. The aprun command is the following:
    aprun -S 2 -n 6 myjob
The equivalent select and place statements are:
    qsub -lselect=3:ncpus=2:mpiprocs=2 -lplace=vscatter
Given two compute nodes, each with two NUMA nodes, where each NUMA node has four PEs, two PEs from each of three of th
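The three-step calculation above can be written as a short helper. This Python sketch is only an illustration of the arithmetic in section 11.5.1; the function name is hypothetical, and the requirement that n be evenly divisible by S is an assumption added here so that nchunk is a whole number.

```python
def aprun_to_select(n: int, s: int) -> str:
    """Translate aprun -S <s> -n <n> into the equivalent qsub
    select and place arguments (nchunk = n / S, ncpus = mpiprocs = S)."""
    if n <= 0 or s <= 0:
        raise ValueError("n and S must be positive")
    if n % s != 0:
        # Assumption of this sketch: the width must divide evenly by S.
        raise ValueError("n must be evenly divisible by S")
    nchunk = n // s
    return f"-lselect={nchunk}:ncpus={s}:mpiprocs={s} -lplace=vscatter"


qsub_args = aprun_to_select(n=6, s=2)
```

For the values in Example 11-8 (n = 6, S = 2) this yields the same select and place statements shown in the example.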
321. xample:
    qsub -l min_walltime=<min walltime>,max_walltime=<max walltime> <job script>
6.3.3.2 Setting walltime for Shrink-to-fit Jobs
For a shrink-to-fit job, PBS sets the walltime resource based on the values of min_walltime and max_walltime, regardless of whether walltime is specified for the job. PBS examines each shrink-to-fit job when it gets to it, and looks for a time slot whose length is between the job's min_walltime and max_walltime. If the job can fit somewhere, PBS sets the job's walltime to a duration that fits the time slot, and runs the job. The chosen value for walltime is visible in the job's Resource_List.walltime attribute. Any existing walltime value, regardless of where it comes from (e.g. previous execution), is reset to the new calculated running time.
If a shrink-to-fit job is run more than once, PBS recalculates the job's running time to fit an available time slot that is between min_walltime and max_walltime, and resets the job's walltime, each time the job is run.
For a multi-vnode job, PBS chooses a walltime that works for all of the chunks required by the job, and places job chunks according to the placement specification.
6.3.4 Modifying Shrink-to-fit and Non-shrink-to-fit Jobs
6.3.4.1 Modifying min_walltime and max_walltime
You can change min_walltime and/or max_walltime for a shrink-to-fit
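The idea of fitting a walltime into a slot between min_walltime and max_walltime can be illustrated with a small Python sketch. The text does not say how PBS picks the exact duration, so this sketch assumes the longest duration that fits; the function name and the use of plain seconds are also assumptions made for illustration.

```python
from typing import Optional


def fit_walltime(slot_seconds: int, min_walltime: int, max_walltime: int) -> Optional[int]:
    """Pick a walltime for a shrink-to-fit job in a free slot.

    Returns None when the slot is shorter than min_walltime; otherwise
    returns the longest duration that fits (an assumption of this sketch).
    """
    if min_walltime > max_walltime:
        raise ValueError("min_walltime must not exceed max_walltime")
    if slot_seconds < min_walltime:
        return None
    return min(slot_seconds, max_walltime)


# A 90-minute slot for a job that accepts between 1 and 2 hours.
chosen = fit_walltime(slot_seconds=5400, min_walltime=3600, max_walltime=7200)
```

With these inputs the job would be given the full 90 minutes; a 30-minute slot would be rejected because it is shorter than min_walltime.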
322. ydata/data1
To stage more than one file or directory, use a comma-separated list of paths, and enclose the list in double quotes. For example, to stage two files, data1 and data2, in:
    qsub -W stagein="input1@hostA:/myhome/data1,input2@hostA:/myhome/data2"
3.2.5 Caveats and Requirements for Staging
3.2.5.1 Staging and Windows Paths
3.2.5.1.i Special Characters
Under Windows, if your path contains special characters such as spaces, backslashes, colons, or drive letter specifications, enclose the staging specification in double quotes. For example, to stage the grid.dat file on drive D at hostB to the execution file named dat1 on drive C:
    qsub -W stagein="dat1@hostB:D:\Documents and Settings\grid.dat"
3.2.5.1.ii Using UNC Paths
If you use a UNC path to stage in or out, the hostname is optional. If you use a non-UNC path, the hostname is required.
3.2.5.2 Path Names for Staging
    * It is advisable to use an absolute pathname for the storage_path. Remember that the path to your home directory may be different on each machine, and that when using sandbox=PRIVATE, you may or may not have a home directory on all execution machines.
    * Always use a relative pathname for execution_path when the job's staging and execution directory is created by PBS, meaning when using a job-specific staging and execution directory; do not use an ab
323. you want qsub to block, meaning wait for the job to complete and report the exit value of the job.
If your job is successfully submitted, qsub blocks until the job terminates or an error occurs. If job submission fails, no special processing takes place.
If the job runs to completion, qsub exits with the exit status of the job. For job arrays, blocking qsub waits until the entire job array is complete, then returns the exit status of the job array.
The block job attribute controls blocking. Set it either via qsub or a PBS directive:
    qsub -W block=true
    #PBS -W block=true
6.8.1 Signal Handling and Error Processing for Blocking Jobs
Signals SIGQUIT and SIGKILL are not trapped, and immediately terminate the qsub process, leaving the associated job either running or queued.
If qsub receives one of the signals SIGHUP, SIGINT, or SIGTERM, it prints a message and then exits with an exit status of 2.
If the job is deleted before running to completion, or an internal PBS error occurs, qsub prints an error message describing the situation to this error stream, and qsub exits with an exit status of 3.
6.8.2 Caveats for Blocking Jobs
    * If you submit a job that is both blocking and interactive, the job's exit status is not returned at the end of the job.
    * PBS returns the exit status of a blocking job before staging finishes for the job. To see whether the job is still staging, use qstat -f and look at the job's substate attribute
324. yping qsub myjob, and then PBS returns the job ID:
    16387.exampleserver.exampledomain
Example 2-4: The following is the contents of the script named "my_job". In it, we name the job "testjob", and run a program called "myprogram":
    #!/bin/sh
    #PBS -N testjob
    myprogram
Example 2-5: The simplest way to submit a job is to give the script name as the argument to qsub, and hit return:
    qsub <job script> <ret>
If the script contains the following:
    #!/bin/sh
    myapplication
you have simply told PBS to run myapplication.
2.3.3.4 Passing Arguments to Jobs
If you need to pass arguments to a job script, you can do the following:
Use environment variables in your script, and pass values for the environment variables using -v or -V.
For example, to use myinfile as the input to a.out, your job script contains the following:
    #PBS -N myjobname
    a.out < $INFILE
You can then use the -v option:
    qsub -v INFILE=/tmp/myinfile <job script>
For example, to use myinfile and mydata as the input to a.out, your job script contains the following:
    #PBS -N myjobname
    cat $INFILE $INDATA | a.out
You can then use the -v option:
    qsub -v INFILE=/tmp/myinfile,INDATA=/tmp/mydata <job script>
You can export the environment variable first:
    export INFILE=/tmp/myinfile
    qsub -V <job script>
Use a here document. For exam
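When several environment variables are passed with -v, the argument is a comma-separated list of NAME=value pairs. The following Python sketch builds that argument from a dictionary; the helper name build_v_option is hypothetical, and the sketch assumes the values contain no commas or characters that need shell quoting.

```python
from typing import Dict


def build_v_option(variables: Dict[str, str]) -> str:
    """Build the -v argument for qsub as a comma-separated list of
    NAME=value pairs, in the order the variables were given."""
    pairs = ",".join(f"{name}={value}" for name, value in variables.items())
    return f"-v {pairs}"


v_option = build_v_option({"INFILE": "/tmp/myinfile", "INDATA": "/tmp/mydata"})
```

The result matches the form used in the two-variable example above and could be placed between qsub and the job script name on the command line.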