Bash: limit the number of concurrent jobs?

[*]

Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making the & block when there are more than n concurrent jobs running in the background.

I know I can implement this with ps | grep -style tricks, but is there an easier way?

[*]

If you have GNU Parallel http://www.gnu.org/software/parallel/ installed you can do this:

parallel gzip ::: *.log

which will run one gzip per CPU core until all logfiles are gzipped.

If it is part of a larger loop you can use sem instead:

for i in *.log ; do
    echo $i Do more stuff here
    sem -j+0 gzip $i ";" echo done
done
sem --wait

It will do the same, but give you a chance to do more stuff for each file.

If GNU Parallel is not packaged for your distribution you can install GNU Parallel simply by:

$ (wget -O - pi.dk/3 || lynx -source pi.dk/3 || curl pi.dk/3/ || 
   fetch -o - http://pi.dk/3 ) > install.sh
$ sha1sum install.sh | grep 883c667e01eed62f975ad28b6d50e22a
12345678 883c667e 01eed62f 975ad28b 6d50e22a
$ md5sum install.sh | grep cc21b4c943fd03e93ae1ae49e28573c0
cc21b4c9 43fd03e9 3ae1ae49 e28573c0
$ sha512sum install.sh | grep da012ec113b49a54e705f86d51e784ebced224fdf
79945d9d 250b42a4 2067bb00 99da012e c113b49a 54e705f8 6d51e784 ebced224
fdff3f52 ca588d64 e75f6033 61bd543f d631f592 2f87ceb2 ab034149 6df84a35
$ bash install.sh

It will download, check the signature, and do a personal installation if it cannot install globally.

Watch the intro videos for GNU Parallel to learn more:
https://www.youtube.com/playlist?list=PL284C9FF2488BC6D1
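
If installing GNU Parallel is not an option, xargs -P (available in GNU and BSD xargs, though not required by POSIX) gives a rougher equivalent of the first command. This is a stand-in technique, not part of GNU Parallel, and only a sketch: the temporary directory, the sample files and the limit of 2 are assumptions made for the demonstration.

```shell
#!/bin/bash
# Demo data: three throwaway log files in a temporary directory.
demo_dir=$(mktemp -d)
for name in a b c; do
    echo "sample line for $name" > "$demo_dir/$name.log"
done

# -print0 / -0 keep file names containing spaces intact.
# -n 1 hands each gzip a single file; -P 2 runs at most two gzips at once.
find "$demo_dir" -name '*.log' -print0 | xargs -0 -n 1 -P 2 gzip

ls "$demo_dir"
```

Unlike sem, this only covers the case where the whole list of inputs is known up front.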

[*]

A small bash script could help you:

# content of script exec-async.sh
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
    sleep 1
    joblist=($(jobs -p))
done
"$@" &    # "$@" keeps arguments containing spaces intact

If you call:

. exec-async.sh sleep 10

…four times, the first three calls will return immediately, and the fourth call will block until there are fewer than three jobs running.

You need to start this script inside the current session by prefixing it with ., because jobs lists only the jobs of the current session.

The sleep inside is ugly, but I didn’t find a way to wait for the first job that terminates.
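
One way to avoid sourcing a file is to wrap the same polling loop in a shell function, since a function runs in the calling shell and therefore sees its job table. The following is a minimal sketch along those lines. The name exec_async, the extra limit argument, the use of jobs -pr (running jobs only) and the 0.1-second poll interval (fractional sleep is a GNU extension) are all assumptions, not part of the script above.

```shell
#!/bin/bash
# exec_async LIMIT COMMAND [ARGS...]
# Blocks while LIMIT or more background jobs are running, then starts COMMAND.
exec_async() {
    local limit=$1; shift
    local joblist
    joblist=($(jobs -pr))
    while (( ${#joblist[@]} >= limit )); do
        sleep 0.1
        joblist=($(jobs -pr))
    done
    "$@" &
}

start=$SECONDS
for i in 1 2 3 4; do
    exec_async 3 sleep 1
done
# The first three calls return at once; the fourth has to wait for a free slot.
blocked=$(( SECONDS - start ))
wait
echo "fourth call returned after roughly ${blocked}s"
```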

[*]

The following script shows a way to do this with functions. You can either put the bgxupdate() and bgxlimit() functions in your script, or have them in a separate file which is sourced from your script with:

. /path/to/bgx.sh

It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).

It uses the Bash built-in jobs to get a list of sub-processes but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit() function:

  1. Set up an empty group variable.
  2. Transfer that to bgxgrp.
  3. Call bgxlimit() with the limit and command you want to run.
  4. Transfer the new group back to your group variable.

Of course, if you only have one group, just use bgxgrp variable directly rather than transferring in and out.

#!/bin/bash

# bgxupdate - update active processes in a group.
#   Works by transferring each process to new group
#   if it is still active.
# in:  bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.

bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    ((bgxcount = 0))
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount++))
        fi
    done
}

# bgxlimit - start a sub-process with a limit.
#   Loops, calling bgxupdate until there is a free
#   slot to run another sub-process. Then runs it
#   and updates the process group.
# in:  $1     - the limit on processes.
# in:  $2+    - the command to run for new process.
# in:  bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes

bgxlimit() {
    bgxmax=$1; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]]; do
        sleep 1
        bgxupdate
    done
    if [[ "$1" != "-" ]]; then
        "$@" &
        bgxgrp="${bgxgrp} $!"
    fi
}

# Test program, create group and run 6 sleeps with
#   limit of 3.

group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6; do
    bgxgrp=${group1}; bgxlimit 3 sleep ${i}0; group1=${bgxgrp}
    echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done

# Wait until all others are finished.

echo
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]]; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]]; do
        sleep 1
        bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
    done
    echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done

Here’s a sample run, with blank lines inserted to clearly delineate different time points:

0 12:38:00 [ ]
1 12:38:00 [ 3368 ]
2 12:38:00 [ 3368 5880 ]
3 12:38:00 [ 3368 5880 2524 ]

4 12:38:10 [ 5880 2524 1560 ]

5 12:38:20 [ 2524 1560 5032 ]

6 12:38:30 [ 1560 5032 5212 ]

9 12:38:50 [ 5032 5212 ]

9 12:39:10 [ 5212 ]

9 12:39:30 [ ]

Or, if you prefer it in a more graphical time-line form:

Process:  1  2  3  4  5  6 
--------  -  -  -  -  -  -
12:38:00  ^  ^  ^            1/2/3 start together.
12:38:10  v  |  |  ^         4 starts when 1 done.
12:38:20     v  |  |  ^      5 starts when 2 done.
12:38:30        v  |  |  ^   6 starts when 3 done.
12:38:40           |  |  |
12:38:50           v  |  |   4 ends.
12:39:00              |  |
12:39:10              v  |   5 ends.
12:39:20                 |
12:39:30                 v   6 ends.

[*]

Here’s the shortest way:

waitforjobs() {
    while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}

Call this function before forking off any new job:

waitforjobs 10
run_another_job &

To have as many background jobs as cores on the machine, use $(nproc) instead of a fixed number like 10.
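
To see the limit in action, the sketch below wraps waitforjobs in a small self-checking demo. The marker-file bookkeeping, the job function and the limit of 2 are assumptions added for illustration. It also uses jobs -pr (running jobs only) rather than jobs -p, and it needs bash 4.3 or later for wait -n.

```shell
#!/bin/bash
waitforjobs() {
    while test "$(jobs -pr | wc -w)" -ge "$1"; do wait -n; done
}

markers=$(mktemp -d)      # holds one marker file per job currently running
log=$(mktemp)             # each job records how many markers it saw

job() {
    touch "$markers/$1"
    ls "$markers" | wc -l >> "$log"
    sleep 0.2
    rm "$markers/$1"
}

limit=2                   # replace with $(nproc) to match the core count
for i in 1 2 3 4 5 6; do
    waitforjobs "$limit"
    job "$i" &
done
wait                      # let the last jobs finish

peak=$(sort -n "$log" | tail -n 1)
finished=$(wc -l < "$log")
echo "jobs finished: $finished, peak concurrency: $peak"
rm -rf "$markers" "$log"
```

The reported peak should never exceed the limit, because a job removes its marker before it exits.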

[*]

Assuming you’d like to write code like this:

for x in $(seq 1 100); do     # 100 things we want to put into the background.
    max_bg_procs 5            # Define the limit. See below.
    your_intensive_job &
done

Where max_bg_procs should be put in your .bashrc:

function max_bg_procs {
    if [[ $# -eq 0 ]] ; then
            echo "Usage: max_bg_procs NUM_PROCS.  Will wait until the number of background (&)"
            echo "           bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
            return
    fi
    local max_number=$((0 + ${1:-0}))
    while true; do
            local current_number=$(jobs -pr | wc -l)
            if [[ $current_number -lt $max_number ]]; then
                    break
            fi
            sleep 1
    done
}

[*]

The following function, developed from tangens’s answer above, can be copied into a script or sourced from a file:

job_limit () {
    # Test for single positive integer input
    if (( $# == 1 )) && [[ $1 =~ ^[1-9][0-9]*$ ]]
    then

        # Check number of running jobs
        joblist=($(jobs -rp))
        while (( ${#joblist[*]} >= $1 ))
        do

            # Wait for the oldest running job; if that wait returns
            # non-zero, fall through to waiting for the next one
            command='wait '${joblist[0]}
            for job in ${joblist[@]:1}
            do
                command+=' || wait '$job
            done
            eval $command
            joblist=($(jobs -rp))
        done
    fi
}

1) Only requires inserting a single line to limit an existing loop

while :
do
    task &
    job_limit `nproc`
done

2) Waits on completion of existing background tasks rather than polling, increasing efficiency for fast tasks

[*]

If you’re willing to do this outside of pure bash, you should look into a job queuing system.

For instance, there’s GNU queue or PBS. And for PBS, you might want to look into Maui for configuration.

Both systems will require some configuration, but it’s entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems would be used on supercomputing clusters, where you would want to allocate a specific amount of memory or computing time to any given batch job; however, there’s no reason you can’t use one of these on a single desktop computer without regard for compute time or memory limits.

[*]

This might be good enough for most purposes, but is not optimal.

#!/bin/bash

n=0
maxjobs=10

for i in *.m4a ; do
    # ( DO SOMETHING ) &

    # limit jobs
    if (( ++n % maxjobs == 0 )) ; then
        wait # wait until all have finished (not optimal, but most times good enough)
        echo $n wait
    fi
done

[*]

This is hard to do without wait -n (the shell in BusyBox, for example, does not support it). Here is a workaround. It is not optimal because it calls the jobs and wc commands ten times per second; you can reduce that to once per second, for example, if you don’t mind waiting a bit longer after each job completes.

# $1 = maximum concurrent jobs
#
limit_jobs()
{
   while true; do
      if [ "$(jobs -p | wc -l)" -lt "$1" ]; then break; fi
      usleep 100000
   done
}

# and now start some tasks:

task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
wait

[*]

On Linux I use this to limit the bash jobs to the number of available CPUs (possibly overridden by setting CPU_NUMBER).

[ "$CPU_NUMBER" ] || CPU_NUMBER="`nproc 2>/dev/null || echo 1`"

while [ "$1" ]; do
    {
        do something
        with $1
        in parallel

        echo "[$# items left] $1 done"
    } &

    while true; do
        # load the PIDs of all child processes to the array
        joblist=(`jobs -p`)
        if [ ${#joblist[*]} -ge "$CPU_NUMBER" ]; then
            # when the job limit is reached, wait for *single* job to finish
            wait -n
        else
            # stop checking when we're below the limit
            break
        fi
    done
    # it's great we executed zero external commands to check!

    shift
done

# wait for all currently active child processes
wait

[*]

Have you considered starting ten long-running listener processes and communicating with them via named pipes?
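
A lighter-weight relative of that idea uses a single named pipe as a counting semaphore rather than full listener processes. This is a swapped-in technique, sketched under assumptions: the FIFO holds one token (a newline) per allowed job, and a job takes a token before starting and returns it when done. The names run_task and max_jobs and the marker-file bookkeeping are hypothetical. Opening a FIFO read-write with 3<> is relied on not to block, which holds on Linux but is not guaranteed by POSIX.

```shell
#!/bin/bash
max_jobs=3

# Create the FIFO, keep it open on fd 3, and drop the name from the file system.
fifo=$(mktemp -u)
mkfifo "$fifo"
exec 3<>"$fifo"
rm "$fifo"

# Preload one token per allowed concurrent job.
for ((i = 0; i < max_jobs; i++)); do printf '\n' >&3; done

markers=$(mktemp -d)      # one marker file per task currently running
log=$(mktemp)             # each task records how many markers it saw

run_task() {
    touch "$markers/$1"
    ls "$markers" | wc -l >> "$log"
    sleep 0.2
    rm "$markers/$1"
}

for i in 1 2 3 4 5 6 7 8; do
    read -r -u 3              # take a token; blocks while max_jobs tasks run
    {
        run_task "$i"
        printf '\n' >&3       # hand the token back
    } &
done
wait
exec 3>&-

peak=$(sort -n "$log" | tail -n 1)
finished=$(wc -l < "$log")
echo "tasks finished: $finished, peak concurrency: $peak"
rm -rf "$markers" "$log"
```

Because the blocking happens in read, this needs neither polling nor wait -n.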

[*]

You can use ulimit -u; see http://ss64.com/bash/ulimit.html

[*]

Bash mostly processes files line by line.
So you can split the input file into chunks of N lines; then a simple pattern applies:

mkdir tmp ; pushd tmp ; split -l 50 ../mainfile.txt
for file in * ; do
   # the loop, not curl, must read from the chunk file
   while read a b c ; do curl -s "http://$a/$b/$c" &
   done <"$file" ; wait ; done
popd ; rm -rf tmp;

[*]

The wait builtin’s -n option waits for the next job to terminate.

maxjobs=10
# wait until the number of jobs drops below $maxjobs
jobIds=($(jobs -p))
len=${#jobIds[@]}
while [ $len -ge $maxjobs ]; do
    # Wait until one job is finished
    wait -n    # no IDs: returns when any one background job finishes
    jobIds=($(jobs -p))
    len=${#jobIds[@]}
done
