Is there an easy way to limit the number of concurrent jobs in bash? By that I mean making the & block when there are more than n concurrent jobs running in the background.
I know I can implement this with ps | grep -style tricks, but is there an easier way?
# content of script exec-async.sh
joblist=($(jobs -p))
while (( ${#joblist[*]} >= 3 ))
do
    sleep 1
    joblist=($(jobs -p))
done
# "$@" keeps each argument intact, even if it contains spaces
"$@" &
If you call:
. exec-async.sh sleep 10
…four times, the first three calls will return immediately, and the fourth call will block until there are fewer than three jobs running.
You need to start this script inside the current session by prefixing it with ., because jobs lists only the jobs of the current session.
The sleep inside is ugly, but I didn’t find a way to wait for the first job that terminates.
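The same throttle can also be written as a shell function, which runs in the current shell and therefore sees its job table without the . prefix. This is only a minimal sketch that keeps the one-second polling; the name exec_async and the leading limit argument are assumptions for illustration, not part of the script above:

```shell
#!/bin/bash
# exec_async LIMIT CMD [ARGS...]   (hypothetical name and argument order)
# Blocks while LIMIT or more background jobs are running, then starts CMD.
exec_async() {
    local limit=$1; shift
    local joblist
    joblist=($(jobs -pr))              # PIDs of running background jobs
    while (( ${#joblist[@]} >= limit )); do
        sleep 1                        # still polling, as in the script above
        joblist=($(jobs -pr))
    done
    "$@" &
}

for i in 1 2 3 4; do
    exec_async 3 sleep 2               # the fourth call blocks until a slot frees up
done
wait
```

Because the function is defined in the calling shell, it can live in the script itself or in a file that is sourced once.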
[*]
The following script shows a way to do this with functions. You can either put the bgxupdate() and bgxlimit() functions in your script, or have them in a separate file which is sourced from your script with:
. /path/to/bgx.sh
It has the advantage that you can maintain multiple groups of processes independently (you can run, for example, one group with a limit of 10 and another totally separate group with a limit of 3).
It uses the Bash built-in jobs to get a list of sub-processes but maintains them in individual variables. In the loop at the bottom, you can see how to call the bgxlimit() function:
Set up an empty group variable.
Transfer that to bgxgrp.
Call bgxlimit() with the limit and command you want to run.
Transfer the new group back to your group variable.
Of course, if you only have one group, just use bgxgrp variable directly rather than transferring in and out.
#!/bin/bash
# bgxupdate - update active processes in a group.
# Works by transferring each process to new group
# if it is still active.
# in: bgxgrp - current group of processes.
# out: bgxgrp - new group of processes.
# out: bgxcount - number of processes in new group.
bgxupdate() {
    bgxoldgrp=${bgxgrp}
    bgxgrp=""
    ((bgxcount = 0))
    # Join the running PIDs into one space-delimited string.
    bgxjobs=" $(jobs -pr | tr '\n' ' ')"
    for bgxpid in ${bgxoldgrp} ; do
        echo "${bgxjobs}" | grep " ${bgxpid} " >/dev/null 2>&1
        if [[ $? -eq 0 ]]; then
            bgxgrp="${bgxgrp} ${bgxpid}"
            ((bgxcount++))
        fi
    done
}
# bgxlimit - start a sub-process with a limit.
# Loops, calling bgxupdate until there is a free
# slot to run another sub-process. Then runs it
# and updates the process group.
# in: $1 - the limit on processes.
# in: $2+ - the command to run for new process.
# in: bgxgrp - the current group of processes.
# out: bgxgrp - new group of processes
bgxlimit() {
    bgxmax=$1; shift
    bgxupdate
    while [[ ${bgxcount} -ge ${bgxmax} ]]; do
        sleep 1
        bgxupdate
    done
    if [[ "$1" != "-" ]]; then
        "$@" &
        bgxgrp="${bgxgrp} $!"
    fi
}
# Test program, create group and run 6 sleeps with
# limit of 3.
group1=""
echo 0 $(date | awk '{print $4}') '[' ${group1} ']'
echo
for i in 1 2 3 4 5 6; do
    bgxgrp=${group1}; bgxlimit 3 sleep ${i}0; group1=${bgxgrp}
    echo ${i} $(date | awk '{print $4}') '[' ${group1} ']'
done
# Wait until all others are finished.
echo
bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
while [[ ${bgxcount} -ne 0 ]]; do
    oldcount=${bgxcount}
    while [[ ${oldcount} -eq ${bgxcount} ]]; do
        sleep 1
        bgxgrp=${group1}; bgxupdate; group1=${bgxgrp}
    done
    echo 9 $(date | awk '{print $4}') '[' ${group1} ']'
done
Here’s a sample run, with blank lines inserted to clearly delineate different time points:
The whole thing starts at 12:38:00 (time t = 0) and, as you can see, the first three processes run immediately.
Each process sleeps for 10n seconds and the fourth process doesn’t start until the first exits (at time t = 10). You can see that process 3368 has disappeared from the list before 1560 is added.
Similarly, the fifth process 5032 starts when 5880 (the second) exits at time t = 20.
And finally, the sixth process 5212 starts when 2524 (the third) exits at time t = 30.
Then the rundown begins, the fourth process exits at time t = 50 (started at 10 with 40 duration).
The fifth exits at time t = 70 (started at 20 with 50 duration).
Finally, the sixth exits at time t = 90 (started at 30 with 60 duration).
Or, if you prefer it in a more graphical time-line form:
Process:  1  2  3  4  5  6
--------  -  -  -  -  -  -
12:38:00  ^  ^  ^            1/2/3 start together.
12:38:10  v  |  |  ^         4 starts when 1 done.
12:38:20     v  |  |  ^      5 starts when 2 done.
12:38:30        v  |  |  ^   6 starts when 3 done.
12:38:40           |  |  |
12:38:50           v  |  |   4 ends.
12:39:00              |  |
12:39:10              v  |   5 ends.
12:39:20                 |
12:39:30                 v   6 ends.
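For comparison, the same per-group bookkeeping can be sketched with Bash arrays and kill -0 instead of string matching against the jobs output. This is an alternative under stated assumptions, not the script above: it needs Bash 4.3 or later (for namerefs), the names group_prune and group_run are hypothetical, and kill -0 could in principle be fooled by PID reuse.

```shell
#!/bin/bash
# group_prune NAME - drop PIDs that are no longer alive from array NAME.
# (Hypothetical helper; NAME must not clash with the local names used here.)
group_prune() {
    local -n _grp=$1                   # nameref to the caller's array
    local pid alive=()
    for pid in "${_grp[@]}"; do
        kill -0 "$pid" 2>/dev/null && alive+=("$pid")
    done
    _grp=("${alive[@]}")
}

# group_run NAME LIMIT CMD [ARGS...] - block until array NAME holds fewer
# than LIMIT live PIDs, then start CMD in the background and record its PID.
group_run() {
    local name=$1 limit=$2; shift 2
    local -n _g=$name
    group_prune "$name"
    while (( ${#_g[@]} >= limit )); do
        sleep 0.2                      # fractional sleep assumes GNU sleep
        group_prune "$name"
    done
    "$@" &
    _g+=("$!")
}

# Two independent groups with different limits.
fast=()
slow=()
for i in 1 2 3 4; do
    group_run fast 2 sleep 1
    group_run slow 1 sleep 1
done
wait
```

Each group is just an array of PIDs, so there is no need to copy a group in and out of a shared variable.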
[*]
Here’s the shortest way:
waitforjobs() {
while test $(jobs -p | wc -w) -ge "$1"; do wait -n; done
}
Call this function before forking off any new job:
waitforjobs 10
run_another_job &
To have as many background jobs as cores on the machine, use $(nproc) instead of a fixed number like 10.
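Put together, a loop that keeps at most one job per core busy might look like the sketch below. It assumes Bash 4.3 or later (for wait -n) and GNU coreutils (for nproc and fractional sleep); process_item and the item list are placeholders standing in for real work.

```shell
#!/bin/bash
waitforjobs() {
    while test "$(jobs -p | wc -w)" -ge "$1"; do wait -n; done
}

# Placeholder task: stands in for whatever each item really needs.
process_item() {
    sleep 0.5
    echo "done: $1"
}

limit=$(nproc)                         # one job per available core
for item in a b c d e f; do
    waitforjobs "$limit"               # block until fewer than $limit jobs remain
    process_item "$item" &
done
wait                                   # let the remaining jobs finish
```

The final wait matters: without it the script can exit while the last few jobs are still running.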
[*]
Assuming you’d like to write code like this:
for x in $(seq 1 100); do # 100 things we want to put into the background.
max_bg_procs 5 # Define the limit. See below.
your_intensive_job &
done
Where max_bg_procs should be put in your .bashrc:
function max_bg_procs {
if [[ $# -eq 0 ]] ; then
echo "Usage: max_bg_procs NUM_PROCS. Will wait until the number of background (&)"
echo " bash processes (as determined by 'jobs -pr') falls below NUM_PROCS"
return
fi
local max_number=$((0 + ${1:-0}))
while true; do
local current_number=$(jobs -pr | wc -l)
if [[ $current_number -lt $max_number ]]; then
break
fi
sleep 1
done
}
[*]
The following function (developed from tangens's answer above; either copy it into your script or source it from a file):
job_limit () {
# Test for single positive integer input
if (( $# == 1 )) && [[ $1 =~ ^[1-9][0-9]*$ ]]
then
# Check number of running jobs
joblist=($(jobs -rp))
while (( ${#joblist[*]} >= $1 ))
do
# Wait for the first listed job to finish; if that wait returns non-zero, wait for the next one
command='wait '${joblist[0]}
for job in ${joblist[@]:1}
do
command+=' || wait '$job
done
eval $command
joblist=($(jobs -rp))
done
fi
}
1) Only requires inserting a single line to limit an existing loop
while :
do
task &
job_limit `nproc`
done
2) Waits on completion of existing background tasks rather than polling, increasing efficiency for fast tasks
[*]
If you’re willing to do this outside of pure bash, you should look into a job queuing system.
For instance, there’s GNU queue or PBS. And for PBS, you might want to look into Maui for configuration.
Both systems will require some configuration, but it’s entirely possible to allow a specific number of jobs to run at once, only starting newly queued jobs when a running job finishes. Typically, these job queuing systems would be used on supercomputing clusters, where you would want to allocate a specific amount of memory or computing time to any given batch job; however, there’s no reason you can’t use one of these on a single desktop computer without regard for compute time or memory limits.
[*]
This might be good enough for most purposes, but is not optimal.
#!/bin/bash
n=0
maxjobs=10
for i in *.m4a ; do
    # ( DO SOMETHING ) &
    # limit jobs
    if (( ++n % maxjobs == 0 )) ; then
        wait # wait until the whole batch has finished (not optimal, but usually good enough)
        echo $n wait
    fi
done
wait # wait for the last, possibly partial, batch
[*]
It is hard to do without wait -n (for example, the shell in BusyBox does not support it). Here is a workaround. It is not optimal because it calls the jobs and wc commands 10 times per second. You can reduce that to, for example, once per second if you don’t mind waiting a bit longer for each job to complete.
# $1 = maximum concurrent jobs
#
limit_jobs()
{
while true; do
if [ "$(jobs -p | wc -l)" -lt "$1" ]; then break; fi
usleep 100000
done
}
# and now start some tasks:
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
task &
limit_jobs 2
wait
[*]
On Linux I use this to limit the bash jobs to the number of available CPUs (possibly overridden by setting CPU_NUMBER).
[ "$CPU_NUMBER" ] || CPU_NUMBER="`nproc 2>/dev/null || echo 1`"
while [ "$1" ]; do
{
do something
with $1
in parallel
echo "[$# items left] $1 done"
} &
while true; do
# load the PIDs of all child processes to the array
joblist=(`jobs -p`)
if [ ${#joblist[*]} -ge "$CPU_NUMBER" ]; then
# when the job limit is reached, wait for *single* job to finish
wait -n
else
# stop checking when we're below the limit
break
fi
done
# it's great we executed zero external commands to check!
shift
done
# wait for all currently active child processes
wait
[*]
Have you considered starting ten long-running listener processes and communicating with them via named pipes?
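One way to read that suggestion: give each long-running worker its own named pipe and have a dispatcher hand out tasks round-robin. The following is only a sketch under that assumption. The worker count, the task list, the results file, and the choice of one FIFO per worker are all illustrative, and the {fd} redirection syntax needs Bash 4.1 or later.

```shell
#!/bin/bash
workers=3
dir=$(mktemp -d)
results="$dir/results"

# Each worker reads one task per line from its own FIFO until the
# dispatcher closes the write end.
worker() {
    local id=$1 task
    while IFS= read -r task; do
        echo "worker $id handled $task" >> "$results"
    done < "$dir/fifo.$id"
}

# Start all workers first so none of them inherits a write descriptor.
for ((i = 0; i < workers; i++)); do
    mkfifo "$dir/fifo.$i"
    worker "$i" &
done

# Open one write descriptor per FIFO and keep it open while dispatching.
fds=()
for ((i = 0; i < workers; i++)); do
    exec {fd}> "$dir/fifo.$i"
    fds+=("$fd")
done

# Hand out tasks round-robin (not load-aware: a slow task delays its queue).
n=0
for task in a b c d e f g; do
    echo "$task" >&"${fds[n % workers]}"
    n=$((n + 1))
done

# Closing the write ends delivers EOF, so the workers exit.
for fd in "${fds[@]}"; do
    exec {fd}>&-
done
wait
sort "$results"
# Remove "$dir" once the results are no longer needed.
```

A per-worker FIFO avoids several readers competing for bytes on one shared pipe, at the cost of a fixed assignment of tasks to workers.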
Bash mostly processes files line by line.
So you can split the input file into chunks of N lines, and then a simple pattern applies:
mkdir tmp ; pushd tmp ; split -l 50 ../mainfile.txt
for file in * ; do
    # the redirect belongs to the loop, so read consumes the chunk line by line
    while read a b c ; do curl -s "http://$a/$b/$c" &
    done < "$file" ; wait ; done
popd ; rm -rf tmp
[*]
The -n option of the wait builtin (Bash 4.3 or later) waits for the next job to terminate.
maxjobs=10
# wait until the number of jobs is less than $maxjobs
jobIds=($(jobs -p))
len=${#jobIds[@]}
while [ $len -ge $maxjobs ]; do
    # Wait until any one job has finished
    wait -n
    jobIds=($(jobs -p))
    len=${#jobIds[@]}
done