Detect file handle leaks in Python?

My program appears to be leaking file handles. How can I find out where?

My program uses file handles in a few different places: it reads output from child processes, it calls the ImageMagick API through ctypes (which opens files), and it copies files.

It crashes in shutil.copyfile, but I’m pretty sure this is not the place it is leaking.

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 874, in main
    magpy.run_all()
  File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 656, in run_all
    [operation.operate() for operation in operations]
  File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 417, in operate
    output_file = self.place_image(output_file)
  File "C:\Python25\Lib\site-packages\magpy\magpy.py", line 336, in place_image
    shutil.copyfile(str(input_file), str(self.full_filename))
  File "C:\Python25\Lib\shutil.py", line 47, in copyfile
    fdst = open(dst, 'wb')
IOError: [Errno 24] Too many open files: 'C:\Documents and Settings\stuart.axon\Desktop\calzone\output\wwtbam4\Nokia_NCD\nl\icon_42x42_V000.png'
Press any key to continue . . .

I had similar problems, running out of file descriptors during subprocess.Popen() calls. I used the following script to debug what was happening:

import os
import stat

_fd_types = (
    ('REG', stat.S_ISREG),
    ('FIFO', stat.S_ISFIFO),
    ('DIR', stat.S_ISDIR),
    ('CHR', stat.S_ISCHR),
    ('BLK', stat.S_ISBLK),
    ('LNK', stat.S_ISLNK),
    ('SOCK', stat.S_ISSOCK)
)

def fd_table_status():
    result = []
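    # Probe the first 100 descriptor numbers; raise the limit if needed.
    for fd in range(100):
        try:
            s = os.fstat(fd)
        except OSError:
            # fstat fails for descriptor numbers that are not open
            continue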
        for fd_type, func in _fd_types:
            if func(s.st_mode):
                break
        else:
            fd_type = str(s.st_mode)
        result.append((fd, fd_type))
    return result

def fd_table_status_logify(fd_table_result):
    return ('Open file handles: ' +
            ', '.join(['{0}: {1}'.format(*i) for i in fd_table_result]))

def fd_table_status_str():
    return fd_table_status_logify(fd_table_status())

if __name__=='__main__':
    print(fd_table_status_str())

You can import this module and call fd_table_status_str() to log the file descriptor table status at different points in your code.
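One way to use snapshots like these is to take one before and one after a suspect operation and compare them. The following is only a minimal sketch: the helper names open_fds and leaked_fds, and the leaky and tidy functions, are made up for illustration and are not part of the script above.

```python
import os

def open_fds(limit=256):
    """Return the set of descriptor numbers below `limit` that are open."""
    fds = set()
    for fd in range(limit):
        try:
            os.fstat(fd)
        except OSError:
            continue  # this descriptor number is not open
        fds.add(fd)
    return fds

def leaked_fds(func, limit=256):
    """Call func() and return descriptors open afterwards but not before."""
    before = open_fds(limit)
    func()
    return sorted(open_fds(limit) - before)

_kept = []  # hypothetical stand-in for a forgotten reference

def leaky():
    # Opens a file and never closes it.
    _kept.append(open(os.devnull))

def tidy():
    # Opens a file and closes it again when the block exits.
    with open(os.devnull):
        pass

print("leaky:", leaked_fds(leaky))
print("tidy: ", leaked_fds(tidy))
```

Wrapping individual steps of the program this way should narrow down which step leaves descriptors behind.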

Also, make sure that subprocess.Popen instances are destroyed. On Windows, keeping references to Popen instances prevents them from being garbage-collected, and as long as the instances are alive, the associated pipes are not closed.
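Rather than relying on the Popen object being collected, the pipes can be released explicitly. A minimal sketch, assuming the child's output is small enough to read in one go; run_and_release is a hypothetical helper name, and the child command is only a placeholder that runs the current interpreter.

```python
import subprocess
import sys

def run_and_release(args):
    """Run a child process, collect its output, and close its pipes."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        # communicate() reads both pipes to EOF, waits for the child
        # to exit, and closes the pipes it has read.
        out, err = proc.communicate()
    finally:
        # Explicit closes cover the case where communicate() raised.
        for pipe in (proc.stdout, proc.stderr):
            if pipe is not None and not pipe.closed:
                pipe.close()
    return proc.returncode, out, err

code, out, err = run_and_release([sys.executable, "-c", "print('hello')"])
print(code, out)
```

On Python 3.2 and later, Popen can also be used in a with statement, which closes the pipes and waits for the child when the block exits.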

Use Process Explorer: select your process, choose View->Lower Pane View->Handles, then look for anything that seems out of place. Many open handles to the same or similar files usually point to the problem.

lsof -p <process_id> works well on several UNIX-like systems including FreeBSD.

Look at the output from ls -l /proc/$pid/fd/ (substituting the PID of your process, of course) to see which files are open [or, on win32, use Process Explorer to list open files]; then figure out where in your code you’re opening them, and make sure that close() is being called. (Yes, the garbage collector will eventually close things, but it’s not always fast enough to avoid running out of fds).

Checking for any circular references which might be preventing garbage collection is also a good practice. (The cycle collector will eventually dispose of these — but it may not run frequently enough to avoid file descriptor exhaustion; I’ve been bitten by this personally).
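To illustrate the point about cycles, here is a small sketch, assuming CPython and a hypothetical Holder class that owns a file and refers to itself. The descriptor stays open after the last outside reference is dropped and is only released once the cycle collector runs.

```python
import gc
import os

class Holder(object):
    """Hypothetical object that owns a file and is part of a cycle."""
    def __init__(self, path):
        self.handle = open(path)
        self.me = self  # self-reference: refcount never reaches zero

def is_open(fd):
    """Return True if the descriptor number is currently open."""
    try:
        os.fstat(fd)
    except OSError:
        return False
    return True

gc.disable()             # keep the timing deterministic for this demo
holder = Holder(os.devnull)
fd = holder.handle.fileno()
del holder               # unreachable, but the cycle keeps the file alive
open_before = is_open(fd)
gc.collect()             # the collector frees the cycle, closing the file
open_after = is_open(fd)
gc.enable()
print(open_before, open_after)
```

Closing the file explicitly, for instance with a with block or a close() method on the owning object, avoids depending on the collector at all.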

While the OP has a Windows system, I'm sure plenty of people here (such as myself) are looking for solutions on other platforms too (the question isn't even tagged Windows).

The psutil package (hosted on Google Code when this was written) has a get_open_files() method; in psutil 2.0 and later it is called Process.open_files(). It looks like an excellent interface, but at the time of writing it did not seem to have been maintained for a couple of years, so I wrote an implementation for my own Python 2 project on Linux. I'm using it with unittest to make sure my functions clean up their resources.

import os

# Lists the files a process has open (Linux only, relies on /proc).
# Call it synchronously, at the point you want to inspect.
def get_open_files(pid):
    # /proc/<pid>/fd holds one symlink per open file descriptor
    path = "/proc/%d/fd" % pid
    # build the full path of every entry in that directory
    links = ["%s/%s" % (path, f) for f in os.listdir(path)]
    # drop entries that no longer exist, such as the descriptor
    # os.listdir() itself used while reading the directory
    valid_links = [f for f in links if os.path.exists(f)]
    # each entry is named after an fd number; resolve it to its target
    targets = [os.readlink(f) for f in valid_links]
    # leave out terminal devices (typically stdin, stdout and stderr)
    return [t for t in targets if "/dev/pts" not in t]
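The unittest usage mentioned above could look roughly like the following. This is a sketch under assumptions, not the code from that project: it is Linux only, it counts entries in /proc/self/fd rather than listing targets, and copy_first_line is a hypothetical function under test.

```python
import os
import unittest

def count_open_fds():
    """Count this process's open descriptors via /proc (Linux only)."""
    return len(os.listdir("/proc/self/fd"))

def copy_first_line(src, dst):
    """Hypothetical function under test; closes both files on exit."""
    with open(src) as fin, open(dst, "w") as fout:
        fout.write(fin.readline())

class FdLeakTest(unittest.TestCase):
    def test_no_descriptors_leaked(self):
        before = count_open_fds()
        copy_first_line(os.devnull, os.devnull)
        # The count must be back where it started once the call returns.
        self.assertEqual(before, count_open_fds())

# Run the test case directly instead of calling unittest.main(),
# so that this snippet does not exit the interpreter.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FdLeakTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The same before/after comparison can be placed in setUp and tearDown so that every test in a class is checked.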

Python's own test suite has a refleak module that uses fd_count. It works across operating systems and is available on full installs:

>>> from test.support.os_helper import fd_count
>>> fd_count()
27

On Python 3.9 and earlier, os_helper doesn't exist, so use from test.support import fd_count instead.
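Both import locations can be handled with a fallback. A small sketch, assuming the test package is installed alongside the interpreter; it counts descriptors before, during and after holding a file open.

```python
import os

try:
    # Location described above for newer Python versions
    from test.support.os_helper import fd_count
except ImportError:
    # Python 3.9 and earlier
    from test.support import fd_count

before = fd_count()
handle = open(os.devnull)
during = fd_count()   # one more descriptor while the file is open
handle.close()
after = fd_count()    # back to the starting count after close()
print(before, during, after)
```

Since test.support is meant for Python's own test suite, it is probably best kept to debugging sessions and tests rather than production code.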
