glob

The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. If you need a list of filenames that all have a certain extension, prefix, or any common string in the middle, use glob instead of writing code to scan the directory contents yourself.

Wildcards

  • * : matches zero or more characters in a segment of a name
  • ? : matches any single character in that position in the name
  • [] : matches any single character in the given set or range
    • [1-9] : matches any single digit from 1 to 9
    • [a-z] : matches any single lowercase letter
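
The wildcard rules above can be tried out on plain strings, without touching the file system. The sketch below uses fnmatch.fnmatchcase from the standard library, which applies the same shell-style matching to a string; the list of names is made up purely for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, used only for illustration.
names = ["test0.py", "test1.py", "test10.py", "test-env.py", "file_a.txt", "main.py"]


def matching(pattern):
    """Return the names that match a shell-style pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


star = matching("test*.py")            # * : zero or more characters
question = matching("test?.py")        # ? : exactly one character
digits = matching("test[1-9].py")      # [1-9] : one character from 1 to 9
letters = matching("file_[a-z].txt")   # [a-z] : one lowercase letter

print(star)
print(question)
print(digits)
print(letters)
```

Note that test0.py is not matched by test[1-9].py, because 0 lies outside the range 1-9, and test10.py is not matched by test?.py, because ? stands for exactly one character.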

Functions

glob.glob()

glob.glob(pathname, *, recursive=False)

It returns a list of the paths that match the pattern given in pathname.

  • pathname can be an absolute or a relative path, and may contain wildcards such as * or ?.

  • recursive is turned off (False) by default. When it is True, the pattern ** matches any files and zero or more directories and subdirectories. When it is False, ** behaves like a single *.
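
To see what the recursive flag changes, here is a minimal sketch that builds a small throwaway tree in a temporary directory and runs the same ** pattern twice. The directory and file names (a, b, top.py, mid.py, deep.py) are invented for the demonstration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Throwaway tree: root/top.py, root/a/mid.py, root/a/b/deep.py
    os.makedirs(os.path.join(root, "a", "b"))
    for rel in ("top.py", os.path.join("a", "mid.py"), os.path.join("a", "b", "deep.py")):
        open(os.path.join(root, rel), "w").close()

    pattern = os.path.join(root, "**", "*.py")

    # Without recursive=True, "**" acts like "*": exactly one directory level.
    flat = sorted(os.path.relpath(p, root) for p in glob.glob(pattern))

    # With recursive=True, "**" matches zero or more directory levels.
    deep = sorted(os.path.relpath(p, root) for p in glob.glob(pattern, recursive=True))

print(flat)  # only a/mid.py
print(deep)  # top.py, a/mid.py and a/b/deep.py
```

The non-recursive call misses both top.py (zero levels down) and deep.py (two levels down), which is why the recursive flag is needed for a full search of the tree.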

For example, suppose we have a file structure like this:

- src
|- code
    |- main.py
    |- test0.py
    |- test1.py
    |- test2.py
    |- test-env.py
    |- test-prod.py
|- text
    |- file_a.txt
    |- file_b.txt
    |- file_c.txt
    |- file_d.txt
|- demo.py

We want to get all .py files:

import glob

for py_file in glob.glob("src/**/*.py", recursive=True):
    print(py_file)
src/demo.py
src/code/test1.py
src/code/test-prod.py
src/code/test-env.py
src/code/main.py
src/code/test2.py
src/code/test0.py

We want to get test0.py, test1.py, test2.py:

import glob

for py_file in glob.glob("src/**/test?.py"):
    print(py_file)
src/code/test1.py
src/code/test2.py
src/code/test0.py

We want to get file_a.txt, file_b.txt, file_c.txt:

import glob

for txt_file in glob.glob("src/**/file_[a-c].txt"):
    print(txt_file)
src/text/file_a.txt
src/text/file_c.txt
src/text/file_b.txt

glob.iglob()

Return an iterator which yields the same values as glob() without actually storing them all simultaneously (good for large directories).

Example:

import glob

for py_file in glob.iglob("src/**/test?.py"):
    print(py_file)
src/code/test1.py
src/code/test2.py
src/code/test0.py
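
The difference from glob.glob() is easier to see when results are pulled one at a time. A small sketch, using a temporary directory with made-up file names:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ("test0.py", "test1.py", "test2.py", "main.py"):
        open(os.path.join(root, name), "w").close()

    pattern = os.path.join(root, "test?.py")

    matches = glob.iglob(pattern)
    is_list = isinstance(matches, list)      # False: no list is built up front
    first = os.path.basename(next(matches))  # pull a single result on demand

    # The remaining results come from the same iterator.
    rest = sorted(os.path.basename(p) for p in matches)

    # glob.glob() finds the same paths, but returns them all at once as a list.
    as_list = sorted(os.path.basename(p) for p in glob.glob(pattern))

print(is_list)
print(sorted([first] + rest) == as_list)
```

Because the iterator is consumed as it is read, it can only be walked once; call glob.iglob() again (or use glob.glob()) if the results are needed a second time.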

glob.escape()

Escape all special characters ('?', '*' and '['). “Escape” means treating special characters as normal characters instead of wildcards.

This is useful if you want to match an arbitrary literal string that may have special characters in it.

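For example, suppose we want to get test-env.py and test-prod.py. Note that - only has a special meaning inside [], so glob.escape("-") returns it unchanged and is not strictly needed here; escaping matters when the literal text contains '?', '*' or '[':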

import glob

for py_file in glob.glob("src/code/*" + glob.escape("-") + "*.py"):
    print(py_file)
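
Escaping makes a visible difference when the literal name contains one of the three special characters. The sketch below is an illustration with invented file names (report[1].txt and report1.txt) in a temporary directory:

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    for name in ("report[1].txt", "report1.txt"):
        open(os.path.join(root, name), "w").close()

    literal_name = "report[1].txt"

    # Unescaped, "[1]" is read as a character class, so this finds report1.txt.
    unescaped = [
        os.path.basename(p)
        for p in glob.glob(os.path.join(root, literal_name))
    ]

    # Escaped, the opening bracket is matched literally.
    escaped_pattern = glob.escape(literal_name)
    escaped = [
        os.path.basename(p)
        for p in glob.glob(os.path.join(root, escaped_pattern))
    ]

print(unescaped)             # ['report1.txt']
print(escaped)               # ['report[1].txt']
print(glob.escape("-"))      # '-' is returned unchanged
```

Only the literal part of the pattern should be passed to glob.escape(); any wildcards that are meant to stay active are concatenated afterwards.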

Other ways to match filename patterns

fnmatch.fnmatch() 1

fnmatch.fnmatch(filename, pattern)

Test whether the filename string matches the pattern string.

For example, we want to get test0.py, test1.py, test2.py:

import os
import fnmatch

for py_file in os.listdir("src/code"):
    if fnmatch.fnmatch(py_file, "test?.py"):
        print(py_file)
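
Since fnmatch works on strings rather than on the file system, the same check can be run on any list of names. As a short sketch, fnmatch.filter() does the loop-and-test in one call; the list below simply mirrors the contents of src/code from the example tree.

```python
import fnmatch

# Names as os.listdir("src/code") might return them (order is arbitrary there).
names = ["main.py", "test0.py", "test1.py", "test2.py", "test-env.py", "test-prod.py"]

# Keep only the names that match the pattern, preserving the input order.
numbered_tests = fnmatch.filter(names, "test?.py")

print(numbered_tests)
```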

pathlib.Path.glob() 2

pathlib.Path.glob(pattern)

Glob the given relative pattern in the directory represented by this path, yielding all matching files (of any kind).

For example, if we want to list all Python files:

from pathlib import Path

path = Path("src")

for py_file in path.glob("**/*.py"):
    print(py_file)
src/demo.py
src/code/test1.py
src/code/test-prod.py
src/code/test-env.py
src/code/main.py
src/code/test2.py
src/code/test0.py

The “**” pattern means “this directory and all subdirectories, recursively”. In other words, it enables recursive globbing. However, using the “**” pattern in large directory trees may consume an inordinate amount of time.

Path.glob() is similar to glob.glob() discussed above. As you can see, pathlib combines many of the best features of the os, os.path, and glob modules into one single module, which makes it a joy to use.

pathlib.Path.rglob() 3

This is like calling Path.glob() with “**/” added in front of the given relative pattern.

For example, to list all Python files:

from pathlib import Path

for py_file in Path("src").rglob("*.py"):
    print(py_file)
src/demo.py
src/code/test1.py
src/code/test-prod.py
src/code/test-env.py
src/code/main.py
src/code/test2.py
src/code/test0.py
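
The relationship between Path.glob() and Path.rglob() can be checked directly. A minimal sketch, using a temporary directory and invented file names so that it runs anywhere:

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "code").mkdir()
    for rel in ("demo.py", "code/main.py", "code/test0.py", "notes.txt"):
        (root / rel).touch()

    # Both calls return generators; sort the results for a stable comparison.
    via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

print(via_glob)
print(via_glob == via_rglob)
```

Unlike glob.glob(), both methods yield Path objects rather than strings, so the results can be used with other pathlib methods such as relative_to() without conversion.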

Summary

Function                             Description
fnmatch.fnmatch(filename, pattern)   Tests whether the filename matches the pattern and returns True or False
glob.glob()                          Returns a list of path names that match a pattern
glob.iglob()                         Returns an iterator over path names that match a pattern
pathlib.Path.glob()                  Matches a pattern relative to a path and returns a generator of Path objects
pathlib.Path.rglob()                 Like Path.glob(), but searches all subdirectories recursively; returns a generator of Path objects

Reference