glob
The glob module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell. If you need a list of filenames that all have a certain extension, prefix, or any common string in the middle, use glob instead of writing code to scan the directory contents yourself.
Wildcards
*: matches zero or more characters in a segment of a name
?: matches any single character in that position in the name
[]: matches any single character in the given set or range
[0-9]: matches any single digit
[a-z]: matches any single lowercase letter
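The wildcard rules above can be tried out without touching real files. The following is a minimal sketch, assuming a throwaway temporary directory and a hypothetical helper named match_names (neither comes from the glob module itself); the file names are made up for illustration.

```python
import glob
import os
import tempfile


def match_names(pattern, names):
    """Create empty files called `names` in a scratch directory and
    return the sorted file names that `pattern` matches there."""
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            open(os.path.join(tmp, name), "w").close()
        # Escape the directory part so only `pattern` is treated as a wildcard.
        hits = glob.glob(os.path.join(glob.escape(tmp), pattern))
        return sorted(os.path.basename(h) for h in hits)


# Hypothetical file names, chosen to show the difference between wildcards.
names = ["test0.py", "test1.py", "test10.py", "testA.py", "notes.txt"]

star = match_names("test*.py", names)       # zero or more characters
single = match_names("test?.py", names)     # exactly one character
digit = match_names("test[0-9].py", names)  # one character from a range

print(star)    # ['test0.py', 'test1.py', 'test10.py', 'testA.py']
print(single)  # ['test0.py', 'test1.py', 'testA.py']
print(digit)   # ['test0.py', 'test1.py']
```

Note that test10.py is matched only by the * pattern, because ? and [0-9] each stand for exactly one character.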
Functions
glob.glob()
glob.glob(file_pattern, recursive=False)
It returns a list of the paths that match the pattern given in the file_pattern parameter.
The file_pattern can be an absolute or relative path. It may also contain wildcards such as * or ?.
The recursive parameter is turned off (False) by default. When True, the pattern ** matches any files and zero or more directories and subdirectories, so the search descends through the whole tree below that point. When False, ** behaves like a plain * and matches a single path segment.
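To make the effect of the recursive parameter concrete, here is a small sketch that runs the same ** pattern with and without recursive=True. The directory layout (top.py, pkg/mid.py, pkg/sub/deep.py) and the helper relative() are assumptions made up for this illustration.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: top.py, pkg/mid.py, pkg/sub/deep.py
    os.makedirs(os.path.join(tmp, "pkg", "sub"))
    for rel in ("top.py", "pkg/mid.py", "pkg/sub/deep.py"):
        open(os.path.join(tmp, *rel.split("/")), "w").close()

    pattern = os.path.join(glob.escape(tmp), "**", "*.py")

    def relative(paths):
        # Use forward slashes so the result reads the same on every OS.
        return sorted(os.path.relpath(p, tmp).replace(os.sep, "/") for p in paths)

    flat = relative(glob.glob(pattern))                  # "**" acts like "*"
    deep = relative(glob.glob(pattern, recursive=True))  # "**" spans any depth

print(flat)  # ['pkg/mid.py']
print(deep)  # ['pkg/mid.py', 'pkg/sub/deep.py', 'top.py']
```

Without recursive=True, the pattern only reaches files exactly one directory below the starting point; with it, files at every depth (including the starting directory itself) are returned.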
For example, suppose we have a file structure like this:
- src
  |- code
     |- main.py
     |- test0.py
     |- test1.py
     |- test2.py
     |- test-env.py
     |- test-prod.py
  |- text
     |- file_a.txt
     |- file_b.txt
     |- file_c.txt
     |- file_d.txt
  |- demo.py
We want to get all .py files:
import glob
# recursive=True lets ** descend into every subdirectory of src
for py_file in glob.glob("src/**/*.py", recursive=True):
    print(py_file)
src/demo.py
src/code/test1.py
src/code/test-prod.py
src/code/test-env.py
src/code/main.py
src/code/test2.py
src/code/test0.py
We want to get test0.py, test1.py, test2.py:
import glob
# Without recursive=True, ** matches exactly one directory level (like *)
for py_file in glob.glob("src/**/test?.py"):
    print(py_file)
src/code/test1.py
src/code/test2.py
src/code/test0.py
We want to get file_a.txt, file_b.txt, file_c.txt:
import glob
for txt_file in glob.glob("src/**/file_[a-c].txt"):
    print(txt_file)
src/text/file_a.txt
src/text/file_c.txt
src/text/file_b.txt
glob.iglob()
Returns an iterator which yields the same values as glob() without actually storing them all simultaneously (good for large directories).
Example:
import glob
for py_file in glob.iglob("src/**/test?.py"):
    print(py_file)
src/code/test1.py
src/code/test2.py
src/code/test0.py
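Because iglob() hands back an iterator, results can be consumed a few at a time instead of being collected into a list first. The sketch below assumes a hypothetical directory of 100 empty log files created in a temporary location; itertools.islice is used only to take the first three matches.

```python
import glob
import itertools
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical directory with 100 empty log files.
    for i in range(100):
        open(os.path.join(tmp, f"log_{i:03d}.txt"), "w").close()

    pattern = os.path.join(glob.escape(tmp), "log_*.txt")

    matches = glob.iglob(pattern)             # an iterator, not a list
    is_iterator = iter(matches) is matches    # True for any iterator
    first_three = list(itertools.islice(matches, 3))  # pull three results
    remaining = sum(1 for _ in matches)       # consume whatever is left

print(is_iterator)                  # True
print(len(first_three), remaining)  # 3 97
```

The order in which names are produced is not guaranteed, so the sketch only counts the results rather than relying on which three files come first.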
glob.escape()
Escapes all special characters ('?', '*' and '['). “Escaping” means treating special characters as ordinary characters instead of wildcards.
This is useful if you want to match an arbitrary literal string that may have special characters in it.
For example, to get test-env.py and test-prod.py, we can escape the literal part of the pattern. (The - character is only special inside [], so glob.escape() returns it unchanged here, but escaping is a safe habit whenever the literal text comes from elsewhere.)
import glob
for py_file in glob.glob("src/code/*" + glob.escape("-") + "*.py"):
    print(py_file)
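Escaping matters most when a file name really does contain one of the wildcard characters. The following sketch assumes two hypothetical files, report[1].txt and report1.txt, in a temporary directory, and compares an unescaped pattern with an escaped one.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names: one contains literal square brackets.
    for name in ("report[1].txt", "report1.txt"):
        open(os.path.join(tmp, name), "w").close()

    base = glob.escape(tmp)
    # Unescaped: "[1]" is a character class, so this finds "report1.txt".
    unescaped = glob.glob(os.path.join(base, "report[1].txt"))
    # Escaped: the brackets are literal, so this finds "report[1].txt".
    escaped = glob.glob(os.path.join(base, glob.escape("report[1].txt")))

    unescaped = [os.path.basename(p) for p in unescaped]
    escaped = [os.path.basename(p) for p in escaped]

print(glob.escape("report[1].txt"))  # report[[]1].txt
print(glob.escape("test-env.py"))    # test-env.py (nothing to escape)
print(unescaped)                     # ['report1.txt']
print(escaped)                       # ['report[1].txt']
```

glob.escape() wraps each special character in its own bracket expression, which is why [ becomes [[] in the escaped pattern.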
Other ways to match filename patterns
fnmatch.fnmatch()
1
fnmatch.fnmatch(filename, pattern)
Tests whether the filename string matches the pattern string and returns True or False.
For example, we want to get test0.py, test1.py, test2.py:
import os
import fnmatch
# os.listdir() returns bare file names, which fnmatch() tests one by one
for py_file in os.listdir("src/code"):
    if fnmatch.fnmatch(py_file, "test?.py"):
        print(py_file)
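The fnmatch module also works on plain strings, so it can be tried without any files on disk. This sketch uses a hypothetical list of names standing in for the result of os.listdir("src/code"), together with two other functions from the same module: fnmatch.filter(), which applies a pattern to a whole list, and fnmatch.fnmatchcase(), which always compares case-sensitively.

```python
import fnmatch

# Hypothetical listing, standing in for os.listdir("src/code").
names = ["main.py", "test0.py", "test1.py", "test2.py",
         "test-env.py", "test-prod.py"]

# filter() keeps the names that match, in their original order.
numbered = fnmatch.filter(names, "test?.py")
dashed = fnmatch.filter(names, "test-*.py")

# fnmatchcase() does not normalise case, whatever the operating system.
same_case = fnmatch.fnmatchcase("TEST0.py", "test?.py")

print(numbered)   # ['test0.py', 'test1.py', 'test2.py']
print(dashed)     # ['test-env.py', 'test-prod.py']
print(same_case)  # False
```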
pathlib.Path.glob()
2
pathlib.Path.glob(pattern)
Glob the given relative pattern in the directory represented by this path, yielding all matching files (of any kind).
For example, if we want to list all Python files:
from pathlib import Path
path = Path("src")
for py_file in path.glob("**/*.py"):
    print(py_file)
src/demo.py
src/code/test1.py
src/code/test-prod.py
src/code/test-env.py
src/code/main.py
src/code/test2.py
src/code/test0.py
The “**” pattern means “this directory and all subdirectories, recursively”. In other words, it enables recursive globbing. However, using the “**” pattern in large directory trees may consume an inordinate amount of time.
Path.glob() is similar to glob.glob() discussed above. As you can see, pathlib combines many of the best features of the os, os.path, and glob modules into a single module, which makes it a joy to use.
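One practical difference from glob.glob() is that Path.glob() yields Path objects rather than strings, so attributes such as name and suffix are available directly on each result. A minimal sketch, assuming a hypothetical tree built in a temporary directory:

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: demo.py, code/main.py, code/readme.txt
    (root / "code").mkdir()
    for rel in ("demo.py", "code/main.py", "code/readme.txt"):
        (root / rel).touch()

    found = list(root.glob("**/*.py"))
    all_paths = all(isinstance(p, Path) for p in found)  # Path objects, not str
    names = sorted(p.name for p in found)                # file names only
    suffixes = {p.suffix for p in found}                 # extensions seen

print(all_paths)  # True
print(names)      # ['demo.py', 'main.py']
print(suffixes)   # {'.py'}
```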
pathlib.Path.rglob()
3
This is like calling Path.glob() with “**/” added in front of the given relative pattern.
E.g. list all Python files:
from pathlib import Path
for py_file in Path("src").rglob("*.py"):
    print(py_file)
src/demo.py
src/code/test1.py
src/code/test-prod.py
src/code/test-env.py
src/code/main.py
src/code/test2.py
src/code/test0.py
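The equivalence between rglob(pattern) and glob("**/" + pattern) can be checked directly. The sketch below assumes a small hypothetical tree in a temporary directory and compares the two calls after sorting, since neither guarantees an order.

```python
from pathlib import Path
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: demo.py, code/main.py, code/test0.py, code/notes.txt
    (root / "code").mkdir()
    for rel in ("demo.py", "code/main.py", "code/test0.py", "code/notes.txt"):
        (root / rel).touch()

    # Report paths relative to root, with forward slashes on every OS.
    via_glob = sorted(p.relative_to(root).as_posix() for p in root.glob("**/*.py"))
    via_rglob = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))

print(via_glob)               # ['code/main.py', 'code/test0.py', 'demo.py']
print(via_glob == via_rglob)  # True
```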
Summary
Function | Description |
---|---|
fnmatch.fnmatch(filename, pattern) | Tests whether the filename matches the pattern and returns True or False |
glob.glob() | Returns a list of filenames that match a pattern |
glob.iglob() | Returns an iterator of filenames that match a pattern |
pathlib.Path.glob() | Yields the paths under a directory that match a relative pattern (generator) |
pathlib.Path.rglob() | Like Path.glob(), but searches all subdirectories recursively (generator) |