Working with Files

with open(...) as ... pattern

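  1. open() opens a file for reading or writing and returns a file handle.
  2. Use the handle's methods (such as .read() or .write()) to read from or write to the file. The with statement closes the file automatically when the block exits.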

Example

  • Read

    with open('data.txt', 'r') as f:
        data = f.read()
    
  • Write

    with open('data.txt', 'w') as f:
        data = 'some data to be written to the file'
        f.write(data)
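
For larger files it is often preferable to iterate over the file handle instead of calling .read(). The following is a minimal sketch under that assumption; the helper write_then_read_lines() and the file name notes.txt are hypothetical and only serve the illustration.

```python
import tempfile
from pathlib import Path


def write_then_read_lines(path, lines):
    """Write each string in `lines` on its own line, then read them back."""
    with open(path, 'w', encoding='utf-8') as f:
        for line in lines:
            f.write(line + '\n')

    # Iterating over the file handle yields one line at a time,
    # so the whole file never has to be held in memory at once.
    with open(path, 'r', encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]


# Work in a temporary directory so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / 'notes.txt'  # hypothetical file name
    result = write_then_read_lines(demo_path, ['first', 'second'])
    print(result)
```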
    

Directory listing

Directory Listing in Legacy Python Versions

os.listdir()

  • Returns a Python list containing the names of the files and subdirectories in the directory given by the path argument.

Example

import os

entries = os.listdir('my_directory/')

Directory Listing in Modern Python Versions

os.scandir()

  • returns an iterator

    ROOT = 'my_directory/'
    entries = os.scandir(ROOT)
    entries
    
    <posix.ScandirIterator at 0x7f6db713c3f0>
    
  • The ScandirIterator yields one entry for each item in the given directory. You can loop over the contents of the iterator and print out the filenames:

    with os.scandir(ROOT) as entries:
        for entry in entries:
            print(f'{entry.name:10}: {entry.path}')
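
Each entry produced by os.scandir() is an os.DirEntry object, so file attributes can be read from it directly. The sketch below shows one way to collect file sizes; the helper list_with_sizes() and the names a.txt and sub are assumptions made for the example.

```python
import os
import tempfile


def list_with_sizes(directory):
    """Return sorted (name, size_in_bytes) pairs for regular files in `directory`."""
    sizes = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if entry.is_file():
                # entry.stat() returns an os.stat_result for this entry.
                sizes.append((entry.name, entry.stat().st_size))
    return sorted(sizes)


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical contents: one file and one subdirectory.
    with open(os.path.join(tmp, 'a.txt'), 'w') as f:
        f.write('hello')
    os.mkdir(os.path.join(tmp, 'sub'))

    sizes = list_with_sizes(tmp)
    print(sizes)
```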
    

pathlib module

  • pathlib.Path() objects have an .iterdir() method for creating an iterator of all files and folders in a directory. Each entry yielded by .iterdir() contains information about the file or directory such as its name and file attributes.

  • pathlib offers a set of classes featuring most of the common operations on paths in an easy, object-oriented way. Another benefit of using pathlib over os is that it reduces the number of imports you need to make to manipulate filesystem paths. 👏

Example

from pathlib import Path

entries = Path(ROOT)
for entry in entries.iterdir():
    print(f'{entry.name}, name: {entry.stem}, ext: {entry.suffix}')

Summary

Function                 Description
os.listdir()             Returns a list of the names of all files and folders in a directory
os.scandir()             Returns an iterator of os.DirEntry objects, which include file attribute information
pathlib.Path.iterdir()   Returns an iterator of Path objects for all entries in a directory

Listing all files in a directory

Filter out directories and only list files from a directory listing

Use os.listdir()

import os

basepath = 'my_directory'
for entry in os.listdir(basepath):
    if os.path.isfile(os.path.join(basepath, entry)):
        print(entry)

Use os.scandir()

os.scandir() reads more cleanly than os.listdir() here: each entry exposes .is_file() directly, so no os.path.join() call is needed.

import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_file():
            print(entry.name)

Use pathlib.Path()

from pathlib import Path

basepath = 'my_directory'
files_in_basepath = Path(basepath).iterdir()
for item in files_in_basepath:
    if item.is_file():
        print(item.name)

The code above can be made more concise by combining the for loop and the if statement into a single generator expression:

from pathlib import Path

basepath = 'my_directory'
files_in_basepath = (entry for entry in Path(basepath).iterdir() if entry.is_file())
for item in files_in_basepath:
    print(item.name)

Listing subdirectories

Use os.listdir() and os.path.isdir()

import os

basepath = 'my_directory'
sub_dirs = (entry for entry in os.listdir(basepath) 
            if os.path.isdir(os.path.join(basepath, entry)))

for sub_dir in sub_dirs:
    print(sub_dir)

Use os.scandir()

import os

basepath = 'my_directory'
with os.scandir(basepath) as entries:
    for entry in entries:
        if entry.is_dir():
            print(entry.name)

Use pathlib.Path

from pathlib import Path

basepath = 'my_directory'
sub_dirs = (item for item in Path(basepath).iterdir() if item.is_dir())
for sub_dir in sub_dirs:
    print(sub_dir.name)

Making directories

os and pathlib include functions for creating directories. We’ll consider these:

Function               Description
os.mkdir()             Creates a single subdirectory
pathlib.Path.mkdir()   Creates single or multiple directories
os.makedirs()          Creates multiple directories, including intermediate directories

Making single directories

Use os.mkdir()

To create a single directory, pass a path to the directory as a parameter to os.mkdir():

import os

os.mkdir('example_directory/')

Note: If a directory already exists, os.mkdir() raises FileExistsError.

Use pathlib

from pathlib import Path

p = Path('example_directory/')
p.mkdir()

If the path already exists, Path.mkdir() raises a FileExistsError.

To handle this gracefully, catch the exception when it happens and let the user know:

from pathlib import Path

p = Path('example_directory/')
try:    
    p.mkdir()
except FileExistsError as err:
    print(err)

Alternatively, you can ignore the FileExistsError by passing the exist_ok=True argument to .mkdir(). This will not raise an error if the directory already exists.

from pathlib import Path

p = Path('example_directory/')
p.mkdir(exist_ok=True)

Creating Multiple Directories

Use os.makedirs

os.makedirs()

  • Similar to os.mkdir().

  • The difference between the two is that os.makedirs() can create not only individual directories but also directory trees. In other words, it creates any missing intermediate folders so that the full path exists.

  • Comparable to running mkdir -p in Bash, except that os.makedirs() raises FileExistsError if the final directory already exists, unless exist_ok=True is passed.

import os

os.makedirs('dir/sub_dir/sub_sub_dir')

This creates the nested directories with default permissions:

.
|
└── dir/
    └── sub_dir/
        └── sub_sub_dir/
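
os.makedirs() also accepts an exist_ok keyword argument. The sketch below, run inside a temporary directory chosen for the illustration, shows how repeated calls behave with and without it.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, 'dir', 'sub_dir', 'sub_sub_dir')

    # Creates dir/, dir/sub_dir/ and dir/sub_dir/sub_sub_dir/ in one call.
    os.makedirs(nested, exist_ok=True)
    # With exist_ok=True a second call is a harmless no-op.
    os.makedirs(nested, exist_ok=True)
    created = os.path.isdir(nested)

    # Without exist_ok, an existing target directory raises FileExistsError.
    try:
        os.makedirs(nested)
        raised = False
    except FileExistsError:
        raised = True

print(created, raised)
```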

Use pathlib.Path

import pathlib

p = pathlib.Path('dir/sub_dir/sub_sub_dir')
p.mkdir(parents=True, exist_ok=True)

I prefer using pathlib when creating directories because I can use the same function to create single or nested directories.

Filename Pattern Matching

Function                             Description
startswith()                         Tests if a string starts with a specified pattern and returns True or False
endswith()                           Tests if a string ends with a specified pattern and returns True or False
fnmatch.fnmatch(filename, pattern)   Tests whether the filename matches the pattern and returns True or False
glob.glob()                          Returns a list of filenames that match a pattern
pathlib.Path.glob()                  Finds patterns in path names and returns a generator object
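
As a rough illustration of how these five approaches compare, the sketch below applies each of them to the same set of files. The file names are hypothetical and are created in a temporary directory so the example is self-contained.

```python
import fnmatch
import glob
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files for the demonstration.
    for name in ('data_01.txt', 'data_02.txt', 'notes.md', 'report.py'):
        Path(tmp, name).touch()

    names = sorted(os.listdir(tmp))

    # Plain string methods: simple prefix and suffix checks.
    by_prefix = [n for n in names if n.startswith('data')]
    by_suffix = [n for n in names if n.endswith('.txt')]

    # fnmatch: shell-style wildcards applied to a name.
    by_fnmatch = [n for n in names if fnmatch.fnmatch(n, 'data_*.txt')]

    # glob.glob: returns a list of matching paths.
    # glob.escape() guards against special characters in the directory name.
    pattern = os.path.join(glob.escape(tmp), '*.py')
    by_glob = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # Path.glob: returns a generator of Path objects.
    by_path_glob = sorted(p.name for p in Path(tmp).glob('*.md'))

print(by_prefix, by_suffix, by_fnmatch, by_glob, by_path_glob)
```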

Traversing Directories and Processing Files

os.walk()

os.walk() defaults to traversing directories in a top-down manner.

  • To traverse the directory tree in a bottom-up manner, pass in a topdown=False keyword argument

On each iteration of the loop, os.walk() yields a tuple of three values:

  1. The path of the current folder
  2. A list of the names of the subfolders in the current folder
  3. A list of the names of the files in the current folder
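
A small sketch may help make the traversal order concrete. The helper collect_tree() and the directory layout (a/b with one file per level) are assumptions for the example, not part of any library.

```python
import os
import tempfile


def collect_tree(root, topdown=True):
    """Return (relative_dir, sorted_filenames) pairs in the order os.walk visits them."""
    visited = []
    for dirpath, dirnames, filenames in os.walk(root, topdown=topdown):
        # Sorting dirnames in place makes the top-down visiting order
        # deterministic; it has no effect when topdown=False.
        dirnames.sort()
        visited.append((os.path.relpath(dirpath, root), sorted(filenames)))
    return visited


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical layout: top.txt, a/mid.txt, a/b/leaf.txt
    os.makedirs(os.path.join(tmp, 'a', 'b'))
    for rel in ('top.txt',
                os.path.join('a', 'mid.txt'),
                os.path.join('a', 'b', 'leaf.txt')):
        open(os.path.join(tmp, rel), 'w').close()

    top_down = collect_tree(tmp)
    bottom_up = collect_tree(tmp, topdown=False)

print(top_down)
print(bottom_up)
```

With this single chain of folders, the bottom-up traversal visits the same folders in exactly the reverse order.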

Deleting files

Function                Note
os.remove()             Raises an OSError if the path points to a directory instead of a file
os.unlink()             Semantically identical to os.remove()
pathlib.Path.unlink()   Deletes the file the Path points to; like os.remove(), it cannot delete directories
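
The sketch below exercises these calls in a temporary directory. The file and folder names are hypothetical. Note that the exact OSError subclass raised for a directory depends on the platform, so the example catches OSError itself.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp, 'old.txt')  # hypothetical file name
    target.touch()

    # os.remove() deletes a single file.
    os.remove(target)
    removed = not target.exists()

    # Deleting a file that no longer exists raises FileNotFoundError.
    try:
        target.unlink()
        missing_raised = False
    except FileNotFoundError:
        missing_raised = True

    # Pointing os.remove() at a directory raises an OSError subclass.
    subdir = Path(tmp, 'sub')
    subdir.mkdir()
    try:
        os.remove(subdir)
        dir_raised = False
    except OSError:
        dir_raised = True

print(removed, missing_raised, dir_raised)
```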

Deleting Directories

Delete single directory

Function               Note
os.rmdir()             Only works if the directory you’re trying to delete is empty. If the directory isn’t empty, an OSError is raised.
pathlib.Path.rmdir()   Semantically identical to os.rmdir()

Delete entire directory trees

shutil.rmtree(dir): Everything in dir is deleted when shutil.rmtree() is called on it.
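
To illustrate the difference between removing an empty directory and removing a whole tree, here is a minimal sketch. The directory names project, src, and empty are made up for the example, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical non-empty tree: project/src/main.py
    tree = Path(tmp, 'project')
    (tree / 'src').mkdir(parents=True)
    (tree / 'src' / 'main.py').touch()

    # os.rmdir() refuses to delete a directory that still has contents.
    try:
        os.rmdir(tree)
        rmdir_failed = False
    except OSError:
        rmdir_failed = True

    # shutil.rmtree() deletes the directory and everything below it.
    shutil.rmtree(tree)
    tree_gone = not tree.exists()

    # An empty directory can be removed with Path.rmdir().
    empty = Path(tmp, 'empty')
    empty.mkdir()
    empty.rmdir()
    empty_gone = not empty.exists()

print(rmdir_failed, tree_gone, empty_gone)
```

Because shutil.rmtree() is irreversible, it is worth double-checking the path before calling it on real data.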

Summary for deleting

Function                Description
os.remove()             Deletes a file and does not delete directories
os.unlink()             Is identical to os.remove() and deletes a single file
pathlib.Path.unlink()   Deletes a file and cannot delete directories
os.rmdir()              Deletes an empty directory
pathlib.Path.rmdir()    Deletes an empty directory
shutil.rmtree()         Deletes entire directory tree and can be used to delete non-empty directories

Copying

Copying files

shutil.copy()

  • Comparable to the cp command on UNIX-based systems.
  • shutil.copy(src, dst) copies the file src to the location specified in dst. If dst is a file, the contents of that file are replaced with the contents of src. If dst is a directory, src is copied into that directory.
  • shutil.copy() ONLY copies the file’s contents and the file’s permissions. Other metadata, like the file’s creation and modification times, is not preserved.

shutil.copy2()

  • Works like shutil.copy(), but also attempts to preserve file metadata when copying.
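
The metadata difference can be observed through the modification time. In the sketch below, the file names and the backdated timestamp are assumptions chosen to make the effect visible.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'src.txt')  # hypothetical file name
    with open(src, 'w') as f:
        f.write('payload')

    # Backdate the source's access and modification times so the
    # difference between the two copy functions is easy to see.
    old_time = 1_000_000_000  # seconds since the epoch (September 2001)
    os.utime(src, (old_time, old_time))

    # Both functions return the path of the newly created file.
    plain = shutil.copy(src, os.path.join(tmp, 'plain.txt'))
    with_meta = shutil.copy2(src, os.path.join(tmp, 'meta.txt'))

    plain_mtime = os.stat(plain).st_mtime      # time of the copy
    meta_mtime = os.stat(with_meta).st_mtime   # carried over from src

    with open(plain) as f:
        plain_text = f.read()

    # When dst is a directory, the file keeps its name inside it.
    dest_dir = os.path.join(tmp, 'backup')
    os.mkdir(dest_dir)
    copied_name = os.path.basename(shutil.copy(src, dest_dir))

print(plain_text, copied_name, plain_mtime > meta_mtime)
```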

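Copying directories

shutil.copytree()

  • shutil.copytree(src, dst) recursively copies the directory tree rooted at src to the directory dst. By default, dst must not already exist.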
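
A minimal sketch of copying a directory tree follows. The directory names data and data_backup and the file a.csv are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical source tree: data/raw/a.csv
    src_dir = Path(tmp, 'data')
    (src_dir / 'raw').mkdir(parents=True)
    (src_dir / 'raw' / 'a.csv').write_text('x,y\n1,2\n')

    # Copy the whole tree; the destination is created by copytree().
    dst_dir = Path(tmp, 'data_backup')
    shutil.copytree(src_dir, dst_dir)

    # Collect the copied files as POSIX-style paths relative to dst_dir.
    copied_files = sorted(
        p.relative_to(dst_dir).as_posix()
        for p in dst_dir.rglob('*')
        if p.is_file()
    )
    source_still_exists = src_dir.exists()

print(copied_files, source_still_exists)
```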

Moving Files and Directories

To move a file or directory to another location, use shutil.move(src, dst).

  • src: file or directory to be moved
  • dst: destination. If dst does not exist, src will be renamed to dst
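
The sketch below shows both cases: moving into an existing directory and moving to a path that does not exist yet. The names report.txt, archive, and final.txt are assumptions for the illustration.

```python
import shutil
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp, 'report.txt')  # hypothetical file name
    src.write_text('draft')
    archive = Path(tmp, 'archive')
    archive.mkdir()

    # dst is an existing directory: the file is moved inside it.
    # shutil.move() returns the final destination path.
    moved_into = Path(shutil.move(str(src), str(archive)))

    # dst does not exist: the file is effectively renamed to dst.
    renamed = Path(shutil.move(str(moved_into), str(archive / 'final.txt')))

    final_text = renamed.read_text()
    original_gone = not src.exists()

print(moved_into.name, renamed.name, final_text, original_gone)
```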

Renaming Files and Directories

  • os.rename(src, dst)
  • pathlib.Path.rename()
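
Both calls can be tried side by side. This is a minimal sketch in a temporary directory, with hypothetical file names first.txt, second.txt, and third.txt.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    first = Path(tmp, 'first.txt')  # hypothetical file name
    first.write_text('content')

    # Rename with os.rename(src, dst).
    second = os.path.join(tmp, 'second.txt')
    os.rename(first, second)

    # Rename again with Path.rename(target).
    third = Path(tmp, 'third.txt')
    Path(second).rename(third)

    remaining = sorted(os.listdir(tmp))
    third_text = third.read_text()

print(remaining, third_text)
```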
