Modules and Packages

Modular programming refers to the process of breaking a large, unwieldy programming task into separate, smaller, more manageable subtasks or modules. Individual modules can then be cobbled together like building blocks to create a larger application.

Advantages to modularizing code in a large application:

  • Simplicity

    Rather than focusing on the entire problem at hand, a module typically focuses on one relatively small portion of the problem. If you’re working on a single module, you’ll have a smaller problem domain to wrap your head around. This makes development easier and less error-prone.

  • Maintainability

    Modules are typically designed so that they enforce logical boundaries between different problem domains. If modules are written in a way that minimizes interdependency, there is decreased likelihood that modifications to a single module will have an impact on other parts of the program. (You may even be able to make changes to a module without having any knowledge of the application outside that module.) This makes it more viable for a team of many programmers to work collaboratively on a large application.

  • Reusability

    Functionality defined in a single module can be easily reused (through an appropriately defined interface) by other parts of the application. This eliminates the need to duplicate code.

  • Scoping

    Modules typically define a separate namespace, which helps avoid collisions between identifiers in different areas of a program. (One of the tenets in the Zen of Python is Namespaces are one honking great idea—let’s do more of those!)

Functions, modules and packages are all constructs in Python that promote code modularization.

Python modules

  • Different ways to define a module in Python:

    • A module can be written in Python itself.
    • A module can be written in C and loaded dynamically at run-time, like the re (regular expression) module.
    • A built-in module is intrinsically contained in the interpreter, like the itertools module.
  • A module’s contents are accessed the same way in all three cases: with the import statement.

  • To build a module written in Python:

    • Create a file that contains legitimate Python code
    • Give the file a name with a .py extension.
  • Example:

    mod.py

    s = "If Comrade Napoleon says it, it must be right."
    a = [100, 200, 300]
    
    def foo(arg):
        print(f'arg = {arg}')
    
    class Foo:
        pass
    

    Several objects are defined in mod.py:

    • s (a string)
    • a (a list)
    • foo() (a function)
    • Foo (a class)

    These objects can be accessed by importing the module as follows (assuming mod.py is in an appropriate location):

    >>> import mod
    >>> print(mod.s)
    If Comrade Napoleon says it, it must be right.
    >>> mod.a
    [100, 200, 300]
    >>> mod.foo(['quux', 'corge', 'grault'])
    arg = ['quux', 'corge', 'grault']
    >>> x = mod.Foo()
    >>> x
    <mod.Foo object at 0x03C181F0>
    

The Module Search Path

When the interpreter executes the

import mod

statement, it searches for mod.py in a list of directories assembled from the following sources:

  • The directory from which the input script was run or the current directory if the interpreter is being run interactively
  • The list of directories contained in the PYTHONPATH environment variable, if it is set. (The format for PYTHONPATH is OS-dependent but should mimic the PATH environment variable.)
  • An installation-dependent list of directories configured at the time Python is installed

The resulting search path is accessible in the Python variable sys.path, which is obtained from a module named sys.

Thus, to ensure your module is found, you need to do one of the following:

  • Put mod.py in the directory where the input script is located or the current directory, if interactive
  • Modify the PYTHONPATH environment variable to contain the directory where mod.py is located before starting the interpreter
    • Or: Put mod.py in one of the directories already contained in the PYTHONPATH variable
  • Put mod.py in one of the installation-dependent directories, which you may or may not have write-access to, depending on the OS

One additional option: you can put the module file in any directory of your choice and then modify sys.path at run-time so that it contains that directory.
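
A minimal sketch of this last option, assuming mod.py lives in a hypothetical directory /path/to/my_modules:

>>> import sys
>>> sys.path.append('/path/to/my_modules')   # hypothetical directory containing mod.py
>>> import mod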

The import statement

import <module_name>

Note that this does not make the module contents directly accessible to the caller.

  • Each module has its own private symbol table, which serves as the global symbol table for all objects defined in the module. Thus, a module creates a separate namespace.
  • The statement import <module_name> only places <module_name> in the caller’s symbol table. The objects that are defined in the module remain in the module’s private symbol table.
  • From the caller, objects in the module are only accessible when prefixed with <module_name> via dot notation.

In our example, after the statement import mod, mod itself is placed into the local symbol table, but s and foo remain in the module’s private symbol table and are not meaningful in the local context:

>>> import mod
>>> s
NameError: name 's' is not defined
>>> foo('quux')
NameError: name 'foo' is not defined

To be accessed in the local context, names of objects defined in the module must be prefixed by mod:

>>> mod.s
'If Comrade Napoleon says it, it must be right.'
>>> mod.foo('quux')
arg = quux

from <module_name> import <name(s)>

Because this form of import places the object names directly into the caller’s symbol table, any objects that already exist with the same name will be overwritten:

>>> a = ['foo', 'bar', 'baz']
>>> a
['foo', 'bar', 'baz']

>>> from mod import a
>>> a
[100, 200, 300]

To indiscriminately import everything from a module in one fell swoop:

from <module_name> import *

This will place the names of all objects from <module_name> into the local symbol table, with the exception of any that begin with the underscore (_) character.

This isn’t necessarily recommended in large-scale production code. It’s a bit dangerous because you are entering names into the local symbol table en masse. Unless you know them all well and can be confident there won’t be a conflict, you have a decent chance of overwriting an existing name inadvertently.
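
For example, with mod.py from earlier and a fresh interpreter session, the wildcard form makes s, a, foo, and Foo directly available:

>>> from mod import *
>>> s
'If Comrade Napoleon says it, it must be right.'
>>> a
[100, 200, 300]
>>> foo('quux')
arg = quux
>>> Foo
<class 'mod.Foo'>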

from <module_name> import <name> as <alt_name>[, <name> as <alt_name> …]

This makes it possible to place names directly into the local symbol table but avoid conflicts with previously existing names:

>>> s = 'foo'
>>> a = ['foo', 'bar', 'baz']

>>> from mod import s as string, a as alist
>>> s
'foo'
>>> string
'If Comrade Napoleon says it, it must be right.'
>>> a
['foo', 'bar', 'baz']
>>> alist
[100, 200, 300]

import <module_name> as <alt_name>

Import an entire module under an alternate name:

>>> import mod as my_module
>>> my_module.a
[100, 200, 300]
>>> my_module.foo('qux')
arg = qux

The dir() function

The built-in function dir() returns a list of defined names in a namespace. Without arguments, it produces an alphabetically sorted list of names in the current local symbol table.
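
For example, after importing mod from earlier, dir() shows the new name in the local symbol table, and dir(mod) lists the names defined inside the module (output abridged; the exact dunder entries vary slightly by Python version):

>>> import mod
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'mod']
>>> dir(mod)
['Foo', '__builtins__', '__cached__', '__doc__', '__file__', '__loader__',
'__name__', '__package__', '__spec__', 'a', 'foo', 's']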

Executing a Module as a Script

Python distinguishes between when a file is imported as a module and when it is run as a standalone script:

  • When a .py file is imported as a module, Python sets the special dunder variable __name__ to the name of the module.
  • If a file is run as a standalone script, __name__ is (creatively) set to the string '__main__'.

Using this fact, we can discern which is the case at run-time and alter behavior accordingly:

s = "If Comrade Napoleon says it, it must be right."
a = [100, 200, 300]

def foo(arg):
    print(f'arg = {arg}')

class Foo:
    pass

if __name__ == '__main__':
    print('Executing as standalone script')
    print(s)
    print(a)
    foo('quux')
    x = Foo()
    print(x)

Now, if we run mod.py as a script, we get output:

$ python mod.py
Executing as standalone script
If Comrade Napoleon says it, it must be right.
[100, 200, 300]
arg = quux
<__main__.Foo object at 0x03450690>

But if you import as a module, you don’t:

>>> import mod
>>> mod.foo('grault')
arg = grault

Modules are often designed with the capability to run as a standalone script in order to test the functionality contained within the module. This is referred to as unit testing.

Reloading a module

For reasons of efficiency, a module is only loaded once per interpreter session. A module can contain executable statements as well, usually for initialization. Be aware that these statements will only be executed the first time a module is imported. If you make a change to a module and need to reload it, you must either restart the interpreter or use the reload() function from the importlib module.
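
A minimal sketch, assuming mod.py from earlier has already been imported and then edited (the path shown in the module repr depends on where mod.py lives):

>>> import importlib
>>> import mod
>>> importlib.reload(mod)
<module 'mod' from '/path/to/mod.py'>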

Python packages

Packages allow for a hierarchical structuring of the module namespace using dot notation. In the same way that modules help avoid collisions between global variable names, packages help avoid collisions between module names.

Creating a package is quite straightforward, since it makes use of the operating system’s inherent hierarchical file structure. Consider the following arrangement:

pkg/
    mod1.py
    mod2.py

Here, there is a directory named pkg that contains two modules, mod1.py and mod2.py. The contents of the modules are:

  • mod1.py

    def foo():
        print('[mod1] foo()')
    
    class Foo:
        pass
    
  • mod2.py

    def bar():
        print('[mod2] bar()')
    
    class Bar:
        pass
    

Given this structure, if the pkg directory resides in a location where it can be found (in one of the directories contained in sys.path), we can refer to the two modules with dot notation (pkg.mod1, pkg.mod2) and import them with the syntax we are already familiar with:

  • import <module_name>[, <module_name> ...]

    >>> import pkg.mod1, pkg.mod2
    >>> pkg.mod1.foo()
    [mod1] foo()
    >>> x = pkg.mod2.Bar()
    >>> x
    <pkg.mod2.Bar object at 0x033F7290>
    
  • from <module_name> import <name(s)>

    >>> from pkg.mod1 import foo
    >>> foo()
    [mod1] foo()
    
  • from <module_name> import <name> as <alt_name>

    >>> from pkg.mod2 import Bar as Qux
    >>> x = Qux()
    >>> x
    <pkg.mod2.Bar object at 0x036DFFD0>
    

We can import modules with these statements as well:

from <package_name> import <module_name>[, <module_name> ...]
from <package_name> import <module_name> as <alt_name>

Importing the package by itself doesn’t do much of anything useful. In particular, it does not place any of the modules in pkg into the local namespace.
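
A quick check with the pkg layout above (no __init__.py present yet):

>>> import pkg
>>> pkg.mod1.foo()
AttributeError: module 'pkg' has no attribute 'mod1'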

Package Initialization

If a file named __init__.py is present in a package directory, it is invoked when the package or a module in the package is imported. This can be used for execution of package initialization code, such as initialization of package-level data.

pkg/
    __init__.py
    mod1.py
    mod2.py

Consider the following __init__.py file:

__init__.py:

print(f'Invoking __init__.py for {__name__}')
A = ['quux', 'corge', 'grault']

Now when the package is imported, the global list A is initialized.
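
For example, with this __init__.py in place:

>>> import pkg
Invoking __init__.py for pkg
>>> pkg.A
['quux', 'corge', 'grault']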

__init__.py can also be used to effect automatic importing of modules from a package. If __init__.py in the pkg directory contains the following:

print(f'Invoking __init__.py for {__name__}')
import pkg.mod1, pkg.mod2

then when we execute import pkg, modules mod1 and mod2 are imported automatically:

>>> import pkg
Invoking __init__.py for pkg
>>> pkg.mod1.foo()
[mod1] foo()
>>> pkg.mod2.bar()
[mod2] bar()

Note:

  • Before Python 3.3, an __init__.py file must be present in the package directory when creating a package. It used to be that the very presence of __init__.py signified to Python that a package was being defined. The file could contain initialization code or even be empty, but it had to be present.
  • Starting with Python 3.3, Implicit Namespace Packages were introduced. These allow for the creation of a package without any __init__.py file. Of course, it can still be present if package initialization is needed. But it is no longer required.

Importing * from a package

from <package_name> import *

Python follows this convention:

If the __init__.py file in the package directory contains a list named __all__, it is taken to be a list of modules that should be imported when the statement from <package_name> import * is encountered.

For example, consider the following structure:

pkg/
    mod1.py
    mod2.py
    mod3.py
    mod4.py

Suppose we create an __init__.py in the pkg directory like this:

pkg/__init__.py

__all__ = [
        'mod1',
        'mod2',
        'mod3',
        'mod4'
        ]

Now from pkg import * imports all four modules:

>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__']

>>> from pkg import *
>>> dir()
['__annotations__', '__builtins__', '__doc__', '__loader__', '__name__',
'__package__', '__spec__', 'mod1', 'mod2', 'mod3', 'mod4']
>>> mod2.bar()
[mod2] bar()
>>> mod4.Qux
<class 'pkg.mod4.Qux'>

In summary, __all__ is used by both packages and modules to control what is imported when import * is specified. But the default behavior differs:

  • For a package, when __all__ is not defined, import * does not import anything.
  • For a module, when __all__ is not defined, import * imports everything (except—you guessed it—names starting with an underscore).
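
For instance, if mod.py from earlier were given a hypothetical __all__ of its own, say __all__ = ['foo', 's'], then in a fresh session import * would expose only those two names:

>>> from mod import *
>>> foo('quux')
arg = quux
>>> Foo
NameError: name 'Foo' is not defined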

Subpackages

Packages can contain nested subpackages to arbitrary depth. For example:

[Directory layout: pkg/ containing nested subpackage directories, each with its own modules]

Importing still works the same as shown previously. The syntax is similar, but additional dot notation is used to separate the package name from the subpackage name:

import <package_name>.<subpackage_name>.<module_name>
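
For instance, given a hypothetical subpackage pkg/sub_pkg containing a module mod5.py whose baz() function prints '[mod5] baz()':

>>> import pkg.sub_pkg.mod5
>>> pkg.sub_pkg.mod5.baz()
[mod5] baz()

>>> from pkg.sub_pkg.mod5 import baz
>>> baz()
[mod5] baz()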

Reference

Python Modules and Packages – An Introduction