Tutorials
1. Getting started with Micc2
Note
These tutorials focus not just on how to use Micc2. Rather, they describe a workflow for how you can set up a Python project and develop it using best practices, with the help of Micc2.
Micc2 aims at providing a practical interface to the many aspects of managing a Python project: setting up a new project in a standardized way, adding documentation, version control, publishing the code to PyPI, building binary extension modules in C++ or Fortran, dependency management, … For all these aspects there are tools available, yet, with each new project, I found myself struggling to get everything right and looking up the details. Micc2 is an attempt to wrap all the details by providing the user with a standardized yet flexible workflow for managing a Python project. Standardizing is a great way to increase productivity. For many aspects, the tools used by Micc2 are completely hidden from the user, e.g. project setup, adding components, building binary extensions, … For other aspects Micc2 provides just the necessary setup for you to use other tools as you need them. Learning to use the following tools is certainly beneficial:
Git: for version control. Its use is optional but highly recommended. See 5. Version control and version management for some basic git coverage.
Pytest: for (unit) testing. Also optional and also highly recommended.
The basic commands for these tools are covered in these tutorials.
1.1. Creating a project with micc2
Creating a new project with micc2 is simple:
> micc2 create path/to/my-first-project
This creates a new project my-first-project in folder path/to. Note that the directory path/to/my-first-project must either not exist, or be empty.
Typically, you will create a new project in the current working directory, say: your workspace, so first cd into your workspace directory:
> cd path/to/workspace
> micc2 create my-first-project --remote=none
[INFO] [ Creating project directory (my-first-project):
[INFO] Python top-level package (my_first_project):
[INFO] [ Creating local git repository
[INFO] ] done.
[WARNING] Creation of remote GitHub repository not requested.
[INFO] ] done.
As the output tells, micc2 has created a new project in directory my-first-project, containing a Python package my_first_project. This is a directory with an __init__.py file, containing the Python variables, classes and methods it needs to expose. This directory and its contents represent the Python module.
> my-first-project          # the project directory
  └── my_first_project      # the package directory
      └── __init__.py       # the file where your Python code goes
Note
Next to the package structure - a directory with an __init__.py file - Python also allows for a module structure - a mere my_first_project.py file - containing the Python variables, classes and methods it needs to expose. The module structure is essentially a single-file, Python-only approach, which often turns out to be too restrictive. As of v3.0 Micc2 only supports the creation of modules with a package structure, which allows for adding submodules, command line interfaces (CLIs), and binary extension modules built from other languages such as C++ and Fortran. Micc2 greatly facilitates adding such components.
Note that the module name differs slightly from the project name. Dashes are replaced with underscores and uppercase with lowercase in order to yield a PEP 8 compliant module name. If you want your module name to be unrelated to your project name, check out the 1.1.1. What’s in a name section.
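As an aside, this conversion rule is simple enough to sketch in a few lines of Python. The function name module_name_from_project_name is ours, for illustration only; it is not part of Micc2:

```python
def module_name_from_project_name(project_name):
    """Derive a PEP 8 compliant module name from a project name:
    lowercase everything and replace hyphens by underscores."""
    return project_name.lower().replace('-', '_')

print(module_name_from_project_name("my-first-project"))  # my_first_project
print(module_name_from_project_name("ET-dot"))            # et_dot
```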
Micc2 automatically creates a local git repository for our project (provided the git command is available) and it commits all the project files that it generated with commit message ‘And so this begun…’. The --remote=none flag prevents Micc2 from also creating a remote repository on GitHub. Without that flag, Micc2 would have created a public remote repository on GitHub and pushed that first commit (that requires that we have set up Micc2 with a GitHub username and a personal access token for it, as described in First time Micc2 setup). You can also request the remote repository to be private by specifying --remote=private.
After creating the project, we cd into the project directory. All Micc2 commands detect automatically that they are run from a project directory and consequently act on the project in the current working directory. E.g.:
> cd my-first-project
> micc2 info
Project my-first-project located at /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/my-first-project
package: my_first_project
version: 0.0.0
contents:
my_first_project top-level package (source in my_first_project/__init__.py)
As the info subcommand, which shows info on a project, is run inside the my-first-project directory, we get the info on the my-first-project project.
To apply a Micc2 command to a project that is not in the current working directory see 1.2.1. The project path in Micc2.
Note
Micc2 has a built-in help function: micc2 --help shows the global options, which appear in front of the subcommand, and lists the subcommands; micc2 subcommand --help prints detailed help for a subcommand.
1.1.1. What’s in a name
The name you choose for your project is not without consequences. Ideally, a project name is:
descriptive,
unique,
short.
Although one might think of even more requirements, such as being easy to type, satisfying these three is already hard enough. E.g. the name my_nifty_module may possibly be unique, but it is neither descriptive nor short. On the other hand, dot_product is descriptive and reasonably short, but probably not unique. Even my_dot_product is probably not unique, and, in addition, confusing to any user that might want to adopt your my_dot_product. A unique name - or at least a name that has not been taken before - becomes really important when you want to publish your code for others to use it (see 7. Publishing your code for details). The standard place to publish Python code is the Python Package Index, where you find hundreds of thousands of projects, many of which are really interesting and of high quality. Even if there are only a few colleagues that you want to share your code with, you make their life (as well as yours) easier when you publish your my_nifty_module at PyPI. To install your my_nifty_module they will only need to type:
> python -m pip install my_nifty_module
The name my_nifty_module is not yet taken, but we nevertheless recommend choosing a better name.
If you intend to publish your code on PyPI, we recommend that you create your project with the --publish flag. Micc2 then checks if the name you want to use for your project is still available on PyPI. If not, it refuses to create the project and asks you to use another name for your project:
> micc2 create oops --publish
[ERROR]
The name 'oops' is already in use on PyPI.
The project is not created.
You must choose another name if you want to publish your code on PyPI.
As there are indeed hundreds of thousands of Python packages published on PyPI, finding a good name has become quite hard. Personally, I often use a simple and short descriptive name, prefixed by my initials, et-, which usually makes the name unique. E.g. et-oops does not exist. This has the additional advantage that all my published modules are grouped in the alphabetic PyPI listing.
Another point of attention is that although in principle project names can be anything supported by your OS file system, as they are just the name of a directory, Micc2 insists that module and package names comply with the PEP8 module naming rules. Micc2 derives the package (or module) name from the project name as follows:
capitals are replaced by lower-case
hyphens '-' are replaced by underscores '_'
If the resulting module name is not PEP8 compliant, you get an informative error message:
> micc2 create 1proj
The error message also indicates that you can specify an explicit module name, unrelated to the project name. In that case PEP 8 compliance is not checked. The responsibility is then all yours.
1.2. First steps in project management using Micc2
1.2.1. The project path in Micc2
All micc2 commands accept the global --project-path=<path> parameter. Global parameters appear before the subcommand name. E.g. the command:
> micc2 --project-path path/to/my_project info
will print the info on the project located at path/to/my_project. This can conveniently be abbreviated as:
> micc2 -p path/to/my_project info
Even the create command accepts the global --project-path=<path> parameter:
> micc2 -p path/to/my_project create
will attempt to create project my_project at the specified location. The command is equivalent to:
> micc2 create path/to/my_project
The default value for the project path is the current working directory. Micc2 commands without an explicitly specified project path will act on the project in the current working directory.
1.2.2. Virtual environments
Virtual environments enable you to set up a Python environment that is isolated from the installed Python on your system and from other virtual environments. In this way you can easily cope with varying dependencies between your Python projects.
For a detailed introduction to virtual environments see Python Virtual Environments: A Primer.
When you are developing or using several Python projects simultaneously, it can become difficult for a single Python environment to satisfy all the dependency requirements of these projects. Dependency conflicts can easily arise. Python promotes and facilitates code reuse and as a consequence Python tools typically depend on tens to hundreds of other modules. If tool-A and tool-B both need module-C, but each requires a different version of it, there is a conflict because it is impossible to install two different versions of the same module in a Python environment. The solution that the Python community has come up with for this problem is the construction of virtual environments, which isolates the dependencies of a single project in a single environment.
For this reason it is recommended to create a virtual environment for every project you start. Here is how that goes:
1.2.2.1. Creating virtual environments
> python -m venv .venv-my-first-project
This creates a directory .venv-my-first-project representing the virtual environment. The Python version of this virtual environment is the Python version that was used to create it. Use the tree command to get an overview of its directory structure:
> tree .venv-my-first-project -L 4
.venv-my-first-project
├── bin
│   ├── Activate.ps1
│   ├── activate
│   ├── activate.csh
│   ├── activate.fish
│   ├── easy_install
│   ├── easy_install-3.8
│   ├── pip
│   ├── pip3
│   ├── pip3.8
│   ├── python -> /Users/etijskens/.pyenv/versions/3.8.5/bin/python
│   └── python3 -> python
├── include
├── lib
│   └── python3.8
│       └── site-packages
│           ├── __pycache__
│           ├── easy_install.py
│           ├── pip
│           ├── pip-20.1.1.dist-info
│           ├── pkg_resources
│           ├── setuptools
│           └── setuptools-47.1.0.dist-info
└── pyvenv.cfg
11 directories, 13 files
As you can see there is a bin, an include, and a lib directory. In the bin directory you find installed commands, like activate, pip, and the python of the virtual environment. The lib directory contains the installed site-packages, and the include directory contains include files of installed site-packages for use with C, C++ or Fortran.
If the Python version you used to create the virtual environment has pre-installed packages, you can make them available in your virtual environment by adding the --system-site-packages flag:
> python -m venv .venv-my-first-project --system-site-packages
This is especially useful in HPC environments, where the pre-installed packages typically have a better computational efficiency.
As to where you create these virtual environments there are two common approaches. One is to create a venvs directory where you put all your virtual environments. This is practical if you have virtual environments which are common to several projects. The other one is to have one virtual environment for each project and locate it in the project directory. Note that if you have several Python versions on your system, you may also create several virtual environments with different Python versions for a project.
In order to use a virtual environment, you must activate it:
> . .venv-my-first-project/bin/activate
(.venv-my-first-project) >
Note how the prompt has changed to indicate that the virtual environment is active. The current Python is now that of the virtual environment, and the only Python packages available are the ones installed in it, as well as the system site packages of the corresponding Python if the virtual environment was created with the --system-site-packages flag. To deactivate the virtual environment, run:
(.venv-my-first-project) > deactivate
>
The prompt has turned back to normal.
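If you want to check programmatically whether the running Python is in a virtual environment, the standard sys module can tell you. The helper name in_virtualenv below is ours, just a sketch:

```python
import sys

def in_virtualenv():
    """Return True if the running Python is a virtual environment.

    Inside a venv, sys.prefix points at the venv directory, while
    sys.base_prefix points at the Python installation that was used
    to create it. Outside a venv the two are equal.
    """
    return sys.prefix != sys.base_prefix

print(in_virtualenv())
```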
So far, the virtual environment is pretty much empty (except for the system site packages if it was created with the --system-site-packages flag). We must install the packages that our project needs. Pip does the trick:
> python -m pip install some-needed-package
We must also install the project itself, if it is to be used in the virtual environment. If the project is not under development, we can just run pip install. Otherwise, we want the code changes that we make while developing to be instantaneously visible in the virtual environment. Pip can do editable installs, but only for packages which provide a setup.py file. Micc2 does not provide setup.py files for its projects, but it has a simple workaround for editable installs. First cd into your project directory and activate its virtual environment, then run the install-e.py script:
> cd path/to/my-first-project
> source .venv-my-first-project/bin/activate
(.venv-my-first-project)> python ~/.micc2/scripts/install-e.py
...
Editable install of my-first-project is ready.
If something is wrong with a virtual environment, you can simply delete it:
> rm -rf .venv-my-first-project
and recreate it.
1.2.3. Modules and scripts
A Python script is a piece of Python code that performs a certain task. A Python module, on the other hand, is a piece of Python code that provides a client code, such as a script, with useful Python classes, functions, objects, and so on, to facilitate the script’s task. To that end client code must import the module.
Python has a mechanism that allows a Python file to behave both as a script and as a module. Consider this Python file my_first_project.py as it was created by Micc2 in the first place. Note that Micc2 always creates project files containing fully functional examples to demonstrate how things are supposed to be done.
# -*- coding: utf-8 -*-
"""
Package my_first_project
========================

A hello world example.
"""
__version__ = "0.0.0"

def hello(who="world"):
    """A "Hello world" method.

    :param str who: whom to say hello to
    :returns: a string
    """
    result = f"Hello {who}!"
    return result
The module file starts with a file doc-string that describes what the file is about, and a __version__ definition, and then goes on defining a simple hello method. A client script script.py can import the my_first_project.py module to use its hello method:
# file script.py
import my_first_project
print(my_first_project.hello("dear students"))
When executed, this results in printing Hello dear students!
> python script.py
Hello dear students!
Python has an interesting idiom for allowing a module also to behave as a script. Python defines a __name__ variable for each file it interprets. When the file is executed as a script, as in python script.py, the __name__ variable is set to __main__, and when the file is imported the __name__ variable is set to the module name. By testing the value of the __name__ variable we can selectively execute statements depending on whether a Python file is imported or executed as a script. E.g. below we added some tests for the hello method:
#...

def hello(who="world"):
    """A "Hello world" method.

    :param str who: whom to say hello to
    :returns: a string
    """
    result = f"Hello {who}!"
    return result

if __name__ == "__main__":
    assert hello() == "Hello world!"
    assert hello("students") == "Hello students!"
If we now execute my_first_project.py, the if __name__ == "__main__": clause evaluates to True and the two assertions are executed - successfully. So, adding an if __name__ == "__main__": clause at the end of a module allows it to behave as a script. This Python idiom comes in handy for quick testing or debugging a module. Running the file as a script will execute the tests and raise an AssertionError if one fails. If so, we can run it in debug mode to see what goes wrong.
While this is a very productive way of testing, it is a bit on the quick and dirty side. As the module code and the tests become more involved, the module file will soon become cluttered with test code and a more scalable way to organise your tests is needed. Micc2 has already taken care of this.
1.2.4. Testing your code
Test driven development is a software development process that relies on the repetition of a very short development cycle: requirements are turned into very specific test cases, then the code is improved so that the tests pass. This is opposed to software development that allows code to be added that is not proven to meet requirements. The advantage of this is clear: the shorter the cycle, the smaller the code that is to be searched for bugs. This allows you to produce correct code faster, and in case you are a beginner, also speeds your learning of Python. Please check Ned Batchelder’s very good introduction to testing with pytest.
When Micc2 created project my-first-project, it not only added a hello method to the module file, it also created a test script for it in the tests directory of the project directory. The tests for the my_first_project module are in file tests/test_my_first_project.py. Let’s take a look at the relevant section:
# -*- coding: utf-8 -*-
"""Tests for my_first_project package."""

import my_first_project

def test_hello_noargs():
    """Test for my_first_project.hello()."""
    s = my_first_project.hello()
    assert s == "Hello world!"

def test_hello_me():
    """Test for my_first_project.hello('me')."""
    s = my_first_project.hello('me')
    assert s == "Hello me!"
The tests/test_my_first_project.py file contains two tests: one for testing the hello method with a default argument, and one for testing it with argument 'me'. Tests like this are very useful to ensure that during development the changes to your code do not break things. There are many Python tools for unit testing and test driven development. Here, we use Pytest. The tests are automatically found and executed by running pytest in the project directory:
> pytest tests -v
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1 -- /Users/etijskens/.pyenv/versions/3.8.5/bin/python
cachedir: .pytest_cache
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/my-first-project
collecting ... collected 2 items
tests/my_first_project/test_my_first_project.py::test_hello_noargs PASSED [ 50%]
tests/my_first_project/test_my_first_project.py::test_hello_me PASSED [100%]
============================== 2 passed in 0.03s ===============================
Specifying the tests directory ensures that Pytest looks for tests only in the tests directory. This is usually not necessary, but it avoids that pytest’s test discovery algorithm discovers tests which are not meant to be run. The -v flag increases pytest’s verbosity. The output shows that pytest discovered the two tests put in place by Micc2 and that they both passed.
Note
Pytest looks for test methods in all test_*.py or *_test.py files in the current directory and accepts (1) test prefixed methods outside classes and (2) test prefixed methods inside Test prefixed classes as test methods to be executed.
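These discovery rules can be illustrated with a minimal sketch; the file, class and method names below are hypothetical:

```python
# file test_example.py -- the 'test_*.py' name makes pytest collect this file

def test_outside_class():
    # collected: a 'test' prefixed function outside any class
    assert 1 + 1 == 2

class TestGreeting:
    # collected: a 'Test' prefixed class...
    def test_inside_class(self):
        # ...with a 'test' prefixed method
        assert "Hello world!".startswith("Hello")
```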
If a test fails you get a detailed report to help you find the cause of the error and fix it.
Note
A failing test does not necessarily imply that your module is faulty. Test code is also code and can therefore contain errors, too. It is not uncommon that a failing test is caused by a buggy test rather than a buggy method or class.
1.2.4.1. Debugging test code
When the report provided by Pytest does not yield an obvious clue on the cause of the failing test, you must use debugging and execute the failing test step by step to find out what is going wrong where. From the viewpoint of Pytest, the files in the tests directory are modules. Pytest imports them, collects the test methods, and executes them. Micc2 also makes every test module executable using the Python if __name__ == "__main__": idiom described above. At the end of every test file you will find some extra code:
if __name__ == "__main__":                                   # 0
    the_test_you_want_to_debug = test_hello_noargs           # 1
                                                             # 2
    print("__main__ running", the_test_you_want_to_debug)    # 3
    the_test_you_want_to_debug()                             # 4
    print('-*# finished #*-')                                # 5
On line # 1, the name of the test method we want to debug, test_hello_noargs, is aliased as the_test_you_want_to_debug. The variable thus becomes an alias for the test method. Line # 3 prints a message with the name of the test method being debugged, to assure you that you are running the test you want. Line # 4 calls the test method, and, finally, line # 5 prints a message just before quitting, to assure you that the code went well until the end.
(.venv-my-first-project) > python tests/test_my_first_project.py
__main__ running <function test_hello_noargs at 0x1037337a0> # output of line # 3
-*# finished #*- # output of line # 5
Obviously, you can run this script in a debugger to see what goes wrong where.
1.2.5. Generating documentation
Note
It is not recommended to build documentation in HPC environments.
Documentation is generated almost completely automatically from the source code using Sphinx. It is extracted from the doc-strings in your code. Doc-strings are the text between triple double quote pairs in the examples above, e.g. """This is a doc-string.""". Important doc-strings are:
module doc-strings: at the beginning of the module. Provides an overview of what the module is for.
class doc-strings: right after the class statement: explains what the class is for. Usually, the doc-string of the __init__ method is put here as well, as dunder methods (starting and ending with a double underscore) are not automatically considered by Sphinx.
method doc-strings: right after a def statement: class methods should also get a doc-string.
According to pep-0287, the recommended format for Python doc-strings is reStructuredText. E.g. a typical method doc-string looks like this:
def hello_world(who='world'):
    """Short (one line) description of the hello_world method.

    A detailed description of the hello_world method.
    blablabla...

    :param str who: an explanation of the who parameter. You should
        mention e.g. its default value.
    :returns: a description of what hello_world returns (if relevant).
    :raises: which exceptions are raised under what conditions.
    """
    # here goes your code ...
Here, you can find some more examples.
Thus, if you take good care writing doc-strings, helpful documentation follows automatically.
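Even without Sphinx, the pay-off is immediate: Python exposes doc-strings at runtime through help() and the __doc__ attribute. A minimal sketch (the function below is our own example, not Micc2 code):

```python
def hello_world(who='world'):
    """Return a greeting for *who*.

    :param str who: whom to greet (defaults to 'world').
    :returns: the greeting string.
    """
    return f"Hello {who}!"

# The doc-string is stored on the function object; help(hello_world)
# pretty-prints the same text that Sphinx extracts.
print(hello_world.__doc__.splitlines()[0])  # Return a greeting for *who*.
```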
Micc2 sets up all the necessary components for documentation generation in the docs directory. To generate documentation in html format, run:
(.venv-my-first-project) > micc2 doc
This will generate documentation in html format in directory et-dot/docs/_build/html. The default html theme for this is sphinx_rtd_theme. To view the documentation open the file et-dot/docs/_build/html/index.html in your favorite browser. Other formats than html are available, but you might have to install additional packages. To list all available documentation formats run:
> micc2 doc help
The boilerplate code for documentation generation is in the docs directory, just as if it were generated manually using the sphinx-quickstart command. Modifying those files is not recommended, and only rarely needed. Then there are a number of .rst files in the project directory with capitalized names:
README.rst is assumed to contain an overview of the project. This file has some boilerplate text, but must essentially be maintained by the authors of the project.
AUTHORS.rst lists the contributors to the project.
CHANGELOG.rst is supposed to describe the changes that were made to the code from version to version. This file must entirely be maintained by the authors of the project.
API.rst describes the classes and methods of the project in detail. This file is automatically updated when new components are added through Micc2 commands.
APPS.rst describes command line interfaces or apps added to your project. Just as API.rst it is automatically updated when new CLIs are added through Micc2 commands. For CLIs the documentation is extracted from the help parameters of the command options with the help of Sphinx_click.
Note
The .rst extension stands for reStructuredText. It is a simple and concise approach to text formatting. See RestructuredText Primer for an overview.
1.2.6. Version control
Version control is extremely important for any software project with a lifetime of more than a day. Micc2 facilitates version control by automatically creating a local git repository in your project directory. If you do not want to use it, you may ignore it or even delete it. If you have set up Micc2 correctly, it can even create remote GitHub repositories for your project, public as well as private.
Git is a version control system (VCS) that solves many practical problems related to the process of software development, independent of whether you are the only developer, or whether there is an entire team working on it from different places in the world. You find more information about how Micc2 cooperates with Git in 5. Version control and version management.
1.3. Miscellaneous
1.3.1. License
When you set up Micc2 you can select the default license for your Micc2 projects. You can choose between:
MIT license
BSD license
ISC license
Apache Software License 2.0
GNU General Public License v3
Not open source
If you’re unsure which license to choose, you can use resources such as GitHub’s Choose a License. You can always overwrite the default choice when you create a project. The first characters suffice to select the license:
micc2 --software-license=BSD create
The project directory will contain a LICENCE file, a plain text file describing the license applicable to your project.
1.3.2. The pyproject.toml file
Micc2 maintains a pyproject.toml file in the project directory. This is the modern way to describe the build system requirements of a project (see PEP 518). Although this file’s content is generated automatically, some understanding of it is useful (check out https://poetry.eustace.io/docs/pyproject/).
In Micc2’s predecessor, Micc, Poetry was used extensively for creating virtual environments and managing a project’s dependencies. However, at the time of writing, Poetry still fails to create virtual environments which honor the --system-site-packages flag. This causes serious problems on HPC clusters, and consequently, we do not recommend the use of Poetry when your projects have to run on HPC clusters. As long as this issue remains, we recommend to add a project’s dependencies manually in the pyproject.toml file, so that when someone installs your project with Pip, its dependencies are installed with it. Poetry does remain very useful for publishing your project to PyPI from your desktop or laptop.
The pyproject.toml file is rather human-readable. Most entries are trivial. There is a section for dependencies, [tool.poetry.dependencies], and one for development dependencies, [tool.poetry.dev-dependencies]. You can maintain these manually. There is also a section for CLIs, [tool.poetry.scripts], which is updated automatically whenever you add a CLI through Micc2.
> cat pyproject.toml
[tool.poetry]
name = "my-first-project"
version = "0.0.0"
description = "My first micc2 project"
authors = ["John Doe <john.doe@example.com>"]
license = "MIT"
readme = 'Readme.rst'
repository = "https://github.com/jdoe/my-first-project"
homepage = "https://github.com/jdoe/my-first-project"
[tool.poetry.dependencies]
python = "^3.7"
[tool.poetry.dev-dependencies]
[tool.poetry.scripts]
[build-system]
requires = ["poetry>=0.12"]
build-backend = "poetry.masonry.api"
2. A first real project
Let’s start with a simple problem: a Python module that computes the scalar product of two arrays, generally referred to as the dot product. Admittedly, this is not a very rewarding goal, as there are already many Python packages, e.g. Numpy, that solve this problem in an elegant and efficient way. However, because the dot product is such a simple concept in linear algebra, it allows us to illustrate the usefulness of Python as a language for HPC, as well as the capabilities of Micc2.
First, we set up a new project for this dot project, with the name ET-dot, ET being my initials (check out 1.1.1. What’s in a name).
> micc2 create ET-dot --remote=none
[INFO] [ Creating project directory (ET-dot):
[INFO] Python top-level package (et_dot):
[INFO] [ Creating local git repository
[INFO] ] done.
[WARNING] Creation of remote GitHub repository not requested.
[INFO] ] done.
We cd into the project directory, so Micc2 knows it as the current project.
> cd ET-dot
Now, open the package file et_dot/__init__.py in your favourite editor and start coding a dot product method as below. The example code created by Micc2 can be removed.
# -*- coding: utf-8 -*-
"""
Package et_dot
==============

Python module for computing the dot product of two arrays.
"""
__version__ = "0.0.0"

def dot(a,b):
    """Compute the dot product of *a* and *b*.

    :param a: a 1D array.
    :param b: a 1D array of the same length as *a*.
    :returns: the dot product of *a* and *b*.
    :raises: ValueError if ``len(a)!=len(b)``.
    """
    n = len(a)
    if len(b)!=n:
        raise ValueError("dot(a,b) requires len(a)==len(b).")
    result = 0
    for i in range(n):
        result += a[i]*b[i]
    return result
We defined a dot() method with an informative doc-string that describes the parameters, the return value and the kind of exceptions it may raise. If you like, you can add an if __name__ == '__main__': clause for quick-and-dirty testing or debugging (see 1.2.3. Modules and scripts). It is a good idea to commit this implementation to the local git repository:
> git commit -a -m 'implemented dot()'
[main d452a13] implemented dot()
1 file changed, 23 insertions(+), 22 deletions(-)
rewrite et_dot/__init__.py (71%)
(If there was a remote GitHub repository, you could also push that commit with git push, so as to enable your colleagues to access the code as well.)
We can use the dot method in a script as follows:
from et_dot import dot
a = [1,2,3]
b = [4.1,4.2,4.3]
a_dot_b = dot(a,b)
Or we might execute these lines at the Python prompt:
>>> from et_dot import dot
>>> a = [1,2,3]
>>> b = [4.1,4.2,4.3]
>>> a_dot_b = dot(a,b)
>>> expected = 1*4.1 + 2*4.2 +3*4.3
>>> print(f"a_dot_b = {a_dot_b} == {expected}")
a_dot_b = 25.4 == 25.4
Note
This dot product implementation is naive for several reasons:
Python is very slow at executing loops, as compared to Fortran or C++.
The objects we are passing in are plain Python lists. A list is a very powerful data structure, with array-like properties, but it is not exactly an array. A list is in fact an array of pointers to Python objects, and therefore list elements can reference anything, not just a numeric value as we would expect from an array. With elements being pointers, looping over the array elements implies non-contiguous memory access, another source of inefficiency.
The dot product is a subject of linear algebra. Many excellent libraries have been designed for this purpose. Numpy should be your starting point because it is well integrated with many other Python packages. There is also Eigen, a C++ template library for linear algebra that is neatly exposed to Python by pybind11.
However, starting out with a simple and naive implementation is not a bad idea at all. Once it is proven correct, it can serve as a reference implementation to validate later improvements.
2.1. Testing the code
In order to prove that our implementation of the dot product is correct, we write some
tests. Open the file tests/et_dot/test_et_dot.py
, remove the original
tests put in by micc2, and add a new one like below:
import et_dot
def test_dot_aa():
    a = [1,2,3]
    expected = 14
    result = et_dot.dot(a,a)
    assert result == expected
The test test_dot_aa()
defines an array with 3 int
numbers, and
computes the dot product with itself. The expected result is easily calculated by
hand. Save the file, and run the test, using Pytest as explained in
1.2.4. Testing your code. Pytest will show a line for every test source file, and on
each such line a .
will appear for every successful test, and an F
for a failing
test. Here is the result:
> pytest tests
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collected 1 item
tests/et_dot/test_et_dot.py . [100%]
============================== 1 passed in 0.02s ===============================
Great, our test succeeds. If you want some more detail you can add the -v
flag.
Pytest always captures the output without showing it. If you need to see it to help you
understand errors, add the -s
flag.
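To see the effect of output capturing for yourself, you can add a print statement to a test. The test below is a hypothetical example, not part of the ET-dot project; its output appears on screen only when pytest is run with -s:

```python
# test_capture.py - hypothetical example illustrating pytest's output capturing
def test_prints():
    print("intermediate value: 42")  # only visible when running 'pytest -s'
    assert True
```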
We thus have added a single test and verified that it works by running pytest. It is good practice to commit this to our local git repository:
> git commit -a -m 'added test_dot_aa()'
[main 406f097] added test_dot_aa()
1 file changed, 9 insertions(+), 36 deletions(-)
rewrite tests/et_dot/test_et_dot.py (98%)
Obviously, our test tests only one particular case, and, perhaps, other cases might
fail. A clever way of testing is to focus on properties. From mathematics we know that
the dot product is commutative. Let's add a test for that. Open
test_et_dot.py
again and add this code:
import et_dot
import random
def test_dot_commutative():
    # create two arrays of length 10 with random float numbers:
    a = []
    b = []
    for _ in range(10):
        a.append(random.random())
        b.append(random.random())
    # test commutativity:
    ab = et_dot.dot(a,b)
    ba = et_dot.dot(b,a)
    assert ab == ba
Note
Focusing on mathematical properties sometimes requires a bit more thought. Our mathematical intuition is based on the properties of real numbers, which, as a matter of fact, have infinite precision. Programming languages, however, use floating point numbers, which have a finite precision. The mathematical properties of floating point numbers are not the same as those of real numbers. We'll come to that later.
> pytest tests -v
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1 -- /Users/etijskens/.pyenv/versions/3.8.5/bin/python
cachedir: .pytest_cache
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collecting ... collected 2 items
tests/test_et_dot.py::test_dot_commutative PASSED [ 50%]
tests/et_dot/test_et_dot.py::test_dot_aa PASSED [100%]
============================== 2 passed in 0.02s ===============================
The new test passes as well.
Above we used the random()
function from the random
module in Python's standard library for
generating the random numbers that populate the array. Every time we run the test,
different random numbers will be generated. That makes the test more powerful and
weaker at the same time. By running the test over and over again, new random arrays will
be tested, growing our confidence in our dot product implementation. Suppose,
however, that all of a sudden the test fails. What are we going to do? We know that
something is wrong, but we have no means of investigating the source of the error,
because the next time we run the test the arrays will be different again and the test may
succeed again. The test is irreproducible. Fortunately, that can be fixed by
setting the seed of the random number generator:
def test_dot_commutative():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length 10 with zeroes:
    a = n*[0]
    b = n*[0]
    # repeat the test 1000 times:
    for _ in range(1000):
        for i in range(10):
            a[i] = random.random()
            b[i] = random.random()
        # test commutativity:
        ab = et_dot.dot(a,b)
        ba = et_dot.dot(b,a)
        assert ab == ba
> pytest tests -v
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1 -- /Users/etijskens/.pyenv/versions/3.8.5/bin/python
cachedir: .pytest_cache
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collecting ... collected 2 items
tests/test_et_dot.py::test_dot_commutative PASSED [ 50%]
tests/et_dot/test_et_dot.py::test_dot_aa PASSED [100%]
============================== 2 passed in 0.02s ===============================
The 1000 tests all pass. If, say, test 315 failed, it would fail every time we run it, and the source of the error could be investigated.
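The reproducibility rests on the fact that random is a pseudo-random generator: after re-seeding, it replays exactly the same sequence. A minimal check:

```python
import random

# The same seed yields the same pseudo-random sequence:
random.seed(0)
first = [random.random() for _ in range(3)]
random.seed(0)
second = [random.random() for _ in range(3)]
assert first == second
```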
Another property is that the dot product of an array of ones with another array is the sum of the elements of the other array. Let us add another test for that:
def test_dot_one():
    # Fix the seed for the random number generator of module random.
    random.seed(0)
    # choose array size
    n = 10
    # create two arrays of length 10 with zeroes, resp. ones:
    a = n*[0]
    one = n*[1]
    # repeat the test 1000 times:
    for _ in range(1000):
        for i in range(10):
            a[i] = random.random()
        # test:
        aone = et_dot.dot(a,one)
        expected = sum(a)
        assert aone == expected
> pytest tests -v
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1 -- /Users/etijskens/.pyenv/versions/3.8.5/bin/python
cachedir: .pytest_cache
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collecting ... collected 3 items
tests/test_et_dot.py::test_dot_commutative PASSED [ 33%]
tests/test_et_dot.py::test_dot_one PASSED [ 66%]
tests/et_dot/test_et_dot.py::test_dot_aa PASSED [100%]
============================== 3 passed in 0.02s ===============================
Success again. We are getting quite confident in the correctness of our implementation. Here is yet another test:
def test_dot_one_2():
    a1 = 1.0e16
    a = [a1 , 1.0, -a1]
    one = [1.0, 1.0, 1.0]
    # test:
    aone = et_dot.dot(a,one)
    expected = 1.0
    assert aone == expected
Clearly, it is a special case of the test above. The expected result is the sum of the
elements in a
, that is 1.0
. Yet, unexpectedly, it fails. Fortunately,
pytest produces a readable report about the failure:
> pytest tests -v
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1 -- /Users/etijskens/.pyenv/versions/3.8.5/bin/python
cachedir: .pytest_cache
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collecting ... collected 4 items
tests/test_et_dot.py::test_dot_commutative PASSED [ 25%]
tests/test_et_dot.py::test_dot_one PASSED [ 50%]
tests/test_et_dot.py::test_dot_one_2 FAILED [ 75%]
tests/et_dot/test_et_dot.py::test_dot_aa PASSED [100%]
=================================== FAILURES ===================================
________________________________ test_dot_one_2 ________________________________
def test_dot_one_2():
    a1 = 1.0e16
    a = [a1 , 1.0, -a1]
    one = [1.0, 1.0, 1.0]
    # test:
    aone = et_dot.dot(a,one)
    expected = 1.0
> assert aone == expected
E assert 0.0 == 1.0
E +0.0
E -1.0
tests/test_et_dot.py:57: AssertionError
=========================== short test summary info ============================
FAILED tests/test_et_dot.py::test_dot_one_2 - assert 0.0 == 1.0
========================= 1 failed, 3 passed in 0.04s ==========================
Mathematically, our expectations about the outcome of the test are certainly
correct. Yet, pytest tells us that the result is 0.0
rather than
1.0
. What could possibly be wrong? Well, our mathematical expectations are based
on the assumption that the elements of a
are real numbers. They aren't. The
elements of a
are floating point numbers, which can only represent a finite
number of decimal digits. Double precision numbers, the default
floating point type in Python, are typically truncated after 16 decimal digits,
single precision numbers after 8. Observe the consequences of this in the Python
statements below:
>>> print( 1.0 + 1e16 )
1e+16
>>> print( 1e16 + 1.0 )
1e+16
Because 1e16
is a 1 followed by 16 zeroes, adding 1
would alter the 17th
digit, which, because of the finite precision, is not represented. An approximate
result is returned, namely 1e16
, which is off by a relative error of only 1e-16.
>>> print( 1e16 + 1.0 - 1e16 )
0.0
>>> print( 1e16 - 1e16 + 1.0 )
1.0
>>> print( 1.0 + 1e16 - 1e16 )
0.0
Although each of these expressions should yield 1.0
if they were real numbers,
the results differ because of the finite precision. Python executes the expressions
from left to right, so they are equivalent to:
1e16 + 1.0 - 1e16 = ( 1e16 + 1.0 ) - 1e16 = 1e16 - 1e16 = 0.0
1e16 - 1e16 + 1.0 = ( 1e16 - 1e16 ) + 1.0 = 0.0 + 1.0 = 1.0
1.0 + 1e16 - 1e16 = ( 1.0 + 1e16 ) - 1e16 = 1e16 - 1e16 = 0.0
There are several lessons to be learned from this:
The test does not fail because our code is wrong, but because our mind is used to reasoning about real number arithmetic, rather than floating point arithmetic. As the latter is subject to round-off errors, tests sometimes fail unexpectedly. Note that for comparing floating point numbers the standard library provides the
math.isclose()
function.
Another silent assumption by which we can be misled is in the random numbers. In fact,
random.random()
generates pseudo-random numbers in the interval [0,1[, which is quite a bit smaller than ]-inf,+inf[
. No matter how often we run the test, the special case above that fails will never be encountered, which may lead to unwarranted confidence in the code.
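The behaviour of math.isclose() is easy to explore at the prompt. The values below are just an illustration of round-off, not taken from the ET-dot code:

```python
import math

# Exact comparison of floats is brittle because of round-off:
assert 0.1 + 0.2 != 0.3                  # 0.1 + 0.2 == 0.30000000000000004
# math.isclose() compares within a relative tolerance (default rel_tol=1e-09):
assert math.isclose(0.1 + 0.2, 0.3)
```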
So let us fix the failing test using math.isclose()
to account for round-off errors, by negating the condition of the original test and
specifying a tolerance that reflects the magnitude of the terms in the sum:
import math

def test_dot_one_2():
    a1 = 1.0e16
    a = [a1 , 1.0, -a1]
    one = [1.0, 1.0, 1.0]
    # test:
    aone = et_dot.dot(a,one)
    expected = 1.0
    # Exact equality fails because of round-off:
    assert aone != expected
    # The absolute error is bounded by the precision of the largest terms:
    assert math.isclose(aone, expected, abs_tol=a1 * 1e-15)
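The essence of this fix can be checked without pytest. The absolute tolerance below is a choice made for this illustration: it scales the machine precision of doubles (about 1e-16) with the magnitude of the largest terms in the sum:

```python
import math

a1 = 1.0e16
# Naive left-to-right summation loses the contribution of 1.0:
aone = a1 + 1.0 - a1
assert aone == 0.0
# Within an absolute tolerance scaled to the largest term, the result is acceptable:
assert math.isclose(aone, 1.0, abs_tol=a1 * 1e-15)
```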
Another aspect that deserves testing is the behavior of the code in exceptional
circumstances. Does it indeed raise ArithmeticError
if the arguments
are not of the same length?
import pytest

def test_dot_unequal_length():
    a = [1,2]
    b = [1,2,3]
    with pytest.raises(ArithmeticError):
        et_dot.dot(a,b)
Here, pytest.raises()
is a context manager that verifies that
ArithmeticError
is raised when its body is executed. The test succeeds
if the code indeed raises ArithmeticError
, and raises
AssertionError
, causing the test to fail, if it does not. For an explanation
of context managers see The Curious Case of Python's Context Manager. Note
that you can easily make et_dot.dot()
raise other exceptions, e.g.
TypeError
by passing in arrays of non-numeric types:
>>> import et_dot
>>> et_dot.dot([1,2],[1,'two'])
Traceback (most recent call last):
File "/Users/etijskens/.local/lib/python3.8/site-packages/et_rstor/__init__.py", line 445, in rstor
exec(line)
File "<string>", line 1, in <module>
File "./et_dot/__init__.py", line 22, in dot
result += a[i]*b[i]
TypeError: unsupported operand type(s) for +=: 'int' and 'str'
Note that it is not the product a[i]*b[i]
for i=1
that is wreaking havoc, but
the addition of its result to the accumulator result
. Furthermore, don't bother about the link to where the
error occurred in the traceback. It is due to the fact that this course is completely
generated with Python rather than written by hand.
More tests could be devised, but the current tests give us sufficient confidence. The point where you stop testing and move on to the next issue, feature, or project is subject to various considerations, such as confidence, experience, problem understanding, and time pressure. In any case this is a good point to commit the changes and additions, increase the version number string, and commit the version bump as well:
> git add tests
> git commit -a -m 'dot() tests added'
[main ff3d8ae] dot() tests added
1 file changed, 73 insertions(+)
create mode 100644 tests/test_et_dot.py
> micc2 version -p
[INFO] (ET-dot)> version (0.0.0) -> (0.0.1)
> git commit -a -m 'v0.0.1'
[main 370795b] v0.0.1
2 files changed, 2 insertions(+), 2 deletions(-)
The micc2 version
flag -p
is shorthand for --patch
, and requests
incrementing the patch (=last) component of the version string, as seen in the
output. The minor component can be incremented with -m
or --minor
, the major
component with -M
or --major
.
At this point you might notice that even for a very simple and well-defined function, such as
the dot product, the amount of test code easily exceeds the amount of tested code by a
factor of 5 or more. This is not at all uncommon. As the tested code here is an isolated
piece of code, you will probably leave it alone as soon as it passes the tests and you are
confident in the solution. If at some point dot()
would fail, you should
add a test that reproduces the error and improve the solution so that it passes the
test.
When constructing software for more complex problems, there will be several interacting components and running the tests after modifying one of the components will help you assure that all components still play well together, and spot problems as soon as possible.
2.2. Improving efficiency
There are times when just a correct solution to the problem at hand is sufficient. If
ET-dot
is meant to compute a few dot products of small arrays, the naive
implementation above will probably be sufficient. However, if it is to be used many
times and for large arrays and the user is impatiently waiting for the answer, or if
your computing resources are scarce, a more efficient implementation is needed.
Especially in scientific computing and high performance computing, where compute
tasks may run for days using hundreds or even thousands of compute nodes and
resources are to be shared with many researchers, using the resources efficiently is
of utmost importance and efficient implementations are therefore indispensable.
However important efficiency may be, it is nevertheless a good strategy when developing a new piece of code to start out with a simple, even naive implementation, neglecting efficiency considerations totally, and instead focusing on correctness. Python has a reputation of being an extremely productive programming language. Once you have proven the correctness of this first version, it can serve as a reference solution to verify the correctness of later, more efficient implementations. In addition, the analysis of this version can highlight the sources of inefficiency and help you focus your attention on the parts that really need it.
2.2.1. Timing your code
The simplest way to probe the efficiency of your code is to time it: write a simple script and record how long it takes to execute. Here’s a script that computes the dot product of two long arrays of random numbers.
"""File prof/run1.py"""
import random
from et_dot import dot # the dot method is all we need from et_dot
def random_array(n=1000):
"""Create an array with n random numbers in [0,1[."""
# Below we use a list comprehension (a Python idiom for
# creating a list from an iterable object).
a = [random.random() for i in range(n)]
return a
if __name__=='__main__':
a = random_array()
b = random_array()
print(dot(a, b))
print("-*# done #*-")
Executing this script yields:
> python ./prof/run1.py
247.78383180344838
-*# done #*-
Note
Every run of this script yields a slightly different outcome because we did not fix
random.seed()
. It will, however, typically be around 250: the average
outcome of random.random()
is 0.5, so every entry contributes on average
0.5*0.5 = 0.25
, and as there are 1000 contributions, that makes on average 250.0.
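The note's back-of-the-envelope estimate is easily verified. The bounds in the sketch below are statistical, not exact:

```python
import random

random.seed(0)
n = 1000
a = [random.random() for _ in range(n)]
b = [random.random() for _ in range(n)]
d = sum(x * y for x, y in zip(a, b))
# Each term contributes on average 0.5*0.5 = 0.25, so d is close to n*0.25 = 250:
assert 220.0 < d < 280.0
```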
We are now ready to time our script. There are many ways to achieve this. Here is a
particularly good introduction.
The
et-stopwatch project
takes this a little further. It can be installed in your current Python environment
with pip
:
> python -m pip install et-stopwatch
Requirement already satisfied: et-stopwatch in /Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages (1.0.5)
As the output shows, the package was already installed in this environment.
To time the script above, modify it as below, using the Stopwatch
class
as a context manager:
"""File prof/run1.py"""
import random
from et_dot import dot # the dot method is all we need from et_dot
from et_stopwatch import Stopwatch
def random_array(n=1000):
"""Create an array with n random numbers in [0,1[."""
# Below we use a list comprehension (a Python idiom for
# creating a list from an iterable object).
a = [random.random() for i in range(n)]
return a
if __name__=='__main__':
with Stopwatch(message="init"):
a = random_array()
b = random_array()
with Stopwatch(message="dot "):
a_dot_b = dot(a, b)
print(a_dot_b)
print("-*# done #*-")
and execute it again:
> python ./prof/run1.py
init : 0.000558 s
dot : 0.000182 s
240.2698949846254
-*# done #*-
When the script is executed each with
block will print the time it takes
to execute its body. The first with
block times the initialisation of
the arrays, and the second times the computation of the dot product. Note that the
initialization of the arrays takes a bit longer than the dot product computation.
Computing random numbers is expensive.
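If you prefer to avoid the extra dependency, the standard library's timeit module can produce similar timings. This is a self-contained sketch, not part of the tutorial's run1.py:

```python
import random
import timeit

def dot(a, b):
    """Naive dot product, equivalent to et_dot.dot()."""
    result = 0.0
    for i in range(len(a)):
        result += a[i] * b[i]
    return result

a = [random.random() for _ in range(1000)]
b = [random.random() for _ in range(1000)]
# Average runtime of dot() over 100 calls:
t = timeit.timeit(lambda: dot(a, b), number=100) / 100
print(f"dot : {t:.6f} s")
```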
2.2.2. Comparison to Numpy
As said earlier, our implementation of the dot product is rather naive. If you want to
become a good programmer, you should understand that you are probably not the first
researcher in need of a dot product implementation. For most linear algebra
problems, Numpy provides very efficient implementations. Below, the modified
run1.py
script adds timing results for the Numpy equivalent of our code.
"""File prof/run1.py"""
# ...
import numpy as np
if __name__=='__main__':
with Stopwatch(message="et init"):
a = random_array()
b = random_array()
with Stopwatch(message="et dot "):
dot(a,b)
with Stopwatch(message="np init"):
a = np.random.rand(1000)
b = np.random.rand(1000)
with Stopwatch(message="np dot "):
np.dot(a,b)
print("-*# done #*-")
Its execution yields:
> python ./prof/run1.py
et init : 0.000295 s
et dot : 0.000132 s
np init : 7.3e-05 s
np dot : 9e-06 s
-*# done #*-
Obviously, numpy does significantly better than our naive dot product implementation. It completes the dot product in about 7% of the time. It is important to understand the reasons for this improvement:
Numpy arrays are contiguous data structures of floating point numbers, unlike the Python
lists
which we have been using for our arrays so far. An item in a Python list
is in fact a pointer that can point to an arbitrary Python object. The items in a Python list
may even belong to different types. Contiguous memory access is far more efficient. In addition, the memory footprint of a numpy array is significantly lower than that of a plain Python list.
The loop over Numpy arrays is implemented in a low-level programming language, like C, C++ or Fortran. This allows full use of the processor's hardware features, such as vectorization and fused multiply-add (FMA).
Note
Note that the initialisation of the arrays with numpy is also several times faster, for roughly the same reasons.
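The difference in memory footprint mentioned above can be quantified with sys.getsizeof(). The factor in the assertion below is indicative, as observed on 64-bit CPython:

```python
import sys
import numpy as np

n = 1000
lst = [float(i) for i in range(n)]
arr = np.arange(n, dtype=np.float64)

# The numpy array stores its 1000 doubles contiguously: exactly 8000 bytes.
assert arr.nbytes == 8 * n
# The list needs a pointer array *plus* a boxed float object per element:
list_bytes = sys.getsizeof(lst) + sum(sys.getsizeof(x) for x in lst)
assert list_bytes > 3 * arr.nbytes
```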
2.2.3. Conclusion
There are three important generic lessons to be learned from this tutorial:
Always start your projects with a simple and straightforward implementation which can easily be proven to be correct, even if you know that it will not satisfy your efficiency constraints. You should use it as a reference solution to prove the correctness of later, more efficient implementations.
Write test code for proving correctness. Tests must be reproducible, and be run after every code extension or modification to ensure that the changes did not break the existing code.
Time your code to understand which parts are time consuming and which not. Optimize bottlenecks first and do not waste time optimizing code that does not contribute significantly to the total runtime. Optimized code is typically harder to read and may become a maintenance issue.
Before you write any code, in this case our dot product implementation, spend some time searching the internet to see what is already available. Especially in the field of scientific and high performance computing there are many excellent libraries available which are hard to beat. Use your precious time for new stuff. Consider adding new features to an existing codebase, rather than starting from scratch. It will improve your programming skills and gain you time, even though initially your progress may seem slower. It might also give your code more visibility, and more users, because you provide them with an extra feature on top of something they are already used to.
3. Binary extension modules
3.1. Introduction - High Performance Python
Suppose for a moment that our dot product implementation et_dot.dot()
we developed in Tutorial 2 is way too slow to be practical for the research project
that needs it, and that we did not have access to fast dot product implementations,
such as numpy.dot()
. The major advantage we took from Python is that
coding et_dot.dot()
was extremely easy, and even coding the tests
wasn't too difficult. In this tutorial you are about to discover that coding a highly
efficient replacement for et_dot.dot()
is not too difficult either.
There are several approaches for this. Here are a number of highly recommended links
covering them:
Two of the approaches discussed in the High Performance Python series involve rewriting your code in Modern Fortran or C++ and generating a shared library that can be imported in Python just as any Python module. This is exactly the approach taken in important HPC Python modules, such as Numpy, pyTorch and pandas. Such shared libraries are called binary extension modules. Constructing binary extension modules is by far the most scalable and flexible of all current acceleration strategies, as these languages are designed to squeeze the maximum of performance out of a CPU.
However, figuring out how to build such binary extension modules is a bit of a
challenge, especially in the case of C++. This is in fact one of the main reasons why
Micc2 was designed: facilitating the construction of binary extension modules and
enabling the developer to create high performance tools with ease. To that end,
Micc2 can provide boilerplate code for binary extensions, as well as a practical
wrapper for building the binary extension modules, the micc2 build
command.
This command uses CMake to pass the build options to the compiler, while bridging the
gap between C++ and Fortran on one hand and Python on the other hand, using pybind11
and f2py, respectively. This is illustrated in the figure below:
There is a difference in how f2py and pybind11 operate. F2py is an executable that inspects the Fortran source code and creates wrappers for the subprograms it finds. These wrappers are C code, compiled and linked with the compiled Fortran code to build the extension module. Thus, f2py needs a Fortran compiler, as well as a C compiler. The pybind11 approach is conceptually simpler. Pybind11 is a C++ template library that the programmer uses to express the interface between Python and C++. In fact the introspection is done by the programmer, and there is only one compiler round, using a C++ compiler. This gives the programmer more flexibility and control, but also a bit more work.
3.1.1. Choosing between Fortran and C++ for binary extension modules
Here are a number of arguments that you may wish to take into account for choosing the programming language for your binary extension modules:
Fortran is a simpler language than C++.
It is easier to write efficient code in Fortran than C++.
C++ is a general purpose language (as is Python), whereas Fortran is meant for scientific computing. Consequently, C++ is a much more expressive language.
C++ comes with a huge standard library, providing lots of data structures and algorithms that are hard to match in Fortran. If the standard library is not enough, there are also the highly recommended Boost libraries and many other high quality domain specific libraries. There are also domain specific libraries in Fortran, but their count differs by an order of magnitude at least.
With Pybind11 you can almost expose anything from the C++ side to Python, and vice versa, not just functions.
Modern Fortran is (imho) not as well documented as C++. Useful places to look for language features and idioms are:
Fortran: https://www.fortran90.org/
In short, C++ provides many more possibilities, but it is not for the novice. In my own experience, working on projects of moderate complexity I progressed significantly faster using Fortran rather than C++, despite the fact that my knowledge of Fortran is quite limited compared to C++. However, your mileage may vary.
3.2. Adding Binary extensions to a Micc2 project
Adding a binary extension to your current project is simple. To add a binary extension ‘foo’ written in (Modern) Fortran, run:
> micc2 add foo --f90
and for a C++ binary extension, run:
> micc2 add bar --cpp
The add
subcommand adds a component to your project. It specifies a name, here
foo
, and a flag to specify the kind of the component: --f90
for a Fortran
binary extension module, --cpp
for a C++ binary extension module. Other
components are a Python sub-module with module structure (--module
) or
package structure (--package
), and a CLI script (--cli and --clisub).
You can add as many components to your project as you want.
The binary modules are built with the micc2 build
command:
> micc2 build foo
This builds the Fortran binary extension foo
. To build all binary
extensions at once, just issue micc2 build
.
As Micc2 always creates complete working examples, you can build the binary extensions right away and run their tests with pytest.
If there are no syntax errors the binary extensions will be built, and you will be able
to import the modules foo
and bar
in your project scripts and
use their subroutines and functions. Because foo
and bar
are
submodules of your micc project, you must import them as:
import my_package.foo
import my_package.bar
# call foofun in my_package.foo
my_package.foo.foofun(...)
# call barfun in my_package.bar
my_package.bar.barfun(...)
3.2.1. Build options
Here is an overview of micc2 build
options:
> micc2 build --help
Usage: micc2 build [OPTIONS] [MODULE]
Build binary extensions.
:param str module: build a binary extension module. If not specified,
all binary extension modules are built.
Options:
-b, --build-type TEXT build type: any of the standard CMake build types:
Release (default), Debug, RelWithDebInfo, MinSizeRel.
--clean Perform a clean build, removes the build directory
before the build, if there is one. Note that this
option is necessary if the extension's
``CMakeLists.txt`` was modified.
--cleanup Cleanup remove the build directory after a successful
build.
--help Show this message and exit.
3.3. Building binary extension modules from Fortran
So, in order to implement a more efficient dot product, let us add a Fortran binary
extension module with name dotf
:
> micc2 add dotf --f90
[INFO] [ Adding f90 submodule dotf to package et_dot.
[INFO] - Fortran source in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.f90.
[INFO] - build settings in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/CMakeLists.txt.
[INFO] - module documentation in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.rst (restructuredText format).
[INFO] - Python test code in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/tests/et_dot/dotf/test_dotf.py.
[INFO] ] done.
The command now runs successfully, and the output tells us where to enter the Fortran
source code, the build settings, the test code and the documentation of the added
module. Everything related to the dotf
sub-module is in the subdirectory
ET-dot/et_dot/dotf
. As usual, these files already contain
working example code that you can inspect to learn how things work.
Let’s continue our development of a Fortran version of the dot product. Open file
ET-dot/et_dot/dotf/dotf.f90
in your favorite editor or IDE and replace
the existing example code in the Fortran source file with:
function dot(a,b,n)
  ! Compute the dot product of a and b
    implicit none
    real*8 :: dot ! return value
    !-----------------------------------------------
    ! Declare function parameters
    integer*4             , intent(in) :: n
    real*8  , dimension(n), intent(in) :: a,b
    !-----------------------------------------------
    ! Declare local variables
    integer*4 :: i
    !-----------------------------------------------
    dot = 0.
    do i=1,n
        dot = dot + a(i) * b(i)
    end do
end function dot
The binary extension module can now be built:
> micc2 build dotf
[INFO] [ Building f90 module 'et_dot/dotf':
[DEBUG] [ > cmake -D PYTHON_EXECUTABLE=/Users/etijskens/.pyenv/versions/3.8.5/bin/python ..
[DEBUG] (stdout)
-- The Fortran compiler identification is GNU 11.2.0
-- Checking whether Fortran compiler has -isysroot
-- Checking whether Fortran compiler has -isysroot - yes
-- Checking whether Fortran compiler supports OSX deployment target flag
-- Checking whether Fortran compiler supports OSX deployment target flag - yes
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: /usr/local/bin/gfortran - skipped
-- Checking whether /usr/local/bin/gfortran supports Fortran 90
-- Checking whether /usr/local/bin/gfortran supports Fortran 90 - yes
# Build settings ###################################################################################
CMAKE_Fortran_COMPILER: /usr/local/bin/gfortran
CMAKE_BUILD_TYPE : Release
F2PY_opt : --opt='-O3'
F2PY_arch :
F2PY_f90flags :
F2PY_debug :
F2PY_defines : -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION;-DF2PY_REPORT_ON_ARRAY_COPY=1;-DNDEBUG
F2PY_includes :
F2PY_linkdirs :
F2PY_linklibs :
module name : dotf.cpython-38-darwin.so
module filepath : /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/dotf.cpython-38-darwin.so
source : /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.f90
python executable : /Users/etijskens/.pyenv/versions/3.8.5/bin/python [version=Python 3.8.5]
f2py executable : /Users/etijskens/.pyenv/versions/3.8.5/bin/f2py [version=2]
####################################################################################################
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build
[DEBUG] ] done.
[DEBUG] [ > make VERBOSE=1
[DEBUG] (stdout)
/usr/local/Cellar/cmake/3.21.2/bin/cmake -S/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf -B/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_progress_start /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/CMakeFiles /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build//CMakeFiles/progress.marks
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/Makefile2 all
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/dotf.dir/build.make CMakeFiles/dotf.dir/depend
cd /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build && /usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/CMakeFiles/dotf.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/dotf.dir/build.make CMakeFiles/dotf.dir/build
[100%] Generating dotf.cpython-38-darwin.so
/Users/etijskens/.pyenv/versions/3.8.5/bin/f2py -m dotf -c --f90exec=/usr/local/bin/gfortran /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.f90 -DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -DF2PY_REPORT_ON_ARRAY_COPY=1 -DNDEBUG --opt='-O3' --build-dir /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build
running build
running config_cc
unifing config_cc, config, build_clib, build_ext, build commands --compiler options
running config_fc
unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options
running build_src
build_src
building extension "dotf" sources
f2py options: []
f2py:> /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotfmodule.c
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8
Reading fortran codes...
Reading file '/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.f90' (format:free)
Post-processing...
Block: dotf
Block: dot
Post-processing (stage 2)...
Building modules...
Building module "dotf"...
Creating wrapper for Fortran function "dot"("dot")...
Constructing wrapper function "dot"...
dot = dot(a,b,[n])
Wrote C/API module "dotf" to file "/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotfmodule.c"
Fortran 77 wrappers are saved to "/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotf-f2pywrappers.f"
adding '/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/fortranobject.c' to sources.
adding '/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8' to include_dirs.
copying /Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/numpy/f2py/src/fortranobject.c -> /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8
copying /Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/numpy/f2py/src/fortranobject.h -> /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8
adding '/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotf-f2pywrappers.f' to sources.
build_src: building npy-pkg config files
running build_ext
customize UnixCCompiler
customize UnixCCompiler using build_ext
get_default_fcompiler: matching types: '['gnu95', 'nag', 'absoft', 'ibm', 'intel', 'gnu', 'g95', 'pg']'
customize Gnu95FCompiler
Found executable /usr/local/bin/gfortran
Found executable /usr/local/bin/gfortran
customize Gnu95FCompiler
customize Gnu95FCompiler using build_ext
building 'dotf' extension
compiling C sources
C compiler: clang -Wno-unused-result -Wsign-compare -Wunreachable-code -DNDEBUG -g -fwrapv -O3 -Wall -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/usr/include
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build
creating /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8
compile options: '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -DF2PY_REPORT_ON_ARRAY_COPY=1 -DNDEBUG -DNPY_DISABLE_OPTIMIZATION=1 -I/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8 -I/Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/numpy/core/include -I/Users/etijskens/.pyenv/versions/3.8.5/include/python3.8 -c'
clang: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotfmodule.c
clang: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/fortranobject.c
/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotfmodule.c:144:12: warning: unused function 'f2py_size' [-Wunused-function]
static int f2py_size(PyArrayObject* var, ...)
^
1 warning generated.
compiling Fortran sources
Fortran f77 compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -fPIC -O3
Fortran f90 compiler: /usr/local/bin/gfortran -Wall -g -fno-second-underscore -fPIC -O3
Fortran fix compiler: /usr/local/bin/gfortran -Wall -g -ffixed-form -fno-second-underscore -Wall -g -fno-second-underscore -fPIC -O3
compile options: '-DNPY_NO_DEPRECATED_API=NPY_1_7_API_VERSION -DF2PY_REPORT_ON_ARRAY_COPY=1 -DNDEBUG -I/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8 -I/Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/numpy/core/include -I/Users/etijskens/.pyenv/versions/3.8.5/include/python3.8 -c'
gfortran:f90: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.f90
gfortran:f77: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotf-f2pywrappers.f
/usr/local/bin/gfortran -Wall -g -Wall -g -undefined dynamic_lookup -bundle /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotfmodule.o /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/fortranobject.o /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/dotf.o /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/src.macosx-10.15-x86_64-3.8/dotf-f2pywrappers.o -L/usr/local/Cellar/gcc/11.2.0/lib/gcc/11/gcc/x86_64-apple-darwin20/11.2.0 -L/usr/local/Cellar/gcc/11.2.0/lib/gcc/11/gcc/x86_64-apple-darwin20/11.2.0/../../.. -L/usr/local/Cellar/gcc/11.2.0/lib/gcc/11/gcc/x86_64-apple-darwin20/11.2.0/../../.. -lgfortran -o ./dotf.cpython-38-darwin.so
ld: warning: dylib (/usr/local/Cellar/gcc/11.2.0/lib/gcc/11/libgfortran.dylib) was built for newer macOS version (11.3) than being linked (10.15)
ld: warning: dylib (/usr/local/Cellar/gcc/11.2.0/lib/gcc/11/libquadmath.dylib) was built for newer macOS version (11.3) than being linked (10.15)
[100%] Built target dotf
/usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_progress_start /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/_cmake_build/CMakeFiles 0
[DEBUG] ] done.
[DEBUG] [ > make install
[DEBUG] (stdout)
[100%] Built target dotf
Install the project...
-- Install configuration: "Release"
-- Installing: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf/../dotf.cpython-38-darwin.so
[DEBUG] ] done.
[INFO] ] done.
[INFO] Binary extensions built successfully:
[INFO] - /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotf.cpython-38-darwin.so
The command produces a lot of output, which comes from CMake, f2py, the compilation
of the Fortran code, and the compilation of the wrappers of the Fortran code, which
are written in C. If there are no syntax errors in the Fortran code, the binary
extension module builds successfully, as above, and is installed in the package
directory of our project, ET-dot/et_dot. The full module name is
dotf.cpython-38-darwin.so. The suffix is composed of: the kind of Python
distribution (cpython), the major and minor version numbers of the Python being
used (38, as we are running Python 3.8.5), the OS we are working on (darwin), and an
extension indicating a shared library on this OS (.so). This file can be imported in a
Python script by using the filename without the suffix, i.e. dotf. As the module was
built successfully, we can test it. Here is some test code; enter it in the file
ET-dot/tests/et_dot/dotf/test_dotf.py:
import numpy as np
import et_dot
# create an alias for the dotf binary extension module
f90 = et_dot.dotf

def test_dot_aa():
    # create a numpy array of floats:
    a = np.array([0, 1, 2, 3, 4], dtype=float)
    # use the original dot implementation to compute the expected result:
    expected = et_dot.dot(a, a)
    # call the dot function in the binary extension module with the same arguments:
    a_dot_a = f90.dot(a, a)
    assert a_dot_a == expected
Then run the test (we only run the test for the dotf module, as we did not touch the
et_dot.dot()
implementation):
> pytest tests/et_dot/dotf/test_dotf.py
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collected 1 item
tests/et_dot/dotf/test_dotf.py . [100%]
============================== 1 passed in 0.43s ===============================
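As an aside, the platform-dependent suffix of the binary module's filename (.cpython-38-darwin.so above) need not be guessed: Python can report it directly. A small sketch, independent of this project:

```python
import sysconfig

# The suffix appended to a binary extension module's name encodes the
# Python implementation, version, and platform.
suffix = sysconfig.get_config_var('EXT_SUFFIX')
print(suffix)  # e.g. '.cpython-38-darwin.so' on macOS with CPython 3.8
```

This is the suffix CMake and f2py used when naming dotf.cpython-38-darwin.so.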
The astute reader will notice the magic that is happening here: a
is a numpy array,
which is passed as the first and second parameter to the et_dot.dotf.dot()
function defined in our binary extension module. Note that the third parameter of the
et_dot.dotf.dot()
function is omitted. How did that happen? The
micc2 build
command uses f2py to build the binary extension module. When
calling et_dot.dotf.dot()
you are in fact calling a wrapper function that
f2py created that extracts the pointer to the memory of array a
and its length. The
wrapper function then calls the Fortran function with the appropriate parameters,
as specified in the Fortran function definition. This invisible wrapper function is
in fact rather intelligent: it even handles type conversions. E.g. we can pass in a
plain Python list, and the wrapper will convert it into a numpy array; or an array of ints,
and the wrapper will convert it into a float array. In fact, the wrapper accepts all
implicit type conversions allowed by Python. However practical this feature may be,
type conversion requires copying the entire array and converting each element. For
long arrays this may be prohibitively expensive. For this reason the
et_dot/dotf/CMakeLists.txt
file specifies the
F2PY_REPORT_ON_ARRAY_COPY=1
flag, which makes the wrappers issue a warning telling you to modify the client
program so that it passes types that do not require conversion.
>>> import et_dot
>>> a = [1,2,3]
>>> b = [2,2,2]
>>> print(et_dot.dot(a,b))
12
>>> print(et_dot.dotf.dot(a,b))
12.0
created an array from object
created an array from object
Here, a
and b
are plain Python lists, not numpy arrays, and they contain
int
numbers. et_dot.dot()
therefore also returns an int (12
).
However, the Fortran implementation et_dot.dotf.dot()
expects an
array of floats and returns a float (12.0
). The wrapper converts the Python lists
a
and b
to numpy float
arrays. If the binary extension module was
compiled with
F2PY_REPORT_ON_ARRAY_COPY=1
(the default setting) the wrapper will warn you with the message ``created an array from object``.
If we construct the numpy arrays ourselves, but still of type int
, the wrapper has
to convert the int
array into a float
array, because that is what corresponds to
the Fortran real*8
type, and will warn that it copied the array to make the
conversion:
>>> import et_dot
>>> import numpy as np
>>> a = np.array([1,2,3])
>>> b = np.array([2,2,2])
>>> print(et_dot.dot(a,b))
12
>>> print(et_dot.dotf.dot(a,b))
12.0
copied an array: size=3, elsize=8
copied an array: size=3, elsize=8
Here, size
refers to the length of the array, and elsize is the number of bytes
needed for each element of the target array type, i.e. a float.
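The elsize reported in the warning can be checked from numpy itself, and matching the dtype up front avoids the copy altogether. A small numpy sketch, not using the extension module:

```python
import numpy as np

a_int = np.array([1, 2, 3])     # integer array: would need conversion
a_flt = np.array([1., 2., 3.])  # float64 array: matches Fortran's real*8
print(a_flt.dtype.itemsize)     # 8 bytes per element, the 'elsize' above
print(a_flt.dtype == np.float64)  # True: no conversion, hence no copy
```

Creating the arrays with dtype=float from the start, as in the test code, is the cheapest way to call the Fortran routine.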
Note
The wrappers themselves are generated in C code, so, you not only need a Fortran compiler, but also a C compiler.
Note that the test code did not explicitly import et_dot.dotf
, just
et_dot
. This is only possible because Micc2 has modified
et_dot/__init__.py
to import every submodule that has been added to the
project:
# in file et_dot/__init__.py
import et_dot.dotf
If the submodule et_dot.dotf
was not built or failed to build, that import
statement will fail and raise a ModuleNotFoundError
exception. Micc2
has added a little extra magic to attempt to build the module automatically in that
case:
# in file et_dot/__init__.py
try:
    import et_dot.dotf
except ModuleNotFoundError as e:
    # Try to build this binary extension:
    from pathlib import Path
    import click
    from et_micc2.project import auto_build_binary_extension
    msg = auto_build_binary_extension(Path(__file__).parent, "dotf")
    if not msg:
        import et_dot.dotf
    else:
        click.secho(msg, fg="bright_red")
Obviously, you should also add the other tests we created for the Python implementation.
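Such additional tests could, for instance, check mathematical properties of the dot product. A sketch (the dot function below is a hypothetical pure-Python stand-in so the snippet runs without the binary extension; in the real test file you would use f90 = et_dot.dotf and call f90.dot):

```python
# Stand-in for et_dot.dotf.dot, so this sketch is self-contained;
# replace it with the binary extension's dot() in the actual test file.
def dot(a, b):
    d = 0.0
    for x, y in zip(a, b):
        d += x * y
    return d

def test_dot_commutative():
    # the dot product does not depend on the order of its arguments
    a = [0., 1., 2., 3., 4.]
    b = [4., 3., 2., 1., 0.]
    assert dot(a, b) == dot(b, a)

def test_dot_zero():
    # the dot product with a zero vector is zero
    a = [0., 1., 2., 3., 4.]
    assert dot(a, [0.0] * len(a)) == 0.0
```

Running pytest on the test file will pick up any function whose name starts with test_.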
3.3.1. Dealing with Fortran modules
Modern Fortran has a module concept of its own. This may be a bit confusing, as we have been talking about modules in a Python context so far. A Fortran module is meant to group variable and procedure definitions that conceptually belong together. Inside Fortran they are somewhat comparable to C/C++ header files. Here is an example:
MODULE my_f90_module
  implicit none
contains
  function dot(a,b,n)
    ! Compute the dot product of a and b
    implicit none
    !
    !-----------------------------------------------
    integer*4              , intent(in) :: n
    real*8   , dimension(n), intent(in) :: a,b
    real*8                              :: dot
    ! declare local variables
    integer*4 :: i
    !-----------------------------------------------
    dot = 0.
    do i=1,n
      dot = dot + a(i) * b(i)
    end do
  end function dot
END MODULE my_f90_module
F2py translates the module containing the Fortran dot
definition into an extra
namespace appearing in between the dotf
Python submodule and the
dot()
function, which is found in et_dot.dotf.my_f90_module
instead of in et_dot.dotf
.
>>> import numpy as np
>>> import et_dot
>>> a = np.array([1.,2.,3.])
>>> b = np.array([2.,2.,2.])
>>> print(et_dot.dotf.my_f90_module.dot(a,b))
12.0
>>> # If typing this much annoys you, you can create an alias to the `Fortran module`:
>>> f90 = et_dot.dotf.my_f90_module
>>> print(f90.dot(a,b))
12.0
This time there is no warning from the wrapper as a
and b
are numpy arrays of
type float
, which correspond to Fortran’s real*8
, so no conversion is
needed.
3.3.2. Controlling the build
The build parameters for our Fortran binary extension module are detailed in the file
et_dot/dotf/CMakeLists.txt
. This is a rather lengthy file, but most of it
is boilerplate code which you should not need to touch. The boilerplate sections are
clearly marked. By default this file specifies that a release version is to be built.
The file documents a set of CMake variables that can be used to control the build type:
CMAKE_BUILD_TYPE : DEBUG | MINSIZEREL | RELEASE* | RELWITHDEBINFO
F2PY_noopt : turn off optimization options
F2PY_noarch : turn off architecture specific optimization options
F2PY_f90flags : additional compiler options
F2PY_arch : architecture specific optimization options
F2PY_opt : optimization options
In addition you can specify:
preprocessor macro definitions
include directories
link directories
link libraries
Here are the sections of CMakeLists.txt
to control the build. Uncomment the
relevant lines and modify them to your needs.
... # (boilerplate code omitted for clarity)
# Set the build type:
# - If you do not specify a build type, it is Release by default.
# - Note that the DEBUG build type will trigger f2py's '--noopt --noarch --debug' options.
# set(CMAKE_BUILD_TYPE Debug | MinSizeRel | Release | RelWithDebInfo)
... # (boilerplate code omitted for clarity)
####################################################################################################
######################################################################### Customization section ####
# Specify compiler options #########################################################################
# Uncomment to turn off optimization:
# set(F2PY_noopt 1)
# Uncomment to turn off architecture specific optimization:
# set(F2PY_noarch 1)
# Set additional f90 compiler flags:
# set(F2PY_f90flags your_flags_here)
# set(F2PY_f90flags -cpp) # enable the C preprocessor (preprocessor directives must appear
#                         # in the first column of the line).
# Set architecture specific optimization compiler flags:
# set(F2PY_arch your_flags_here)
# Overwrite optimization flags
# set(F2PY_opt your_flags_here)
# Add preprocessor macro definitions ###############################################################
# add_compile_definitions(
# OPENFOAM=1912 # set value
# WM_LABEL_SIZE=$ENV{WM_LABEL_SIZE} # set value from environment variable
# WM_DP # just define the macro
# )
# Add include directories ##########################################################################
# include_directories(
# path/to/dir1
# path/to/dir2
# )
# Add link directories #############################################################################
# link_directories(
# path/to/dir1
# )
# Add link libraries (lib1 -> liblib1.so) ##########################################################
# link_libraries(
# lib1
# lib2
# )
####################################################################################################
... # (boilerplate code omitted for clarity)
3.4. Building binary extensions from C++
To illustrate building binary extension modules from C++ code, let us also create a
C++ implementation for the dot product. Analogously to our dotf
module we
will call the C++ module dotc
, where the c
refers to C++, naturally.
Use the micc2 add
command to add a cpp module:
> micc2 add dotc --cpp
[INFO] [ Adding cpp submodule dotc to package et_dot.
[INFO] - C++ source in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/dotc.cpp.
[INFO] - build settings in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/CMakeLists.txt.
[INFO] - module documentation in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/dotc.rst (restructuredText format).
[INFO] - Python test code in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/tests/et_dot/dotc/test_dotc.py.
[INFO] ] done.
As before, the output tells us where we need to add the details of the component we added to our project.
Numpy does not have an equivalent of F2py to create wrappers for C++ code. Instead,
Micc2 uses Pybind11 to generate the wrappers. For an excellent overview of this
topic, check out
Python & C++, the beauty and the beast, dancing together.
Pybind11 has a lot of ‘automagical’ features, and the fact that it is a header-only
C++ library makes its use much simpler than, e.g.,
Boost.Python,
which offers very similar features, but is not header-only and additionally depends
on the Python version you want to use. Consequently, you need to build a
Boost.Python
library for every Python version you want to use.
Enter this code in the C++ source file ET-dot/et_dot/dotc/dotc.cpp (you may remove
the example code in that file):
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

double
dot( pybind11::array_t<double> a
   , pybind11::array_t<double> b
   )
{
    // request access to the memory of the Numpy array objects a and b
    auto bufa = a.request()
       , bufb = b.request()
       ;
    // verify dimensions and shape:
    if( bufa.ndim != 1 || bufb.ndim != 1 ) {
        throw std::runtime_error("Number of dimensions must be one");
    }
    if( bufa.shape[0] != bufb.shape[0] ) {
        throw std::runtime_error("Input shapes must match");
    }
    // provide access to raw memory
    // because the Numpy arrays are mutable by default, py::array_t is mutable too.
    // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptra = static_cast<double const *>(bufa.ptr);
    double const *ptrb = static_cast<double const *>(bufb.ptr);
    // compute the dot product and return the result:
    double d = 0.0;
    for (size_t i = 0; i < bufa.shape[0]; i++)
        d += ptra[i] * ptrb[i];
    return d;
}

// describe what goes in the module
PYBIND11_MODULE(dotc, m)    // `m` is a variable holding the module definition
                            // `dotc` is the module's name
{   // A module doc-string (optional):
    m.doc() = "C++ binary extension module `dotc`";
    // List the functions you want to expose:
    // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("dot", &dot, "Compute the dot product of two arrays.");
}
Obviously the C++ source code is more involved than its Fortran equivalent in the previous section. This is because f2py is a program performing clever introspection into the Fortran source code, whereas pybind11 is just a C++ template library and as such it needs a little help from the user. This is, however, compensated by the flexibility of Pybind11.
We can now build the module:
> micc2 build dotc
[INFO] [ Building cpp module 'et_dot/dotc':
[DEBUG] [ > cmake -D PYTHON_EXECUTABLE=/Users/etijskens/.pyenv/versions/3.8.5/bin/python -D pybind11_DIR=/Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pybind11/share/cmake/pybind11 ..
[DEBUG] (stdout)
-- The CXX compiler identification is AppleClang 12.0.5.12050022
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
pybind11_DIR : /Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pybind11/share/cmake/pybind11
-- Found PythonInterp: /Users/etijskens/.pyenv/versions/3.8.5/bin/python (found version "3.8.5")
-- Found PythonLibs: /Users/etijskens/.pyenv/versions/3.8.5/lib/libpython3.8.a
-- Performing Test HAS_FLTO
-- Performing Test HAS_FLTO - Success
-- Performing Test HAS_FLTO_THIN
-- Performing Test HAS_FLTO_THIN - Success
-- Found pybind11: /Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pybind11/include (found version "2.6.2" )
-- Configuring done
-- Generating done
-- Build files have been written to: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build
[DEBUG] ] done.
[DEBUG] [ > make VERBOSE=1
[DEBUG] (stdout)
/usr/local/Cellar/cmake/3.21.2/bin/cmake -S/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc -B/Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build --check-build-system CMakeFiles/Makefile.cmake 0
/usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_progress_start /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build/CMakeFiles /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build//CMakeFiles/progress.marks
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/Makefile2 all
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/dotc.dir/build.make CMakeFiles/dotc.dir/depend
cd /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build && /usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_depends "Unix Makefiles" /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build/CMakeFiles/dotc.dir/DependInfo.cmake --color=
/Library/Developer/CommandLineTools/usr/bin/make -f CMakeFiles/dotc.dir/build.make CMakeFiles/dotc.dir/build
[ 50%] Building CXX object CMakeFiles/dotc.dir/dotc.cpp.o
/Library/Developer/CommandLineTools/usr/bin/c++ -Ddotc_EXPORTS -isystem /Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pybind11/include -isystem /Users/etijskens/.pyenv/versions/3.8.5/include/python3.8 -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk -fPIC -fvisibility=hidden -flto -std=gnu++11 -MD -MT CMakeFiles/dotc.dir/dotc.cpp.o -MF CMakeFiles/dotc.dir/dotc.cpp.o.d -o CMakeFiles/dotc.dir/dotc.cpp.o -c /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/dotc.cpp
[100%] Linking CXX shared module dotc.cpython-38-darwin.so
/usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_link_script CMakeFiles/dotc.dir/link.txt --verbose=1
/Library/Developer/CommandLineTools/usr/bin/c++ -isysroot /Library/Developer/CommandLineTools/SDKs/MacOSX11.3.sdk -bundle -Wl,-headerpad_max_install_names -Xlinker -undefined -Xlinker dynamic_lookup -flto -o dotc.cpython-38-darwin.so CMakeFiles/dotc.dir/dotc.cpp.o
[100%] Built target dotc
/usr/local/Cellar/cmake/3.21.2/bin/cmake -E cmake_progress_start /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/_cmake_build/CMakeFiles 0
[DEBUG] ] done.
[DEBUG] [ > make install
[DEBUG] (stdout)
Consolidate compiler generated dependencies of target dotc
[100%] Built target dotc
Install the project...
-- Install configuration: ""
-- Installing: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc/../dotc.cpython-38-darwin.so
[DEBUG] ] done.
[INFO] ] done.
[INFO] Binary extensions built successfully:
[INFO] - /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/dotc.cpython-38-darwin.so
['/Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages', '/Users/etijskens/.local/lib/python3.8/site-packages']
path_to_cmake_tools=/Users/etijskens/.pyenv/versions/3.8.5/lib/python3.8/site-packages/pybind11/share/cmake/pybind11
The build
command produces quite a bit of output, though typically less than for a
Fortran binary extension module. If the source file does not have any syntax errors,
and the build did not experience any problems, the package directory et_dot
will contain a binary extension module dotc.cpython-38-darwin.so
, along
with the previously built dotf.cpython-38-darwin.so
.
Here is some test code. It is almost exactly the same as that for the f90 module
dotf
, except for the module name. Enter the test code in
ET-dot/tests/et_dot/dotc/test_dotc.py
:
import numpy as np
import et_dot
# create alias to dotc binary extension module:
cpp = et_dot.dotc

def test_dotc_aa():
    a = np.array([0, 1, 2, 3, 4], dtype=float)
    expected = np.dot(a, a)
    # call function dot in the binary extension module:
    a_dot_a = cpp.dot(a, a)
    assert a_dot_a == expected
The test passes successfully. Obviously, you should also add the other tests we created for the Python implementation.
> pytest tests/et_dot/dotc/test_dotc.py
============================= test session starts ==============================
platform darwin -- Python 3.8.5, pytest-6.2.2, py-1.11.0, pluggy-0.13.1
rootdir: /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot
collected 1 item
tests/et_dot/dotc/test_dotc.py . [100%]
============================== 1 passed in 0.32s ===============================
Note
The Pybind11 wrappers automatically apply the same conversions as the F2py
wrappers. Here is an example where the input arrays are plain Python lists
containing int values. The wrapper converts them on the fly into a contiguous
array of float values (which correspond to C++'s double) and returns a float:
>>> import et_dot
>>> print(et_dot.dotc.dot([1,2],[3,4]))
11.0
This time, however, there is no warning that the wrapper converted or copied the arrays. As converting and copying large arrays is time consuming, this may incur a non-negligible cost in your application. Moreover, if the arrays are overwritten in the C++ code and serve as output, the result will not be copied back and will be lost. This will result in a bug in the client code, as it will continue its execution with the original values.
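This pitfall can be mimicked with numpy alone: when a conversion is needed, the wrapper works on a copy, so anything "written" into that copy never reaches the caller's object. A sketch, not using the extension module itself:

```python
import numpy as np

data = [1, 2, 3]                     # plain Python list of ints
arr = np.asarray(data, dtype=float)  # conversion -> a new array (a copy)
arr[0] = 99.0                        # what C++ 'output' code would do
print(data)                          # [1, 2, 3] -- the change is lost

a = np.zeros(3)                      # already a float64 array
b = np.asarray(a)                    # same dtype -> no copy is made
b[0] = 99.0
print(a[0])                          # 99.0 -- the change is visible
```

Passing arrays whose dtype already matches the C++ (or Fortran) side both avoids the copy and keeps output arrays usable.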
3.4.1. Controlling the build
The build parameters for our C++ binary extension module are detailed in the file
et_dot/dotc/CMakeLists.txt
, just as in the f90 case. It contains
significantly less boilerplate code (which you should not need to touch) and
provides the same functionality. Here is the section of
et_dot/cpp_dotc/CMakeLists.txt
that you might want to adjust to your
needs:
... # (boilerplate code omitted for clarity)
####################################################################################################
######################################################################### Customization section ####
# set compiler:
# set(CMAKE_CXX_COMPILER path/to/executable)
# Set build type:
# set(CMAKE_BUILD_TYPE DEBUG | MINSIZEREL | RELEASE | RELWITHDEBINFO)
# Add compiler options:
# set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} <additional C++ compiler options>")
# Request a specific C++ standard:
# set(CMAKE_CXX_STANDARD 17)
# Add preprocessor macro definitions:
# add_compile_definitions(
# OPENFOAM=1912 # set value
# WM_LABEL_SIZE=$ENV{WM_LABEL_SIZE} # set value from environment variable
# WM_DP # just define the macro
# )
# Add include directories
#include_directories(
# path/to/dir1
# path/to/dir2
# )
# Add link directories
# link_directories(
# path/to/dir1
# )
# Add link libraries (lib1 -> liblib1.so)
# link_libraries(
# lib1
# lib2
# )
####################################################################################################
... # (boilerplate code omitted for clarity)
3.5. Data type issues
When interfacing several programming languages, data types require special care. We already noted that although conversions are automatic where possible, they may be costly. It is always more computationally efficient when the data types on both sides (Python, and Fortran or C++ respectively) correspond. Here is a table with the most relevant numeric data types in Python, Fortran and C++.
data type         Numpy(np)/Python              Fortran      C++
unsigned integer  np.uint32                     N/A          unsigned int
unsigned integer  np.uint64                     N/A          unsigned long long int
signed integer    np.int32, int                 integer*4    int
signed integer    np.int64                      integer*8    long long int
floating point    np.float32, np.single         real*4       float
floating point    np.float64, np.double, float  real*8       double
complex           np.complex64                  complex*8    std::complex<float>
complex           np.complex128                 complex*16   std::complex<double>
If there is automatic conversion between two data types in Python, e.g. from float32 to float64, the wrappers around our function will perform the conversion automatically if needed. This happens both for Fortran and C++. However, this comes at the cost of copying and converting, which is sometimes not acceptable.
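The conversion behavior is easy to inspect from Python itself, without any binary extension involved:

```python
import numpy as np

# itemsize shows the storage width a dtype implies on the Fortran/C++ side:
print(np.dtype(np.float32).itemsize)   # 4 bytes -> real*4 / float
print(np.dtype(np.float64).itemsize)   # 8 bytes -> real*8 / double

# mixing dtypes forces NumPy (and likewise the wrappers) to convert:
a32 = np.ones(5, dtype=np.float32)
a64 = np.ones(5, dtype=np.float64)
print(np.dot(a32, a32).dtype)          # float32: dtypes match, no conversion
print(np.dot(a32, a64).dtype)          # float64: the float32 operand is upcast
```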
The result of a Fortran function or a C++ function in a binary extension module is always copied back to the Python variable that will hold it. As copying large data structures is detrimental to performance, this should be avoided. The solution to this problem is to write Fortran subroutines and C++ functions that accept the result variable as an argument and modify it in place, so that the copy operation is avoided. Consider this example of a Fortran subroutine that computes the sum of two arrays.
subroutine add(a,b,sum,n)
  ! Compute the sum of arrays a and b and overwrite
  ! array sum with the result
    implicit none
  !-------------------------------------------------
  ! Declare arguments
    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sum
  !-------------------------------------------------
  ! Declare local variables
    integer*4 :: i
  !-------------------------------------------------
  ! Compute the sum
    do i=1,n
        sum(i) = a(i) + b(i)
    end do
end subroutine add
The crucial issue here is that the result array sum is qualified as intent(inout), meaning that the add subroutine has both read and write access to it. Let us add this subroutine to our dotf binary extension module, just to demonstrate its use. In Python it would be called like this:
>>> import numpy as np
>>> import et_dot
>>> a = np.array([1.,2.])
>>> b = np.array([3.,4.])
>>> sum = np.empty(len(a),dtype=float)
>>> et_dot.dotf.add(a,b, sum)
>>> print(sum)
[4. 6.]
If sum had been qualified as intent(in), like the input parameters a and b, add would not be able to modify the sum array. On the other hand, and rather surprisingly, qualifying it with intent(out) forces f2py to consider the variable as a left hand side variable and generate a wrapper that in Python would be called like this:
sum = et_dot.dotf.add(a,b)
This obviously implies copying the contents of the result array to the Python
variable sum
, which, as said, may be prohibitively expensive.
So, the general advice is: use functions to return only variables of small size, like a
single number, or a tuple, maybe even a small fixed size array, but certainly not a
large array. If you have result variables of large size, compute them in place in
parameters with intent(inout)
. If there is no useful small variable to return,
use a subroutine instead of a function. Sometimes it is useful to have functions
return an error code, or the CPU time the computation used, while the result of the
computation is computed in a parameter with intent(inout)
, as below:
function add(a,b,sum,n)
  ! Compute the sum of arrays a and b and overwrite array sum with the result
  ! Return the CPU time consumed in seconds.
    implicit none
    real*8 add ! return value
  !-------------------------------------------------
  ! Declare arguments
    integer*4              , intent(in)    :: n
    real*8   , dimension(n), intent(in)    :: a,b
    real*8   , dimension(n), intent(inout) :: sum
  !-------------------------------------------------
  ! Declare local variables
    integer*4 :: i
    real*8    :: start, finish
  !-------------------------------------------------
  ! Compute the result
    call cpu_time(start)
    do i=1,n
        sum(i) = a(i) + b(i)
    end do
    call cpu_time(finish)
    add = finish-start
end function add
Note that Python does not require you to store the return value of a function. The above
add
function might be called as:
>>> import numpy as np
>>> import et_dot
>>> a = np.array([1.,2.])
>>> b = np.array([3.,4.])
>>> sum = np.empty(len(a),dtype=float)
>>> cputime = et_dot.dotf.add(a,b, sum)
>>> print(cputime)
5.3999999999998494e-05
>>> print(sum)
[4. 6.]
Computing large arrays in place can be accomplished in C++ quite similarly. As Python does not have a concept of const parameters, all parameters are writable by default. However, when casting the memory of the arrays to pointers, we take care to cast to double * or double const * depending on the intended use of the arrays, in order to prevent errors.
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

void
add ( py::array_t<double> a
    , py::array_t<double> b
    , py::array_t<double> sum
    )
{// request buffer description of the arguments
    auto buf_a   = a.request()
       , buf_b   = b.request()
       , buf_sum = sum.request()
       ;
    if( buf_a.ndim != 1 || buf_b.ndim != 1 || buf_sum.ndim != 1 )
        throw std::runtime_error("Number of dimensions must be one");
    if( (buf_a.shape[0] != buf_b.shape[0]) || (buf_a.shape[0] != buf_sum.shape[0]) )
        throw std::runtime_error("Input shapes must match");
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptr_a   = static_cast<double const *>(buf_a.ptr);
    double const *ptr_b   = static_cast<double const *>(buf_b.ptr);
    double       *ptr_sum = static_cast<double       *>(buf_sum.ptr);
    for (size_t i = 0; i < buf_a.shape[0]; i++)
        ptr_sum[i] = ptr_a[i] + ptr_b[i];
}

PYBIND11_MODULE(dotc, m)
{   m.doc() = "dotc binary extension module"; // optional module docstring
    m.def("add", &add, "compute the sum of two arrays.");
}
PYBIND11_MODULE(dotc, m)
{ m.doc() = "dotc binary extension module"; // optional module docstring
m.def("add", &add, "compute the sum of two arrays.");
}
3.6. Documenting binary extension modules
For Python modules the documentation is automatically extracted from the
doc-strings in the module. However, when it comes to documenting binary extension
modules, this does not seem a good option. Ideally, the source files
ET-dot/et_dot/dotf/dotf.f90
and
ET-dot/et_dot/cpp_dotc/dotc.cpp
should document the Fortran functions
and subroutines, and C++ functions, respectively, rather than the Python
interface. Yet, from the perspective of ET-dot being a Python project, the user is
only interested in the documentation of the Python interface to those functions and
subroutines. Therefore, Micc2 requires you to document the Python interface in
separate .rst
files:
ET-dot/et_dot/dotf/dotf.rst
ET-dot/et_dot/cpp_dotc/dotc.rst
Their contents could look like this. For ET-dot/et_dot/dotf/dotf.rst:
Module et_dot.dotf
******************
Module (binary extension) :py:mod:`dotf`, built from fortran source.
.. function:: dot(a,b)
:module: et_dot.dotf
Compute the dot product of ``a`` and ``b``.
:param a: 1D Numpy array with ``dtype=float``
:param b: 1D Numpy array with ``dtype=float``
:returns: the dot product of ``a`` and ``b``
:rtype: ``float``
and for ET-dot/et_dot/cpp_dotc/dotc.rst
:
Module et_dot.dotc
******************
Module (binary extension) :py:mod:`dotc`, built from C++ source.
.. function:: dot(a,b)
:module: et_dot.dotc
Compute the dot product of ``a`` and ``b``.
:param a: 1D Numpy array with ``dtype=float``
:param b: 1D Numpy array with ``dtype=float``
:returns: the dot product of ``a`` and ``b``
:rtype: ``float``
The (html) documentation is built as always:
micc2 doc
[INFO] [ > make html
[INFO] (stdout)
Running Sphinx v3.5.3
making output directory... done
WARNING: html_static_path entry '_static' does not exist
building [mo]: targets for 0 po files that are out of date
building [html]: targets for 7 source files that are out of date
updating environment: [new config] 7 added, 0 changed, 0 removed
reading sources... [ 14%] api
reading sources... [ 28%] apps
reading sources... [ 42%] authors
reading sources... [ 57%] changelog
reading sources... [ 71%] index
reading sources... [ 85%] installation
reading sources... [100%] readme
looking for now-outdated files... none found
pickling environment... done
checking consistency... /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/docs/apps.rst: WARNING: document isn't included in any toctree
done
preparing documents... done
writing output... [ 14%] api
writing output... [ 28%] apps
writing output... [ 42%] authors
writing output... [ 57%] changelog
writing output... [ 71%] index
writing output... [ 85%] installation
writing output... [100%] readme
generating indices... genindex py-modindex done
highlighting module code... [100%] et_dot
writing additional pages... search done
copying static files... done
copying extra files... done
dumping search index in English (code: en)... done
dumping object inventory... done
build succeeded, 2 warnings.
The HTML pages are in _build/html.
[INFO] ] done.
As the output shows, the documentation is found in your project directory in
docs/_build/html/index.html
. It can be opened in your favorite browser.
For reference, here is the Fortran source file ET-dot/et_dot/dotf/dotf.f90:

function dot(a,b,n)
  ! Compute the dot product of a and b
    implicit none
    real*8 :: dot ! return value
  !-----------------------------------------------
  ! Declare function parameters
    integer*4              , intent(in) :: n
    real*8   , dimension(n), intent(in) :: a,b
  !-----------------------------------------------
  ! Declare local variables
    integer*4 :: i
  !-----------------------------------------------
    dot = 0.
    do i=1,n
        dot = dot + a(i) * b(i)
    end do
end function dot
The corresponding C++ source file ET-dot/et_dot/cpp_dotc/dotc.cpp:

#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

double
dot ( pybind11::array_t<double> a
    , pybind11::array_t<double> b
    )
{
 // request access to the memory of the Numpy array objects a and b
    auto bufa = a.request()
       , bufb = b.request()
       ;
 // verify dimensions and shape:
    if( bufa.ndim != 1 || bufb.ndim != 1 ) {
        throw std::runtime_error("Number of dimensions must be one");
    }
    if( (bufa.shape[0] != bufb.shape[0]) ) {
        throw std::runtime_error("Input shapes must match");
    }
 // provide access to raw memory
 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
 // Below we declare the raw C++ arrays for a and b as const to make their intent clear.
    double const *ptra = static_cast<double const *>(bufa.ptr);
    double const *ptrb = static_cast<double const *>(bufb.ptr);
 // compute the dot product and return the result:
    double d = 0.0;
    for (size_t i = 0; i < bufa.shape[0]; i++)
        d += ptra[i] * ptrb[i];
    return d;
}

// describe what goes in the module
PYBIND11_MODULE(dotc, m) // `m` is a variable holding the module definition
                         // `dotc` is the module's name
{// A module doc-string (optional):
    m.doc() = "C++ binary extension module `dotc`";
 // List the functions you want to expose:
 // m.def("exposed_name", function_pointer, "doc-string for the exposed function");
    m.def("dot", &dot, "Compute the dot product of two arrays.");
}
4. Adding Python submodules
Adding binary extension (sub)modules is important for adding implementations in
Fortran or C++ for performance reasons. For larger projects it is sometimes
practical to be able to organize your Python code in different files, e.g. one file for
each Python class. Micc2 allows your to add Python submodules to your project. Just
as the default top level package, these Python submodules have a package structure
too. This command adds a module foo.py
to your project:
micc2 add foo --py
[INFO] [ Adding python submodule foo to package et_dot.
[INFO] - python source in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/foo/__init__.py.
[INFO] - Python test code in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/tests/et_dot/foo/test_foo.py.
[INFO] ] done.
We can add sub-submodules too. E.g. to add a bar sub-module to the foo sub-module:
micc2 add foo/bar --py
[INFO] [ Adding python submodule foo/bar to package et_dot.
[INFO] - python source in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/et_dot/foo/bar/__init__.py.
[INFO] - Python test code in /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/ET-dot/tests/et_dot/foo/bar/test_bar.py.
[INFO] ] done.
In fact this sub-sub-module bar can even be a C++ or Fortran binary extension module. One only needs to replace the --py flag with --cpp or --f90. The binary extension modules themselves, however, cannot contain submodules.
As usual, Micc2 added example code and example test code for the added components, as well as documentation entries for the submodule.
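To see how the nested layout works, here is a sketch (pure standard library, with hypothetical file contents) of the package tree that the two commands above generate, and how its submodules are imported:

```python
import sys
import tempfile
from pathlib import Path

# Sketch of the package layout generated by `micc2 add foo --py` followed
# by `micc2 add foo/bar --py`; the hello() functions are hypothetical
# stand-ins for the example code micc2 adds.
root = Path(tempfile.mkdtemp())
pkg = root / "et_dot"
(pkg / "foo" / "bar").mkdir(parents=True)
(pkg / "__init__.py").write_text("")
(pkg / "foo" / "__init__.py").write_text("def hello():\n    return 'foo'\n")
(pkg / "foo" / "bar" / "__init__.py").write_text("def hello():\n    return 'bar'\n")

# the nested submodules are imported with the usual dotted syntax:
sys.path.insert(0, str(root))
import et_dot.foo.bar

print(et_dot.foo.hello())       # foo
print(et_dot.foo.bar.hello())   # bar
```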
4.1. Adding a Python Command Line Interface
Command Line Interfaces are Python scripts in a Python package that are
installed as executable programs when the package is installed. E.g. Micc2 is a CLI.
Installing package et-micc2
installs the micc2
as an executable
program. CLIs come in two flavors, single command CLIs and CLIs with subcommands.
Single command CLIs perform a single task, which can be modified by optional
parameters and flags. CLIs with subcommands can perform different, usually
related, tasks by selecting an appropriate subcommand. Git and Micc2 are CLIs with
subcommands. You can add a single command CLI named myapp
to your project with the
command:
> micc2 add myapp --cli
and
> micc2 add myapp --clisub
for a CLI with subcommands. Micc2 adds the necessary files, containing working
example code and tests, as well as a documentation entry in APPS.rst
. The
documentation will be extracted automatically from doc-strings and help-strings
(these are explained below).
4.1.1. CLI example
Assume that we need quite often to read two arrays from file and compute their dot product, and that we want to execute this operation as:
> dotfiles file1 file2
dot(file1,file2) = 123.456
The second line is the output that we expect.
dotfiles
is, obviously a single command CLI, so we add a CLI component with
the --cli
flag:
micc2 add dotfiles --cli
[INFO] [ Adding CLI dotfiles to project ET-dot
(single command CLI).
[INFO] - Python source file ET-dot/et_dot/cli_dotfiles.py.
[INFO] - Python test code ET-dot/tests/test_cli_dotfiles.py.
[WARNING] Dependencies added:
If you are using a virtual environment created with poetry, run:
`poetry install` or `poetry update` to install missing dependencies.
If you are using a virtual environment not created with poetry, run:
(.venv) > pip install click
Otherwise, run:
> pip install click --user
[INFO] ] done.
As usual Micc2 tells us where to add the source code for the CLI, and where to add the test code for it. Furthermore, Micc2 expects us to use the Click package for implementing the CLI, a very practical and flexible package which is well documented. The example code in et_dot/cli_dotfiles.py is already based on Click, and contains an example of a single command CLI or a CLI with subcommands, depending on the flag you used. Here is the proposed implementation of our dotfiles CLI:
# -*- coding: utf-8 -*-
"""Command line interface dotfiles (no sub-commands)."""
import sys

import click
import numpy as np

import et_dot


@click.command()
@click.argument('file1')
@click.argument('file2')
@click.option('-v', '--verbosity', count=True
    , help="The verbosity of the CLI."
    , default=0
)
def main(file1, file2, verbosity):
    """Command line interface dot-files, computes the dot product of two arrays
    in files ``file1`` and ``file2`` and prints the result.

    file format is text, comma delimited

    :param str file1: location of file containing first array
    :param str file2: location of file containing first array
    """
    # Read the arrays from file, assuming comma delimited
    a = np.genfromtxt(file1, dtype=np.float64, delimiter=',')
    b = np.genfromtxt(file2, dtype=np.float64, delimiter=',')
    # Sanity check:
    if len(a) != len(b):
        raise ValueError
    # Using the C++ dot product implementation:
    ab = et_dot.dotc.dot(a, b)
    if verbosity:
        if verbosity > 1:
            print(f"a <- {file1}")
            print(f"b <- {file2}")
        print(f"dotfiles({a},{b}) = {ab}")
    else:
        print(ab)
    return 0  # return code


if __name__ == "__main__":
    sys.exit(main())
Click uses decorators to add arguments and options to turn a method, here
main()
in to the command. Understanding decorators is not really
necessary, but if you are intrigued, check out
Primer on Python decorators.
Otherwise, just follow the Click documentation for how to use the Click decorators
to create nice CLIs.
Here are the contents of the two data files, tests/array1.txt and tests/array2.txt, respectively:
1,2,3
4,5,6
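The file-reading step can be tried on its own. The sketch below substitutes np.dot for et_dot.dotc.dot, since the binary extension is not needed to illustrate np.genfromtxt, and writes temporary files instead of using the tests/ paths:

```python
import os
import tempfile

import numpy as np

# Recreate the two comma-delimited array files and read them back the way
# cli_dotfiles.py does.
f1 = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
f1.write("1,2,3")
f1.close()
f2 = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
f2.write("4,5,6")
f2.close()

a = np.genfromtxt(f1.name, dtype=np.float64, delimiter=",")
b = np.genfromtxt(f2.name, dtype=np.float64, delimiter=",")
print(np.dot(a, b))   # 32.0

os.unlink(f1.name)
os.unlink(f2.name)
```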
Click provides a lot of practical features, such as an automatic help function which
is built from the doc-string of the command method, and the help
parameters of the
options. Sphinx_click does the same to extract documentation for your CLI.
> python et_dot/cli_dotfiles.py --help
Usage: cli_dotfiles.py [OPTIONS] FILE1 FILE2
Command line interface dot-files, computes the dot product of two arrays
in files ``file1`` and ``file2`` and prints the result.
file format is text, comma delimited
:param str file1: location of file containing first array :param str
file2: location of file containing first array
Options:
-v, --verbosity The verbosity of the CLI.
--help Show this message and exit.
> python et_dot/cli_dotfiles.py tests/array1.txt tests/array2.txt
32.0
> python et_dot/cli_dotfiles.py tests/array1.txt tests/array2.txt -v
dotfiles([1. 2. 3.],[4. 5. 6.]) = 32.0
> python et_dot/cli_dotfiles.py tests/array1.txt tests/array2.txt -vv
a <- tests/array1.txt
b <- tests/array2.txt
dotfiles([1. 2. 3.],[4. 5. 6.]) = 32.0
Here, we did not exactly call the CLI as dotfiles, but that is because the package is not yet installed. The installed executable dotfiles would just wrap the command as python path/to/et_dot/cli_dotfiles.py. Note that the verbosity parameter is using a nice Click feature: by adding more ``v``s the verbosity increases.
5. Version control and version management
5.1. Version control with Git
Version control systems (VCS) keep track of modifications to the code in a special
kind of database, called a repository.
This article
explains why version control is important. It is especially important when several
developers work on the same team, but even for one-person development teams, it
brings many advantages. It serves as a backup of the code at different points in time.
If something goes wrong you can go back in time, and compare the version that was
working with the current version and investigate the cause of trouble. For small
projects, the backup is probably the most useful. The local repository of your
project, located on your own hard disk, is often accompanied by a remote repository,
located somewhere in the cloud, e.g. at GitHub. Then there is a double backup. If your
hard disk crashes, you can recover everything up to the last commit. A remote
repository can also serve as a way to share your code with other people. For larger
projects branching allows you to work on a new feature A of your code without
disturbing the last release (the main
branch). If, at some point, another
feature B seems more urgent, you leave the A branch aside, and start off a new branch for
the B feature from the main
branch. Later, you can resume the work on the A branch.
Finished branches can be merged with the main
branch, or even a feature
branch. Other typical branches are bug fix branches. Using branches to isolate work
from the main branch, becomes very useful as soon as your code has users. The branches
isolate the users from the ongoing modifications in your bug fix branches and feature
branches.
5.2. Git support from Micc2
Micc2 prepares your projects for the Git version control system. If you are new to Git, we recommend reading Introduction to Git In 16 Minutes. This article provides a concise introduction to get you going, and some pointers to more detailed documentation.
For full git support, Micc2 must be set up as explained in Installation. When Micc2 creates a new project, it automatically sets up a local Git repository and commits the created project tree with the message ‘And so this begun…’. If you do not want to use this local repository, just delete the file .gitignore and directory .git. Alternatively, to create a project with no git support at all, specify micc2 create <project_name> --no-git. Micc2 can also create a remote
. Micc2 can also create a remote
repository for the project at GitHub. By default this remote repository is public,
following the spirit of open source development. You can ask for a private repository
by specifying --remote=private
, or for no remote repository at all by
specifying --remote=none
. If a remote repository is created, the commit ‘And so
this begun…’ is immediately pushed to the remote repository. For working with
remote Git repositories see
Working with remotes,
especially checkout the sections ‘Pushing to Your Remotes’ and ‘Fetching and
Pulling from Your Remotes’.
5.3. Git workflow
Some advice for beginners on how to use git with your micc project may be appropriate.

Use the command git status to see which project files are modified, and which files are new, i.e. not yet tracked by git. For new files or directories, you must decide whether you want the file or directory to be tracked by git, or not. If the answer is ‘yes’, tell git to track the file or directory with the command git add <file-or-directory>. Otherwise, add the file to the .gitignore file in the project directory: echo <file-or-directory> >> .gitignore (you can also do this with an editor). Temporary directories, like _cmake_build for building binary extensions, or _build for building documentation, are automatically added to the .gitignore file.

Whenever a piece of work is finished and shows no obvious errors, like syntax errors, and passes all the tests, commit the finished work with git commit -m <message>, where <message> describes the piece of work that has been finished. This command puts all changes since the last commit in the local repository. New files that haven’t been added remain untracked. You can commit all untracked files as well by adding the -a flag: git commit -a -m <message>. This first adds all untracked files, as in git add ., and then commits. Since this piece of work is considered finished, it is wise to tell the remote repository too about this commit: git push.

Unfinished pieces of work can be committed too, for backup. In that case, add WIP (work in progress) to the commit message, e.g. WIP on feature A. In general, it is best not to push unfinished work to the remote repository, unless it is in a separate branch and you are the only one working on it.
6. Version management
Version numbers are practical, even for a small software project used only by
yourself. For larger projects, certainly when other users start using them, they
become indispensable. When assigning a version number to a project, we highly
recommend to follow the guidelines of
Semantic Versioning 2.0. Such a version number
consists of Major.minor.patch
. According to semantic versioning you should
increment the:
Major
version when you make incompatible API changes,minor
version when you add functionality in a backwards compatible manner, andpatch
version when you make backwards compatible bug fixes.
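The bump rules are easily captured in a few lines of Python. This is a sketch of the behavior, not micc2's actual implementation:

```python
def bump(version: str, part: str) -> str:
    """Sketch of what `micc2 version --major/--minor/--patch` does:
    increment one component and reset the lower ones to 0."""
    major, minor, patch = (int(c) for c in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"
    if part == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"

print(bump("0.0.0", "patch"))   # 0.0.1
print(bump("0.0.1", "minor"))   # 0.1.0
print(bump("0.1.1", "major"))   # 1.0.0
```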
When Micc2 creates a project, it puts a __version__
string with value
'0.0.0'
in the top level Python module of the project. So, users can access a
Micc2 package’s version as package_name.__version__
. The version string is
also encoded in the pyproject.toml
file.
Note
Although the __version__
string approach did not make it as the Python standard
approach for encoding versions strings (see PEP 396), Micc2 will still support it
for some time because the accepted approach relies on the standard library package
importlib.metadata
, which is only available for Python versions 3.8 and
higher.
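The importlib.metadata approach mentioned in the note looks like this (pip is queried merely as an example of an installed distribution):

```python
# Query the version of an *installed* distribution via the standard
# library (Python 3.8+), instead of reading package_name.__version__.
from importlib.metadata import PackageNotFoundError, version

try:
    v = version("pip")   # any installed distribution name works here
    print(v)
except PackageNotFoundError:
    print("distribution is not installed")
```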
The micc2 version
command allows you to modify the version string consistently
in a project. The most common way of modifying a project’s version string is to ‘bump’
one of the version components, Major, minor, or patch. This implies incrementing the
component by 1, and setting all the lower components to 0. This is illustrated below.
Suppose we are in the project directory of package foo:
micc2 info
Project foo located at /Users/etijskens/software/dev/workspace/et-micc2-tutorials-workspace-tmp/foo
package: foo
version: 0.0.0
contents:
foo top-level package (source in foo/__init__.py)
micc2 version
Project (foo) version (0.0.0)
micc2 version --patch
[INFO] (foo)> version (0.0.0) -> (0.0.1)
micc2 version --minor
[INFO] (foo)> version (0.0.1) -> (0.1.0)
micc2 version --patch
[INFO] (foo)> version (0.1.0) -> (0.1.1)
micc2 version --major
[INFO] (foo)> version (0.1.1) -> (1.0.0)
Without arguments the micc2 version
command just shows the current version.
Furthermore, the flags --patch
, --minor
, and --major
can be
abbreviated as -p
, -m
and -M
, respectively.
The micc2 version
command also has a --tag
flag that creates a git tag with
name v<version_string>
(see
https://git-scm.com/book/en/v2/Git-Basics-Tagging) and pushes the tag to the
remote repository.
7. Publishing your code
By publishing your code, other users can easily reuse it. Although a public GitHub repository makes that possible too, Python provides the Python Package Index (PyPI) for this purpose. Packages published on PyPI can be installed by anyone using pip.
7.1. Publishing to the Python Package Index
Poetry provides a really easy interface for publishing your code to the Python Package Index (PyPI). To install poetry see https://python-poetry.org/docs/#installation. You must also create a PyPI account. Then, to publish the ET-dot package, run this command in the project directory:
> poetry publish --build
Creating virtualenv et-dot in /Users/etijskens/software/dev/workspace/Tutorials/ET-dot/.venv
Building ET-dot (0.0.1)
- Building sdist
- Built ET-dot-0.0.1.tar.gz
- Building wheel
- Built ET_dot-0.0.1-py3-none-any.whl
Publishing ET-dot (0.0.1) to PyPI
- Uploading ET-dot-0.0.1.tar.gz 100%
- Uploading ET_dot-0.0.1-py3-none-any.whl 100%
In order for your project to be publishable, it is necessary that the project name is not already in use on PyPI. As there are hundreds of thousands of projects on PyPI, it is wise to check
that. You can do this manually, but micc2 also provides a --publish
flag for the
micc2 create
command that verifies that the project name is still available on
PyPI. If the name is already taken, the project will not be created and micc2 will
suggest to choose another project name. See 1.1.1. What’s in a name for
recommendations of how to choose project names. If the name is not yet taken, it is wise
to publish the freshly created project right away (even without any useful
contents), to make sure that no one else can publish a project with the same name.
Note that a single version of a project can only be published once. If the ET-dot package must be modified, e.g. to fix a bug, one must bump the version number before it can be published again. Once a version is published it cannot be modified.
After the project is published, everyone can install the package in his current Python environment as:
> pip install et-dot
...
7.1.1. Publishing packages with binary extension modules
Packages with binary extension modules are published in exactly the same way. That
is, perhaps surprisingly, as a Python-only project. When you pip install
a
Micc2 project, the package directory will end up in the site-packages
directory of the Python environment in which you install. The source code
directories of the binary extensions modules are also installed with the package,
but without the binary extensions themselves. These must be compiled locally.
Micc2 has added some machinery to automatically build the binary extensions from
the source code, as explained in detail at the end of section 3.3. Building binary extension modules from Fortran.
Obviously, this ‘auto-build’, can only succeed if the necessary tools are
available. In case of failure because of missing tools, micc2 will tell you which
tools are missing.
7.2. Publishing your documentation on readthedocs.org
Publishing your documentation to Readthedocs relieves the users of your code from having to build documentation themselves. Making it happen is very easy. First, make sure the git repository of your code is pushed on Github. Second, create a Readthedocs account if you do not already have one. Then, go to your Readthedocs page, go to your projects and hit import project. After filling in the fields, the documentation will be rebuild and published automatically every time you push your code to the Github remote repository.
Note
Sphinx must be able to import your project in order to extract the documentation. If
your codes depend on Python modules other than the standard library, this will fail
and the documentation will not be built. You can add the necessary dependencies to
<your-project>/docs/requirements.txt
.
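A docs/requirements.txt for a project whose modules import numpy and implement a Click CLI might look like this (all entries are illustrative; list whatever your own project imports):

```
# docs/requirements.txt -- packages Sphinx must be able to import or use
# when building the documentation
numpy
click
sphinx-click
```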
8. Debugging binary extensions
Debugging is the process of executing a program step by step, in order to discover where and why it goes wrong. It is an indispensable step in software development. Tests may tell you what part of your code fails, but the origin of the failure is not always clear. As explained in the tutorials (see 1.2.4. Testing your code) unit tests are useful for two reasons:
they assure that your code does not get broken while you add features, or modify it, and
they constrain the part of the code that has an issue. If a test fails, the origin of the failure must be somewhere in the part of the code that is tested. By keeping the tested parts small, you will find the flaw sooner, and proceed faster.
For small projects inserting print statements in flawed code can be a good approach to discover the flaw, but it is cumbersome and in the case of binary extensions requires rebuilding the code often. Debugging is a more scalable approach.
Graphical debuggers as provided in IDEs, e.g. PyCharm, Eclipse + pydev, Visual Studio, present a great user experience, but not all are capable of debugging mixed Python/C++/Fortran. See here for more information.
Pycharm: only Python, but great user experience.
Eclipse: debugging binaries should be possible, but not really mixed mode.
Visual Studio code: provides mixed language debugging Python/C++/Fortran.
Note
June 8, 2021: On MACOS, Visual Studio Code, as it uses lldb under the hood, also does not show the variables in a Fortran binary extension. It is unclear whether that is due to a quirk in f2py or lldb.
For HPC environments there is also:
These are also capable of debugging OpenMP (multi-threaded) and MPI (multi-process) applications.
For Linux environments a lightweight approach is also possible using gdb and pdb. On MACOS, gdb can be replaced by lldb, which has very similar features, but different commands. (At the time of writing, gdb for MACOS was broken.) Here are two links describing the approach:
https://www.boost.org/doc/libs/1_76_0/libs/python/doc/html/faq/how_do_i_debug_my_python_extensi.html
The first link describes a fully mixed Python/C++ approach, and works for Fortran as well. The second link is semi-mixed: it expects you to enter the Python commands yourself, which may be tedious at times, but can be practical to explore the situation.
We illustrate both strategies using a project foo with a C++ binary extension
cxx
, and a Fortran binary extension fortran
. The code we are using is just the
example code created by micc2, which defines a function for adding two arrays.
> micc2 create foo --package
...
> micc2 add cxx --cpp
...
> micc2 add fortran --f90
...
> micc2 info
Project foo located at /Users/etijskens/software/dev/workspace/foo
package: foo
version: 0.0.0
structure: foo/__init__.py (Python package)
contents:
C++ module cpp_cxx/cxx.cpp
f90 module f90_fortran/fortran.f90
> micc2 build --build-type Debug
...
Make sure that you pass the --build-type Debug
flag, so that the binary
extensions are built with debug information.
It is recommended to debug small scripts, rather than complete applications. This is, however, not always possible.
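For example, rather than debugging a full application, you could step through a tiny driver script like the sketch below. The import of foo.cxx follows the example project above; the pure-Python fallback is a hypothetical stand-in, only there so that the sketch also runs where the extension has not been built:

```python
# debug_add.py -- a minimal script that exercises only the call under scrutiny.
try:
    from foo import cxx as cpp       # the binary extension from the example project
except ImportError:
    class cpp:                       # hypothetical stand-in, used only as fallback
        @staticmethod
        def add(x, y, z):
            for i in range(len(x)):
                z[i] = x[i] + y[i]

x = [0., 1., 2., 3., 4.]
y = [1.] * 5
z = [0.] * 5
cpp.add(x, y, z)   # the single call we want to step into with the debugger
print(z)
```

A script like this reaches the interesting breakpoint in a handful of debugger steps, instead of after minutes of program execution.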
8.1. Mixed Python/C++ debugging with lldb and pdb
This section illustrates mixed language debugging of a Python script calling a
method from a C++ binary extension. Here we are using lldb
on a MACOS system. In the
next section we will do the same for a Fortran binary extension on Linux (Ubuntu),
using gdb
.
Note
For an overview of lldb
checkout https://lldb.llvm.org.
Note
For an overview of pdb
checkout
https://docs.python.org/3/library/pdb.html, and
Python Debugging With Pdb.
Suppose we are concerned about the C++ correctness of the add
function and that we
want to execute it step by step to see if it runs as expected. We first demonstrate the
approach of the first link above, on MACOS, using lldb instead of gdb. The commands
are different for gdb
and lldb
, but the strategy is exactly the same. First,
start lldb with the Python executable you want to use. As I am using pyenv to manage
different Python versions on my machine, the python
on the PATH is only a wrapper
for the real Python executable, so I must specify the full path, because lldb
expects a true executable.
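If you are not sure which executable your python wrapper points to, Python itself can tell you: sys.executable is the interpreter that is actually running, and realpath resolves any remaining symlinks. A quick check, not a micc2 feature:

```python
import os, sys

# The resolved path is what a debugger like lldb or gdb expects as its target.
real_python = os.path.realpath(sys.executable)
print(real_python)
```

With pyenv you can also ask the shell: pyenv which python prints the same path.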
> lldb ~/.pyenv/versions/3.8.5/bin/python
(lldb) target create "/Users/etijskens/.pyenv/versions/3.8.5/bin/python"
Current executable set to '/Users/etijskens/.pyenv/versions/3.8.5/bin/python' (x86_64).
(lldb)
Next, you set a breakpoint in the C++ file, e.g. on the first line of the add
function. As the binary extension, which is in fact nothing else than a dynamic
library, has not been loaded yet, lldb
replies that there is no location for the
breakpoint, and that the breakpoint is ‘pending’, i.e. waiting to become active as
soon as the dynamic library is loaded.
(lldb) breakpoint set --file cxx.cpp -l 19
Breakpoint 1: no locations (pending).
WARNING: Unable to resolve breakpoint to any actual locations.
(lldb)
Next, start the Python test script for the C++ add function,
tests/test_cpp_cxx.py
with pdb
:
(lldb) run -m pdb tests/test_cpp_cxx.py
Process 26917 launched: '/Users/etijskens/.pyenv/versions/3.8.5/bin/python' (x86_64)
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(4)<module>()
-> """
(Pdb)
and set a pdb
breakpoint on the test method for the add
function (which is
called in the if __name__ == "__main__":
body):
(Pdb) b test_cpp_add
Breakpoint 1 at /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py:19
(Pdb)
This time the breakpoint is found right away, because the file that contains it,
tests/test_cpp_cxx.py
is already loaded.
Now we are ready to start the script with the r(un)
command, after which pdb
stops at the first line in the test_cpp_add method, the pdb
breakpoint:
(Pdb) r
1 location added to breakpoint 1
__main__ running <function test_cpp_add at 0x104890310> ...
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(20)test_cpp_add()
-> x = np.array([0,1,2,3,4],dtype=float)
(Pdb)
Now, we can execute this line and inspect the variable x
with the p(rint)
command:
(Pdb) n
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(21)test_cpp_add()
-> shape = x.shape
(Pdb) p x
array([0., 1., 2., 3., 4.])
(Pdb)
Continue stepping until you arrive at the call to cpp.add
, where you can examine the
contents of y
and z
as well, just like any other variable that is in scope:
(Pdb) n
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(22)test_cpp_add()
-> y = np.ones (shape,dtype=float)
(Pdb) n
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(23)test_cpp_add()
-> z = np.zeros(shape,dtype=float)
(Pdb) n
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(24)test_cpp_add()
-> expected_z = x + y
(Pdb) n
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(25)test_cpp_add()
-> result = cpp.add(x,y,z)
(Pdb) p y
array([1., 1., 1., 1., 1.])
(Pdb) p z
array([0., 0., 0., 0., 0.])
(Pdb)
Stepping once more will hit the breakpoint on line 19 of file cxx.cpp
in lldb
:
(Pdb) n
Process 26917 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x0000000112324b58 cxx.cpython-38-darwin.so`add(x=array_t<double, 16> @ 0x00007ffeefbfc3a8, y=array_t<double, 16> @ 0x00007ffeefbfc3a0, z=array_t<double, 16> @ 0x00007ffeefbfc388) at cxx.cpp:19:19
16 , py::array_t<double> z
17 )
18 {
-> 19 auto bufx = x.request()
20 , bufy = y.request()
21 , bufz = z.request()
22 ;
Target 0: (python) stopped.
(lldb)
As in pdb, you can execute step by step with the n(ext)
command. Continue stepping
until you arrive at line 38, where you can examine the contents of the x argument.
(lldb) n
Process 26917 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = step over
frame #0: 0x0000000112324d80 cxx.cpython-38-darwin.so`add(x=array_t<double, 16> @ 0x00007ffeefbfc3a8, y=array_t<double, 16> @ 0x00007ffeefbfc3a0, z=array_t<double, 16> @ 0x00007ffeefbfc388) at cxx.cpp:38:59
35 // because the Numpy arrays are mutable by default, py::array_t is mutable too.
36 // Below we declare the raw C++ arrays for x and y as const to make their intent clear.
37 double const *ptrx = static_cast<double const *>(bufx.ptr);
-> 38 double const *ptry = static_cast<double const *>(bufy.ptr);
39 double *ptrz = static_cast<double *>(bufz.ptr);
40
41 for (size_t i = 0; i < bufx.shape[0]; i++)
Target 0: (python) stopped.
(lldb) p ptrx[0]
(const double) $0 = 0
(lldb) p ptrx[1]
(const double) $1 = 1
(lldb)
You can continue to execute line by line, which will eventually drop you in the wrapper
code, which is hard to understand and not necessarily compiled with debugging
information. We step out of it with the finish
command, to end up back in pdb
:
(lldb) finish
> /Users/etijskens/software/dev/workspace/foo/tests/test_cpp_cxx.py(26)test_cpp_add()
-> assert (z == expected_z).all()
(Pdb)
8.2. Mixed Python/Fortran debugging with gdb and pdb on Linux
This time we will debug the tests/test_f90_fortran.py
script which calls the
Fortran binary extension. We are using gdb on an Ubuntu machine.
Note
For an overview of gdb
checkout
https://www.gnu.org/software/gdb/documentation/.
Note
For an overview of pdb
checkout
https://docs.python.org/3/library/pdb.html, and
Python Debugging With Pdb.
As above, we start the true Python executable, but this time with gdb
. The
procedure is very similar. Only the gdb
commands differ somewhat from the
lldb
commands, and sometimes the output is different too.
osboxes@osboxes:~/workspace/foo$ gdb ~/.pyenv/versions/3.9.5/bin/python
GNU gdb (Ubuntu 10.1-2ubuntu2) 10.1.90.20210411-git
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /home/osboxes/.pyenv/versions/3.9.5/bin/python...
(gdb) b fortran.f90:32
No source file named fortran.f90.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (fortran.f90:32) pending.
Gdb
asks you to confirm when you set a breakpoint that cannot be resolved yet.
(gdb) run -m pdb tests/test_f90_fortran.py
Starting program: /home/osboxes/.pyenv/versions/3.9.5/bin/python -m pdb tests/test_f90_fortran.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(3)<module>()
-> """Tests for f90 module `foo.fortran`."""
(Pdb) b test_f90_add
Breakpoint 1 at /home/osboxes/workspace/foo/tests/test_f90_fortran.py:14
(Pdb) r
[New Thread 0x7ffff3b9a640 (LWP 3824)]
[New Thread 0x7ffff3399640 (LWP 3825)]
[New Thread 0x7ffff0b98640 (LWP 3826)]
[New Thread 0x7fffee397640 (LWP 3827)]
[New Thread 0x7fffebb96640 (LWP 3828)]
[New Thread 0x7fffe9395640 (LWP 3829)]
[New Thread 0x7fffe6b94640 (LWP 3830)]
[New Thread 0x7fffe4393640 (LWP 3831)]
[New Thread 0x7fffe1b92640 (LWP 3832)]
__main__ running <function test_f90_add at 0x7ffff6aaa9d0> ...
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(15)test_f90_add()
-> x = np.array([0,1,2,3,4],dtype=float)
(Pdb) n
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(16)test_f90_add()
-> shape = x.shape
(Pdb)
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(17)test_f90_add()
-> y = np.ones (shape,dtype=float)
(Pdb)
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(18)test_f90_add()
-> z = np.zeros(shape,dtype=float)
(Pdb)
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(19)test_f90_add()
-> expected_z = x + y
(Pdb)
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(20)test_f90_add()
-> f90.add(x,y,z)
(Pdb)
Thread 1 "python" hit Breakpoint 1, add (x=..., y=..., z=..., n=5) at /home/osboxes/workspace/foo/foo/f90_fortran/fortran.f90:32
32 do i=1,n
(gdb) p x
$1 = (0, 1, 2, 3, 4)
(gdb) c
Continuing.
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(21)test_f90_add()
-> assert (z == expected_z).all()
(Pdb) c
-*# finished #*-
The program finished and will be restarted
> /home/osboxes/workspace/foo/tests/test_f90_fortran.py(3)<module>()
-> """Tests for f90 module `foo.fortran`."""
(Pdb)
Note
Fortran support in lldb
seems to be limited. It is possible to step through the
code, but the variables are invisible. It is unclear whether this is due to a quirk
in lldb or f2py on MACOS.
8.2.1. Visual Studio Code
Visual Studio Code is an IDE that is free, open source, and available for Windows, Linux and MACOS. It supports graphical mixed language debugging for Python, C++ and Fortran. In addition, it is possible to work remotely using ssh (as in Eclipse). You can edit remote files with Visual Studio Code's builtin editor, and have a remote terminal as well. The above debugging approaches can be applied in a remote terminal, so it is possible to use it for development on the (VSC) clusters. Here are some useful links: