<!--Copyright 2020 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
-->
# Testing | |
Let's take a look at how 🤗 Transformers models are tested and how you can write new tests and improve the existing ones. | |
There are 2 test suites in the repository: | |
1. `tests` -- tests for the general API | |
2. `examples` -- tests primarily for various applications that aren't part of the API | |
## How transformers are tested | |
1. Once a PR is submitted it gets tested with 9 CircleCI jobs. Every new commit to that PR gets retested. These jobs
are defined in this [config file](https://github.com/huggingface/transformers/tree/main/.circleci/config.yml), so that if needed you can reproduce the same
environment on your machine.
These CI jobs don't run `@slow` tests. | |
2. There are 3 jobs run by [GitHub Actions](https://github.com/huggingface/transformers/actions):
- [torch hub integration](https://github.com/huggingface/transformers/tree/main/.github/workflows/github-torch-hub.yml): checks whether torch hub | |
integration works. | |
- [self-hosted (push)](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-push.yml): runs fast tests on GPU only on commits on | |
`main`. It only runs if a commit on `main` has updated the code in one of the following folders: `src`, | |
`tests`, `.github` (to prevent running on added model cards, notebooks, etc.) | |
- [self-hosted runner](https://github.com/huggingface/transformers/tree/main/.github/workflows/self-scheduled.yml): runs normal and slow tests on GPU in | |
`tests` and `examples`: | |
```bash | |
RUN_SLOW=1 pytest tests/ | |
RUN_SLOW=1 pytest examples/ | |
``` | |
The results can be observed [here](https://github.com/huggingface/transformers/actions). | |
## Running tests | |
### Choosing which tests to run | |
This document goes into many details of how tests can be run. If after reading everything, you need even more details | |
you will find them [here](https://docs.pytest.org/en/latest/usage.html). | |
Here are some of the most useful ways of running tests.
Run all: | |
```console | |
pytest | |
``` | |
or: | |
```bash | |
make test | |
``` | |
Note that the latter is defined as: | |
```bash | |
python -m pytest -n auto --dist=loadfile -s -v ./tests/ | |
``` | |
which tells pytest to:
- run as many test processes as there are CPU cores (which could be too many if you don't have a ton of RAM!)
- ensure that all tests from the same file are run by the same test process
- not capture output
- run in verbose mode
### Getting the list of all tests | |
All tests of the test suite: | |
```bash | |
pytest --collect-only -q | |
``` | |
All tests of a given test file: | |
```bash | |
pytest tests/test_optimization.py --collect-only -q | |
``` | |
### Run a specific test module | |
To run an individual test module: | |
```bash | |
pytest tests/test_logging.py | |
``` | |
### Run specific tests | |
Since unittest is used inside most of the tests, to run specific subtests you need to know the name of the unittest | |
class containing those tests. For example, it could be: | |
```bash | |
pytest tests/test_optimization.py::OptimizationTest::test_adam_w | |
``` | |
Here: | |
- `tests/test_optimization.py` - the file with tests | |
- `OptimizationTest` - the name of the class | |
- `test_adam_w` - the name of the specific test function | |
If the file contains multiple classes, you can choose to run only tests of a given class. For example: | |
```bash | |
pytest tests/test_optimization.py::OptimizationTest | |
``` | |
will run all the tests inside that class. | |
As mentioned earlier you can see what tests are contained inside the `OptimizationTest` class by running: | |
```bash | |
pytest tests/test_optimization.py::OptimizationTest --collect-only -q | |
``` | |
You can run tests by keyword expressions. | |
To run only tests whose name contains `adam`: | |
```bash | |
pytest -k adam tests/test_optimization.py | |
``` | |
Logical `and` and `or` can be used to require that all keywords match, or that either one does. `not` can be used to
negate a keyword.
To run all tests except those whose name contains `adam`: | |
```bash | |
pytest -k "not adam" tests/test_optimization.py | |
``` | |
And you can combine the two patterns in one: | |
```bash | |
pytest -k "ada and not adam" tests/test_optimization.py | |
``` | |
For example to run both `test_adafactor` and `test_adam_w` you can use: | |
```bash | |
pytest -k "test_adam_w or test_adam_w" tests/test_optimization.py | |
``` | |
Note that we use `or` here, since we want either of the keywords to match to include both. | |
If you want to include only tests that include both patterns, `and` is to be used: | |
```bash | |
pytest -k "test and ada" tests/test_optimization.py | |
``` | |
### Run `accelerate` tests | |
Sometimes you need to run `accelerate` tests on your models. To do so, just add `-m accelerate_tests` to your command. For example, to run these tests on `OPT`:
```bash | |
RUN_SLOW=1 pytest -m accelerate_tests tests/models/opt/test_modeling_opt.py | |
``` | |
### Run documentation tests | |
In order to test whether the documentation examples are correct, you should check that the `doctests` are passing. | |
As an example, let's use [`WhisperModel.forward`'s docstring](https://github.com/huggingface/transformers/blob/main/src/transformers/models/whisper/modeling_whisper.py#L1017-L1035): | |
```python | |
r""" | |
Returns: | |
Example: | |
```python | |
>>> import torch | |
>>> from transformers import WhisperModel, WhisperFeatureExtractor | |
>>> from datasets import load_dataset | |
>>> model = WhisperModel.from_pretrained("openai/whisper-base") | |
>>> feature_extractor = WhisperFeatureExtractor.from_pretrained("openai/whisper-base") | |
>>> ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation") | |
>>> inputs = feature_extractor(ds[0]["audio"]["array"], return_tensors="pt") | |
>>> input_features = inputs.input_features | |
>>> decoder_input_ids = torch.tensor([[1, 1]]) * model.config.decoder_start_token_id | |
>>> last_hidden_state = model(input_features, decoder_input_ids=decoder_input_ids).last_hidden_state | |
>>> list(last_hidden_state.shape) | |
[1, 2, 512] | |
```""" | |
``` | |
3 steps are required to debug the docstring examples: | |
1. In order to properly run the test, **an extra line has to be added** at the end of the docstring. This can be automatically done on any file using: | |
```bash | |
python utils/prepare_for_doc_test.py <path_to_file_or_dir> | |
``` | |
2. Then, you can use the following line to automatically test every docstring example in the desired file: | |
```bash | |
pytest --doctest-modules <path_to_file_or_dir> | |
``` | |
3. Once you are done debugging, you need to remove the extra line added in step **1.** by running the following: | |
```bash | |
python utils/prepare_for_doc_test.py <path_to_file_or_dir> --remove_new_line | |
``` | |
### Run only modified tests | |
You can run the tests related to the unstaged files or the current branch (according to Git) by using [pytest-picked](https://github.com/anapaulagomes/pytest-picked). This is a great way of quickly checking that your changes didn't break
anything, since it won't run the tests related to files you didn't touch.
```bash | |
pip install pytest-picked | |
``` | |
```bash | |
pytest --picked | |
``` | |
All tests will be run from files and folders which are modified, but not yet committed. | |
### Automatically rerun failed tests on source modification | |
[pytest-xdist](https://github.com/pytest-dev/pytest-xdist) provides a very useful feature: it detects all failed
tests, then waits for you to modify files and keeps re-running those failing tests until they pass while you
fix them, so you don't need to restart pytest after each fix. This is repeated until all tests pass, after which
a full run is performed again.
```bash | |
pip install pytest-xdist | |
``` | |
To enter the mode: `pytest -f` or `pytest --looponfail` | |
File changes are detected by looking at `looponfailroots` root directories and all of their contents (recursively). | |
If the default for this value does not work for you, you can change it in your project by setting a configuration | |
option in `setup.cfg`: | |
```ini | |
[tool:pytest] | |
looponfailroots = transformers tests | |
``` | |
or `pytest.ini`/``tox.ini`` files: | |
```ini | |
[pytest] | |
looponfailroots = transformers tests | |
``` | |
This would lead to only looking for file changes in the respective directories, specified relative to the ini-file's
directory.
[pytest-watch](https://github.com/joeyespo/pytest-watch) is an alternative implementation of this functionality. | |
### Skip a test module | |
If you want to run all test modules, except a few, you can exclude them by giving an explicit list of tests to run. For
example, to run all except `test_modeling_*.py` tests:
```bash
pytest $(ls -1 tests/*py | grep -v test_modeling)
```
### Clearing state | |
On CI builds, and whenever isolation is more important than speed, the cache should be cleared:
```bash | |
pytest --cache-clear tests | |
``` | |
### Running tests in parallel | |
As mentioned earlier `make test` runs tests in parallel via `pytest-xdist` plugin (`-n X` argument, e.g. `-n 2` | |
to run 2 parallel jobs). | |
`pytest-xdist`'s `--dist=` option allows one to control how the tests are grouped. `--dist=loadfile` puts the | |
tests located in one file onto the same process. | |
Since the order of executed tests is different and unpredictable, if running the test suite with `pytest-xdist`
produces failures (meaning we have some undetected coupled tests), use [pytest-replay](https://github.com/ESSS/pytest-replay) to replay the tests in the same order, which should then help
reduce the failing sequence to a minimum.
### Test order and repetition | |
It's good to repeat the tests several times, in sequence, randomly, or in sets, to detect any potential
inter-dependency and state-related bugs (tear down). Plain repetition is also useful for detecting problems
that only get uncovered by the inherent randomness of DL.
#### Repeat tests | |
- [pytest-flakefinder](https://github.com/dropbox/pytest-flakefinder): | |
```bash | |
pip install pytest-flakefinder | |
``` | |
And then run every test multiple times (50 by default): | |
```bash | |
pytest --flake-finder --flake-runs=5 tests/test_failing_test.py | |
``` | |
<Tip> | |
This plugin doesn't work with `-n` flag from `pytest-xdist`. | |
</Tip> | |
<Tip> | |
There is another plugin `pytest-repeat`, but it doesn't work with `unittest`. | |
</Tip> | |
#### Run tests in a random order | |
```bash | |
pip install pytest-random-order | |
``` | |
Important: the presence of `pytest-random-order` will automatically randomize tests, no configuration change or
command line option is required.
As explained earlier this allows detection of coupled tests - where one test's state affects the state of another. When | |
`pytest-random-order` is installed it will print the random seed it used for that session, e.g: | |
```bash | |
pytest tests | |
[...] | |
Using --random-order-bucket=module | |
Using --random-order-seed=573663 | |
``` | |
If that particular sequence fails, you can reproduce it by adding that exact seed, e.g.:
```bash | |
pytest --random-order-seed=573663 | |
[...] | |
Using --random-order-bucket=module | |
Using --random-order-seed=573663 | |
``` | |
It will only reproduce the exact order if you use the exact same list of tests (or no list at all). Once you start
manually narrowing down the list you can no longer rely on the seed; instead, list the tests manually in the exact order
they failed and tell pytest not to randomize them, using `--random-order-bucket=none`, e.g.:
```bash | |
pytest --random-order-bucket=none tests/test_a.py tests/test_c.py tests/test_b.py | |
``` | |
To disable the shuffling for all tests: | |
```bash | |
pytest --random-order-bucket=none | |
``` | |
By default `--random-order-bucket=module` is implied, which shuffles the files at the module level. It can also
shuffle at the `class`, `package`, `global` and `none` levels. For the complete details please see its
[documentation](https://github.com/jbasko/pytest-random-order).
Another randomization alternative is: [`pytest-randomly`](https://github.com/pytest-dev/pytest-randomly). This | |
module has a very similar functionality/interface, but it doesn't have the bucket modes available in | |
`pytest-random-order`. It has the same problem of imposing itself once installed. | |
### Look and feel variations | |
#### pytest-sugar | |
[pytest-sugar](https://github.com/Frozenball/pytest-sugar) is a plugin that improves the look and feel, adds a
progress bar, and shows failing tests and the assert instantly. It gets activated automatically upon installation.
```bash | |
pip install pytest-sugar | |
``` | |
To run tests without it, run: | |
```bash | |
pytest -p no:sugar | |
``` | |
or uninstall it. | |
#### Report each sub-test name and its progress | |
For a single or a group of tests via `pytest` (after `pip install pytest-pspec`): | |
```bash | |
pytest --pspec tests/test_optimization.py | |
``` | |
#### Instantly shows failed tests | |
[pytest-instafail](https://github.com/pytest-dev/pytest-instafail) shows failures and errors instantly instead of | |
waiting until the end of test session. | |
```bash | |
pip install pytest-instafail | |
``` | |
```bash | |
pytest --instafail | |
``` | |
### To GPU or not to GPU | |
On a GPU-enabled setup, to test in CPU-only mode add `CUDA_VISIBLE_DEVICES=""`: | |
```bash | |
CUDA_VISIBLE_DEVICES="" pytest tests/test_logging.py | |
``` | |
or if you have multiple GPUs, you can specify which one is to be used by `pytest`. For example, to use only the
second GPU if you have GPUs `0` and `1`, you can run:
```bash | |
CUDA_VISIBLE_DEVICES="1" pytest tests/test_logging.py | |
``` | |
This is handy when you want to run different tasks on different GPUs. | |
Some tests must be run on CPU-only, others on either CPU or GPU or TPU, yet others on multiple-GPUs. The following skip | |
decorators are used to set the requirements of tests CPU/GPU/TPU-wise: | |
- `require_torch` - this test will run only under torch | |
- `require_torch_gpu` - as `require_torch` plus requires at least 1 GPU | |
- `require_torch_multi_gpu` - as `require_torch` plus requires at least 2 GPUs | |
- `require_torch_non_multi_gpu` - as `require_torch` plus requires 0 or 1 GPUs | |
- `require_torch_up_to_2_gpus` - as `require_torch` plus requires 0 or 1 or 2 GPUs | |
- `require_torch_tpu` - as `require_torch` plus requires at least 1 TPU | |
Let's depict the GPU requirements in the following table:

| n GPUs | decorator                      |
|--------|--------------------------------|
| `>= 0` | `@require_torch`               |
| `>= 1` | `@require_torch_gpu`           |
| `>= 2` | `@require_torch_multi_gpu`     |
| `< 2`  | `@require_torch_non_multi_gpu` |
| `< 3`  | `@require_torch_up_to_2_gpus`  |
For example, here is a test that must be run only when there are 2 or more GPUs available and pytorch is installed: | |
```python no-style | |
@require_torch_multi_gpu | |
def test_example_with_multi_gpu(): | |
``` | |
If a test requires `tensorflow` use the `require_tf` decorator. For example: | |
```python no-style | |
@require_tf | |
def test_tf_thing_with_tensorflow(): | |
``` | |
These decorators can be stacked. For example, if a test is slow and requires at least one GPU under pytorch, here is | |
how to set it up: | |
```python no-style | |
@require_torch_gpu | |
@slow | |
def test_example_slow_on_gpu(): | |
``` | |
Some decorators like `@parameterized` rewrite test names, therefore `@require_*` skip decorators have to be listed
last for them to work correctly. Here is an example of the correct usage:
```python no-style | |
@parameterized.expand(...) | |
@require_torch_multi_gpu | |
def test_integration_foo(): | |
``` | |
This order problem doesn't exist with `@pytest.mark.parametrize`: you can put it first or last and it will still
work. But it only works with non-unittest tests.
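For instance, here is a minimal sketch (the parameter and test names are made up for illustration) of a plain `pytest` test that combines `@pytest.mark.parametrize` with one of the skip decorators in either order:
```python
import pytest

from transformers.testing_utils import require_torch_gpu


# hypothetical example: with pytest.mark.parametrize the relative order of the two decorators doesn't matter
@pytest.mark.parametrize("batch_size", [1, 2])
@require_torch_gpu
def test_dummy_forward(batch_size):
    # a real test would run a GPU forward pass with this batch size
    assert batch_size > 0
```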
Inside tests: | |
- How many GPUs are available: | |
```python | |
from transformers.testing_utils import get_gpu_count | |
n_gpu = get_gpu_count() # works with torch and tf | |
``` | |
### Distributed training | |
`pytest` can't deal with distributed training directly. If this is attempted, the sub-processes don't do the right
thing and end up thinking they are `pytest` and start running the test suite in loops. It works, however, if one
spawns a normal process that then spawns off multiple workers and manages the IO pipes.
Here are some tests that use it: | |
- [test_trainer_distributed.py](https://github.com/huggingface/transformers/tree/main/tests/trainer/test_trainer_distributed.py) | |
- [test_deepspeed.py](https://github.com/huggingface/transformers/tree/main/tests/deepspeed/test_deepspeed.py) | |
To jump right into the execution point, search for the `execute_subprocess_async` call in those tests. | |
You will need at least 2 GPUs to see these tests in action: | |
```bash | |
CUDA_VISIBLE_DEVICES=0,1 RUN_SLOW=1 pytest -sv tests/test_trainer_distributed.py | |
``` | |
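To give a feel for the pattern, here is a minimal sketch of such a test, assuming 2 GPUs; the training script path is hypothetical, while `execute_subprocess_async`, `get_env` and `require_torch_multi_gpu` are the `transformers.testing_utils` helpers referred to in this document:
```python
import sys

from transformers.testing_utils import TestCasePlus, execute_subprocess_async, require_torch_multi_gpu


class ExampleDistributedTest(TestCasePlus):
    @require_torch_multi_gpu
    def test_ddp_script(self):
        # hypothetical training script; the launcher below spawns one worker per GPU
        cmd = f"{sys.executable} -m torch.distributed.run --nproc_per_node=2 examples/my_training_script.py".split()
        # the test remains a normal process and only manages the subprocess' IO pipes
        execute_subprocess_async(cmd, env=self.get_env())
```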
### Output capture | |
During test execution any output sent to `stdout` and `stderr` is captured. If a test or a setup method fails, the
corresponding captured output will usually be shown along with the failure traceback.
To disable output capturing and to get the `stdout` and `stderr` normally, use `-s` or `--capture=no`: | |
```bash | |
pytest -s tests/test_logging.py | |
``` | |
To send test results to JUnit format output: | |
```bash | |
pytest tests --junitxml=result.xml
``` | |
### Color control | |
To have no color (e.g., yellow on white background is not readable): | |
```bash | |
pytest --color=no tests/test_logging.py | |
``` | |
### Sending test report to online pastebin service | |
Creating a URL for each test failure: | |
```bash | |
pytest --pastebin=failed tests/test_logging.py | |
``` | |
This will submit test run information to a remote Paste service and provide a URL for each failure. You may select
tests as usual, or add for example `-x` if you only want to send one particular failure.
Creating a URL for a whole test session log: | |
```bash | |
pytest --pastebin=all tests/test_logging.py | |
``` | |
## Writing tests | |
🤗 Transformers tests are based on `unittest`, but run by `pytest`, so most of the time features from both systems
can be used.
You can read [here](https://docs.pytest.org/en/stable/unittest.html) which features are supported, but the important
thing to remember is that most `pytest` fixtures don't work, nor does `pytest`'s parametrization; instead we use the
`parameterized` module, which works in a similar way.
### Parametrization | |
Often, there is a need to run the same test multiple times, but with different arguments. It could be done from within | |
the test, but then there is no way of running that test for just one set of arguments. | |
```python
# test_this1.py
import math
import unittest

from parameterized import parameterized


class TestMathUnitTest(unittest.TestCase):
    @parameterized.expand(
        [
            ("negative", -1.5, -2.0),
            ("integer", 1, 1.0),
            ("large fraction", 1.6, 1),
        ]
    )
    def test_floor(self, name, input, expected):
        self.assertEqual(math.floor(input), expected)
```
Now, by default this test will be run 3 times, each time with the last 3 arguments of `test_floor` being assigned the
corresponding arguments in the parameter list.
You could run just the `negative` and `integer` sets of params with:
```bash
pytest -k "negative or integer" tests/test_mytest.py
```
or all but `negative` sub-tests, with: | |
```bash | |
pytest -k "not negative" tests/test_mytest.py | |
``` | |
Besides using the `-k` filter that was just mentioned, you can find out the exact name of each sub-test and run any | |
or all of them using their exact names. | |
```bash | |
pytest test_this1.py --collect-only -q | |
``` | |
and it will list: | |
```bash | |
test_this1.py::TestMathUnitTest::test_floor_0_negative | |
test_this1.py::TestMathUnitTest::test_floor_1_integer | |
test_this1.py::TestMathUnitTest::test_floor_2_large_fraction | |
``` | |
So now you can run just 2 specific sub-tests: | |
```bash | |
pytest test_this1.py::TestMathUnitTest::test_floor_0_negative test_this1.py::TestMathUnitTest::test_floor_1_integer | |
``` | |
The module [parameterized](https://pypi.org/project/parameterized/), which is already in the developer dependencies
of `transformers`, works for both `unittest` and `pytest` tests.
If, however, the test is not a `unittest`, you may use `pytest.mark.parametrize` (or you may see it being used in | |
some existing tests, mostly under `examples`). | |
Here is the same example, this time using `pytest`'s `parametrize` marker: | |
```python
# test_this2.py
import math

import pytest


@pytest.mark.parametrize(
    "name, input, expected",
    [
        ("negative", -1.5, -2.0),
        ("integer", 1, 1.0),
        ("large fraction", 1.6, 1),
    ],
)
def test_floor(name, input, expected):
    assert math.floor(input) == expected
```
Same as with `parameterized`, `pytest.mark.parametrize` gives you fine control over which sub-tests are
run, if the `-k` filter doesn't do the job. Except, this parametrization function creates a slightly different set of
names for the sub-tests. Here is what they look like:
```bash | |
pytest test_this2.py --collect-only -q | |
``` | |
and it will list: | |
```bash | |
test_this2.py::test_floor[integer-1-1.0] | |
test_this2.py::test_floor[negative--1.5--2.0] | |
test_this2.py::test_floor[large fraction-1.6-1] | |
``` | |
So now you can run just the specific test: | |
```bash | |
pytest test_this2.py::test_floor[negative--1.5--2.0] test_this2.py::test_floor[integer-1-1.0] | |
``` | |
as in the previous example. | |
### Files and directories | |
In tests we often need to know where things are relative to the current test file, and it's not trivial since the test
could be invoked from more than one directory or could reside in sub-directories with different depths. A helper class
`transformers.testing_utils.TestCasePlus` solves this problem by sorting out all the basic paths and providing easy
accessors to them:
- `pathlib` objects (all fully resolved): | |
- `test_file_path` - the current test file path, i.e. `__file__` | |
- `test_file_dir` - the directory containing the current test file | |
- `tests_dir` - the directory of the `tests` test suite | |
- `examples_dir` - the directory of the `examples` test suite | |
- `repo_root_dir` - the directory of the repository | |
- `src_dir` - the directory of `src` (i.e. where the `transformers` sub-dir resides) | |
- stringified paths: same as above, but these return paths as strings rather than `pathlib` objects:
- `test_file_path_str` | |
- `test_file_dir_str` | |
- `tests_dir_str` | |
- `examples_dir_str` | |
- `repo_root_dir_str` | |
- `src_dir_str` | |
To start using those, all you need is to make sure that the test resides in a subclass of
`transformers.testing_utils.TestCasePlus`. For example:
```python | |
from transformers.testing_utils import TestCasePlus | |
class PathExampleTest(TestCasePlus): | |
def test_something_involving_local_locations(self): | |
data_dir = self.tests_dir / "fixtures/tests_samples/wmt_en_ro" | |
``` | |
If you don't need to manipulate paths via `pathlib`, or you just need a path as a string, you can always invoke
`str()` on the `pathlib` object or use the accessors ending with `_str`. For example:
```python | |
from transformers.testing_utils import TestCasePlus | |
class PathExampleTest(TestCasePlus): | |
def test_something_involving_stringified_locations(self): | |
examples_dir = self.examples_dir_str | |
``` | |
### Temporary files and directories | |
Using unique temporary files and directories is essential for parallel test running, so that the tests won't overwrite
each other's data. Also, we want the temporary files and directories removed at the end of each test that created
them. Therefore, using packages like `tempfile`, which address these needs, is important.
However, when debugging tests, you need to be able to see what goes into the temporary file or directory, and you want
to know its exact path and not have it randomized on every test re-run.
A helper class `transformers.testing_utils.TestCasePlus` is best used for such purposes. It's a sub-class of
`unittest.TestCase`, so we can easily inherit from it in the test modules.
Here is an example of its usage: | |
```python | |
from transformers.testing_utils import TestCasePlus | |
class ExamplesTests(TestCasePlus): | |
def test_whatever(self): | |
tmp_dir = self.get_auto_remove_tmp_dir() | |
``` | |
This code creates a unique temporary directory, and sets `tmp_dir` to its location. | |
- Create a unique temporary dir: | |
```python | |
def test_whatever(self): | |
tmp_dir = self.get_auto_remove_tmp_dir() | |
``` | |
`tmp_dir` will contain the path to the created temporary dir. It will be automatically removed at the end of the | |
test. | |
- Create a temporary dir of your choice, ensure it's empty before the test starts, and don't empty it after the test:
```python | |
def test_whatever(self): | |
tmp_dir = self.get_auto_remove_tmp_dir("./xxx") | |
``` | |
This is useful for debugging when you want to monitor a specific directory and want to make sure the previous tests didn't
leave any data in there.
- You can override the default behavior by setting the `before` and `after` args directly, leading to one of the
following behaviors (see the sketch after this list):
- `before=True`: the temporary dir will always be cleared at the beginning of the test. | |
- `before=False`: if the temporary dir already existed, any existing files will remain there. | |
- `after=True`: the temporary dir will always be deleted at the end of the test. | |
- `after=False`: the temporary dir will always be left intact at the end of the test. | |
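For instance, here is a minimal sketch combining these arguments (the directory name is only illustrative):
```python
from transformers.testing_utils import TestCasePlus


class TmpDirDebugTest(TestCasePlus):
    def test_keep_output_for_inspection(self):
        # start from an empty ./debug_out, but keep whatever the test writes there for later inspection
        tmp_dir = self.get_auto_remove_tmp_dir("./debug_out", before=True, after=False)
```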
<Tip> | |
In order to run the equivalent of `rm -r` safely, only subdirs of the project repository checkout are allowed if
an explicit `tmp_dir` is used, so that no `/tmp` or similar important part of the filesystem will get nuked by
mistake. I.e., please always pass paths that start with `./`.
</Tip> | |
<Tip> | |
Each test can register multiple temporary directories and they all will get auto-removed, unless requested | |
otherwise. | |
</Tip> | |
### Temporary sys.path override | |
If you need to temporarily override `sys.path`, to import from another test for example, you can use the
`ExtendSysPath` context manager. Example:
```python | |
import os | |
from transformers.testing_utils import ExtendSysPath | |
bindir = os.path.abspath(os.path.dirname(__file__)) | |
with ExtendSysPath(f"{bindir}/.."): | |
from test_trainer import TrainerIntegrationCommon # noqa | |
``` | |
### Skipping tests | |
This is useful when a bug is found and a new test is written, but the bug is not fixed yet. In order to be able to
commit it to the main repository we need to make sure it's skipped during `make test`.
Methods: | |
- A **skip** means that you expect your test to pass only if some conditions are met, otherwise pytest should skip | |
running the test altogether. Common examples are skipping windows-only tests on non-windows platforms, or skipping | |
tests that depend on an external resource which is not available at the moment (for example a database). | |
- A **xfail** means that you expect a test to fail for some reason. A common example is a test for a feature not yet | |
implemented, or a bug not yet fixed. When a test passes despite being expected to fail (marked with | |
pytest.mark.xfail), it’s an xpass and will be reported in the test summary. | |
One of the important differences between the two is that `skip` doesn't run the test, and `xfail` does. So if the | |
code that's buggy causes some bad state that will affect other tests, do not use `xfail`. | |
#### Implementation | |
- Here is how to skip a whole test unconditionally:
```python no-style | |
@unittest.skip("this bug needs to be fixed") | |
def test_feature_x(): | |
``` | |
or via pytest: | |
```python no-style | |
@pytest.mark.skip(reason="this bug needs to be fixed") | |
``` | |
or the `xfail` way: | |
```python no-style | |
@pytest.mark.xfail | |
def test_feature_x(): | |
``` | |
- Here is how to skip a test based on some internal check inside the test: | |
```python | |
def test_feature_x(): | |
if not has_something(): | |
pytest.skip("unsupported configuration") | |
``` | |
or the whole module: | |
```python | |
import pytest | |
if not pytest.config.getoption("--custom-flag"): | |
pytest.skip("--custom-flag is missing, skipping tests", allow_module_level=True) | |
``` | |
or the `xfail` way: | |
```python | |
def test_feature_x(): | |
pytest.xfail("expected to fail until bug XYZ is fixed") | |
``` | |
- Here is how to skip all tests in a module if some import is missing: | |
```python | |
docutils = pytest.importorskip("docutils", minversion="0.3") | |
``` | |
- Skip a test based on a condition: | |
```python no-style | |
@pytest.mark.skipif(sys.version_info < (3,6), reason="requires python3.6 or higher") | |
def test_feature_x(): | |
``` | |
or: | |
```python no-style | |
@unittest.skipIf(torch_device == "cpu", "Can't do half precision") | |
def test_feature_x(): | |
``` | |
or skip the whole module: | |
```python no-style | |
@pytest.mark.skipif(sys.platform == 'win32', reason="does not run on windows") | |
class TestClass(): | |
def test_feature_x(self): | |
``` | |
More details, examples and ways are [here](https://docs.pytest.org/en/latest/skipping.html).
### Slow tests | |
The library of tests is ever-growing, and some of the tests take minutes to run, therefore we can't afford to wait
an hour for the test suite to complete on CI. Therefore, with some exceptions for essential tests, slow tests should be
marked as in the example below:
```python no-style | |
from transformers.testing_utils import slow | |
@slow | |
def test_integration_foo(): | |
``` | |
Once a test is marked as `@slow`, to run such tests set `RUN_SLOW=1` env var, e.g.: | |
```bash | |
RUN_SLOW=1 pytest tests | |
``` | |
Some decorators like `@parameterized` rewrite test names, therefore `@slow` and the rest of the skip decorators | |
`@require_*` have to be listed last for them to work correctly. Here is an example of the correct usage: | |
```python no-style | |
@parameterized.expand(...)
@slow | |
def test_integration_foo(): | |
``` | |
As explained at the beginning of this document, slow tests get run on a scheduled basis, rather than in the PR's CI
checks. So it's possible that some problems will be missed during a PR submission and get merged. Such problems will
get caught during the next scheduled CI job. But it also means that it's important to run the slow tests on your
machine before submitting the PR.
Here is a rough decision-making mechanism for choosing which tests should be marked as slow:
If the test is focused on one of the library's internal components (e.g., modeling files, tokenization files,
pipelines), then we should run that test in the non-slow test suite. If it's focused on another aspect of the library,
such as the documentation or the examples, then we should run these tests in the slow test suite. And then, to refine
this approach we should have exceptions:
- All tests that need to download a heavy set of weights or a dataset that is larger than ~50MB (e.g., model or | |
tokenizer integration tests, pipeline integration tests) should be set to slow. If you're adding a new model, you | |
should create and upload to the hub a tiny version of it (with random weights) for integration tests. This is | |
discussed in the following paragraphs. | |
- All tests that need to do a training not specifically optimized to be fast should be set to slow. | |
- We can introduce exceptions if some of these should-be-non-slow tests are excruciatingly slow, and set them to | |
`@slow`. Auto-modeling tests, which save and load large files to disk, are a good example of tests that are marked | |
as `@slow`. | |
- If a test completes under 1 second on CI (including downloads if any) then it should be a normal test regardless. | |
Collectively, all the non-slow tests need to cover entirely the different internals, while remaining fast. For example, | |
a significant coverage can be achieved by testing with specially created tiny models with random weights. Such models | |
have the very minimal number of layers (e.g., 2), vocab size (e.g., 1000), etc. Then the `@slow` tests can use large | |
slow models to do qualitative testing. To see the use of these simply look for *tiny* models with: | |
```bash | |
grep -r tiny tests examples
``` | |
Here is an example of a [script](https://github.com/huggingface/transformers/tree/main/scripts/fsmt/fsmt-make-tiny-model.py) that created the tiny model
[stas/tiny-wmt19-en-de](https://huggingface.co/stas/tiny-wmt19-en-de). You can easily adjust it to your specific
model's architecture.
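In a nutshell the idea looks like the following minimal sketch, here assuming a BERT-like architecture (the config values and save path are only illustrative; the fsmt script linked above is the actual reference):
```python
from transformers import BertConfig, BertModel

# tiny dimensions keep both the download and the forward pass fast
config = BertConfig(
    vocab_size=1000,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)
tiny_model = BertModel(config)  # randomly initialized weights
tiny_model.save_pretrained("tiny-bert-for-tests")  # this folder can then be uploaded to the hub
```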
It's easy to measure the run-time incorrectly if, for example, there is an overhead of downloading a huge model, but if
you test it locally the downloaded files would be cached and thus the download time not measured. Hence check the
execution speed report in CI logs instead (the output of `pytest --durations=0 tests`).
That report is also useful to find slow outliers that aren't marked as such, or which need to be re-written to be fast. | |
If you notice that the test suite starts getting slow on CI, the top listing of this report will show the slowest | |
tests. | |
### Testing the stdout/stderr output | |
In order to test functions that write to `stdout` and/or `stderr`, the test can access those streams using the | |
`pytest`'s [capsys system](https://docs.pytest.org/en/latest/capture.html). Here is how this is accomplished: | |
```python | |
import sys | |
def print_to_stdout(s): | |
print(s) | |
def print_to_stderr(s): | |
sys.stderr.write(s) | |
def test_result_and_stdout(capsys): | |
msg = "Hello" | |
print_to_stdout(msg) | |
print_to_stderr(msg) | |
out, err = capsys.readouterr() # consume the captured output streams | |
# optional: if you want to replay the consumed streams: | |
sys.stdout.write(out) | |
sys.stderr.write(err) | |
# test: | |
assert msg in out | |
assert msg in err | |
``` | |
And, of course, most of the time, `stderr` will come as a part of an exception, so try/except has to be used in such | |
a case: | |
```python
def raise_exception(msg):
    raise ValueError(msg)


def test_something_exception():
    msg = "Not a good value"
    error = ""
    try:
        raise_exception(msg)
    except Exception as e:
        error = str(e)
    assert msg in error, f"{msg} is not in the exception:\n{error}"
```
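For non-`unittest` tests, `pytest`'s own `pytest.raises` helper is an alternative way to express the same check; here is a minimal sketch reusing the function above:
```python
import pytest


def raise_exception(msg):
    raise ValueError(msg)


def test_something_exception():
    msg = "Not a good value"
    # `match` is a regular expression that is searched for in the string representation of the exception
    with pytest.raises(ValueError, match=msg):
        raise_exception(msg)
```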
Another approach to capturing stdout is via `contextlib.redirect_stdout`: | |
```python
import sys
from io import StringIO
from contextlib import redirect_stdout


def print_to_stdout(s):
    print(s)


def test_result_and_stdout():
    msg = "Hello"
    buffer = StringIO()
    with redirect_stdout(buffer):
        print_to_stdout(msg)
    out = buffer.getvalue()
    # optional: if you want to replay the consumed streams:
    sys.stdout.write(out)
    # test:
    assert msg in out
```
An important potential issue with capturing stdout is that it may contain `\r` characters that in normal `print`
reset everything that has been printed so far. There is no problem with `pytest`, but with `pytest -s` these
characters get included in the buffer, so to be able to run the test with and without `-s`, you have to do an
extra cleanup of the captured output, using `re.sub(r'~.*\r', '', buf, 0, re.M)`.
But then we have a helper context manager wrapper that automatically takes care of it all, regardless of whether the
output has some `\r`'s in it or not, so it's as simple as:
```python | |
from transformers.testing_utils import CaptureStdout | |
with CaptureStdout() as cs: | |
function_that_writes_to_stdout() | |
print(cs.out) | |
``` | |
Here is a full test example: | |
```python | |
from transformers.testing_utils import CaptureStdout | |
msg = "Secret message\r" | |
final = "Hello World" | |
with CaptureStdout() as cs: | |
print(msg + final) | |
assert cs.out == final + "\n", f"captured: {cs.out}, expecting {final}" | |
``` | |
If you'd like to capture `stderr` use the `CaptureStderr` class instead: | |
```python | |
from transformers.testing_utils import CaptureStderr | |
with CaptureStderr() as cs: | |
function_that_writes_to_stderr() | |
print(cs.err) | |
``` | |
If you need to capture both streams at once, use the parent `CaptureStd` class: | |
```python | |
from transformers.testing_utils import CaptureStd | |
with CaptureStd() as cs: | |
function_that_writes_to_stdout_and_stderr() | |
print(cs.err, cs.out) | |
``` | |
Also, to aid debugging test issues, by default these context managers automatically replay the captured streams on exit | |
from the context. | |
### Capturing logger stream | |
If you need to validate the output of a logger, you can use `CaptureLogger`: | |
```python | |
from transformers import logging | |
from transformers.testing_utils import CaptureLogger | |
msg = "Testing 1, 2, 3" | |
logging.set_verbosity_info() | |
logger = logging.get_logger("transformers.models.bart.tokenization_bart") | |
with CaptureLogger(logger) as cl: | |
logger.info(msg) | |
assert cl.out == msg + "\n"
``` | |
### Testing with environment variables | |
If you want to test the impact of environment variables for a specific test, you can use the helper decorator
`transformers.testing_utils.mockenv`:
```python
import os
import unittest

from transformers.testing_utils import mockenv


class HfArgumentParserTest(unittest.TestCase):
    @mockenv(TRANSFORMERS_VERBOSITY="error")
    def test_env_override(self):
        env_level_str = os.getenv("TRANSFORMERS_VERBOSITY", None)
```
At times an external program needs to be called, which requires setting `PYTHONPATH` in `os.environ` to include
multiple local paths. A helper class `transformers.testing_utils.TestCasePlus` comes to help:
```python | |
from transformers.testing_utils import TestCasePlus | |
class EnvExampleTest(TestCasePlus): | |
def test_external_prog(self): | |
env = self.get_env() | |
# now call the external program, passing `env` to it | |
``` | |
Depending on whether the test file is under the `tests` test suite or `examples`, it'll correctly set up
`env[PYTHONPATH]` to include one of these two directories, and also the `src` directory to ensure the testing is
done against the current repo, and finally it will append whatever `env[PYTHONPATH]` was already set to before the
test was called, if anything.
This helper method creates a copy of the `os.environ` object, so the original remains intact.
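For example, here is a minimal sketch of calling an external program with that environment (the command itself is only illustrative):
```python
import subprocess
import sys

from transformers.testing_utils import TestCasePlus


class EnvExampleTest(TestCasePlus):
    def test_external_prog(self):
        env = self.get_env()
        # the child process sees the repo's `src` dir and the right test dir on PYTHONPATH
        cmd = [sys.executable, "-c", "import transformers; print(transformers.__file__)"]
        subprocess.run(cmd, env=env, check=True)
```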
### Getting reproducible results | |
In some situations you may want to remove randomness from your tests. To get identical reproducible results, you
will need to fix the seed:
```python | |
seed = 42 | |
# python RNG | |
import random | |
random.seed(seed) | |
# pytorch RNGs | |
import torch | |
torch.manual_seed(seed) | |
torch.backends.cudnn.deterministic = True | |
if torch.cuda.is_available(): | |
torch.cuda.manual_seed_all(seed) | |
# numpy RNG | |
import numpy as np | |
np.random.seed(seed) | |
# tf RNG
import tensorflow as tf

tf.random.set_seed(seed)
``` | |
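Alternatively, `transformers` exposes a `set_seed` helper that seeds the Python, NumPy, PyTorch and TensorFlow generators (for the frameworks that are installed) in one call:
```python
from transformers import set_seed

set_seed(42)
```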
### Debugging tests | |
To start a debugger at the point of the warning, do this: | |
```bash | |
pytest tests/test_logging.py -W error::UserWarning --pdb | |
``` | |
## Working with GitHub Actions workflows
To trigger a self-push workflow CI job, you must: | |
1. Create a new branch on `transformers` origin (not a fork!). | |
2. The branch name has to start with either `ci_` or `ci-` (`main` triggers it too, but we can't do PRs on
`main`). It also gets triggered only for specific paths - you can find the up-to-date definition, in case it has
changed since this document was written, [here](https://github.com/huggingface/transformers/blob/main/.github/workflows/self-push.yml) under *push:*
3. Create a PR from this branch. | |
4. Then you can see the job appear [here](https://github.com/huggingface/transformers/actions/workflows/self-push.yml). It may not run right away if there | |
is a backlog. | |
## Testing Experimental CI Features | |
Testing CI features can be potentially problematic as it can interfere with the normal CI functioning. Therefore if a
new CI feature is to be added, it should be done as follows.
1. Create a new dedicated job that tests what needs to be tested | |
2. The new job must always succeed so that it gives us a green ✓ (details below). | |
3. Let it run for some days to see that a variety of different PR types get to run on it (user fork branches, | |
non-forked branches, branches originating from github.com UI direct file edit, various forced pushes, etc. - there | |
are so many) while monitoring the experimental job's logs (not the overall job green as it's purposefully always | |
green) | |
4. When it's clear that everything is solid, then merge the new changes into existing jobs. | |
That way experiments on CI functionality itself won't interfere with the normal workflow. | |
Now how can we make the job always succeed while the new CI feature is being developed?
Some CIs, like TravisCI, support ignore-step-failure and will report the overall job as successful, but CircleCI and
GitHub Actions as of this writing don't support that.
So the following workaround can be used: | |
1. `set +euo pipefail` at the beginning of the run command to suppress most potential failures in the bash script. | |
2. the last command must be a success: `echo "done"` or just `true` will do | |
Here is an example: | |
```yaml | |
- run: | |
name: run CI experiment | |
command: | | |
set +euo pipefail | |
echo "setting run-all-despite-any-errors-mode" | |
this_command_will_fail | |
echo "but bash continues to run" | |
# emulate another failure | |
false | |
# but the last command must be a success | |
echo "during experiment do not remove: reporting success to CI, even if there were failures" | |
``` | |
For simple commands you could also do: | |
```bash | |
cmd_that_may_fail || true | |
``` | |
Of course, once satisfied with the results, integrate the experimental step or job with the rest of the normal jobs, | |
while removing `set +euo pipefail` or any other things you may have added to ensure that the experimental job doesn't | |
interfere with the normal CI functioning. | |
This whole process would have been much easier if we could just set something like `allow-failure` for the
experimental step, and let it fail without impacting the overall status of PRs. But as mentioned earlier CircleCI and
GitHub Actions don't support it at the moment.
You can vote for this feature and see where it stands in these CI-specific threads:
- [GitHub Actions](https://github.com/actions/toolkit/issues/399)
- [CircleCI](https://ideas.circleci.com/ideas/CCI-I-344)