|
"""Joblib is a set of tools to provide **lightweight pipelining in |
|
Python**. In particular: |
|
|
|
1. transparent disk-caching of functions and lazy re-evaluation |
|
(memoize pattern) |
|
|
|
2. easy simple parallel computing |
|
|
|
Joblib is optimized to be **fast** and **robust** on large |
|
data in particular and has specific optimizations for `numpy` arrays. It is |
|
**BSD-licensed**. |
|
|
|
|
|
==================== =============================================== |
|
**Documentation:** https://joblib.readthedocs.io |
|
|
|
**Download:** https://pypi.python.org/pypi/joblib#downloads |
|
|
|
**Source code:** https://github.com/joblib/joblib |
|
|
|
**Report issues:** https://github.com/joblib/joblib/issues |
|
==================== =============================================== |
|
Vision
--------

The vision is to provide tools to easily achieve better performance and
reproducibility when working with long-running jobs.

* **Avoid computing the same thing twice**: code is often rerun again and
  again, for instance when prototyping computation-heavy jobs (as in
  scientific development), but hand-crafted solutions to alleviate this
  issue are error-prone and often lead to unreproducible results.

* **Persist to disk transparently**: efficiently persisting
  arbitrary objects containing large data is hard. Using
  joblib's caching mechanism avoids hand-written persistence and
  implicitly links the file on disk to the execution context of
  the original Python object. As a result, joblib's persistence is
  good for resuming an application status or computational job, e.g.,
  after a crash.

Joblib addresses these problems while **leaving your code and your flow
control as unmodified as possible** (no framework, no new paradigms).
|
Main features
------------------

1) **Transparent and fast disk-caching of output value:** a memoize or
   make-like functionality for Python functions that works well for
   arbitrary Python objects, including very large numpy arrays. Separate
   persistence and flow-execution logic from domain logic or algorithmic
   code by writing the operations as a set of steps with well-defined
   inputs and outputs: Python functions. Joblib can save their
   computation to disk and rerun it only if necessary::

      >>> from joblib import Memory
      >>> location = 'your_cache_dir_goes_here'
      >>> mem = Memory(location, verbose=1)
      >>> import numpy as np
      >>> a = np.vander(np.arange(3)).astype(float)
      >>> square = mem.cache(np.square)
      >>> b = square(a)  # doctest: +ELLIPSIS
      ______________________________________________________________________...
      [Memory] Calling ...square...
      square(array([[0., 0., 1.],
             [1., 1., 1.],
             [4., 2., 1.]]))
      _________________________________________________...square - ...s, 0.0min

      >>> c = square(a)
      >>> # The above call did not trigger an evaluation
|
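   The same cache can also be attached with decorator syntax (a sketch:
   ``square_sum`` is an illustrative name, and the call is skipped under
   doctest because the verbose cache also logs to stdout on a cache miss)::

      >>> @mem.cache
      ... def square_sum(x):
      ...     return np.square(x).sum()
      >>> print(square_sum(a))  # doctest: +SKIP
      25.0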
|
2) **Embarrassingly parallel helper:** to make it easy to write readable
   parallel code and debug it quickly::

      >>> from joblib import Parallel, delayed
      >>> from math import sqrt
      >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
      [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
|
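   The backend and the number of workers can also be tuned from outside
   the parallel code (a sketch using the re-exported ``parallel_config``
   helper with the thread-based backend)::

      >>> from joblib import parallel_config
      >>> with parallel_config(backend='threading', n_jobs=2):
      ...     Parallel()(delayed(sqrt)(i**2) for i in range(3))
      [0.0, 1.0, 2.0]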
|
3) **Fast compressed persistence**: a replacement for pickle to work
   efficiently on Python objects containing large data
   (*joblib.dump* & *joblib.load*).
|
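   A minimal sketch of these two functions (the file name and array are
   illustrative, and the calls are skipped under doctest so that nothing
   is written to disk here)::

      >>> from joblib import dump, load
      >>> import numpy as np
      >>> big = np.arange(1000)
      >>> _ = dump(big, 'big.joblib', compress=3)  # doctest: +SKIP
      >>> again = load('big.joblib')  # doctest: +SKIP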
|
..
    >>> import shutil ; shutil.rmtree(location)
|
""" |
|
__version__ = "1.5.0" |
|
|
|
|
|
import os |
|
|
|
from ._cloudpickle_wrapper import wrap_non_picklable_objects |
|
from ._parallel_backends import ParallelBackendBase |
|
from ._store_backends import StoreBackendBase |
|
from .compressor import register_compressor |
|
from .hashing import hash |
|
from .logger import Logger, PrintTime |
|
from .memory import MemorizedResult, Memory, expires_after, register_store_backend |
|
from .numpy_pickle import dump, load |
|
from .parallel import ( |
|
Parallel, |
|
cpu_count, |
|
delayed, |
|
effective_n_jobs, |
|
parallel_backend, |
|
parallel_config, |
|
register_parallel_backend, |
|
) |
|
|
|
__all__ = [ |
|
|
|
"Memory", |
|
"MemorizedResult", |
|
"expires_after", |
|
|
|
"Parallel", |
|
"delayed", |
|
"cpu_count", |
|
"effective_n_jobs", |
|
"wrap_non_picklable_objects", |
|
|
|
"parallel_config", |
|
"parallel_backend", |
|
|
|
"ParallelBackendBase", |
|
"StoreBackendBase", |
|
"register_compressor", |
|
"register_parallel_backend", |
|
"register_store_backend", |
|
|
|
"PrintTime", |
|
"Logger", |
|
"hash", |
|
"dump", |
|
"load", |
|
] |
|
# Tell Intel OpenMP not to re-initialize its runtime when processes are
# forked, which can otherwise make forked joblib workers hang.
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")