"""Joblib is a set of tools to provide **lightweight pipelining in
Python**. In particular:
1. transparent disk-caching of functions and lazy re-evaluation
(memoize pattern)
2. easy simple parallel computing
Joblib is optimized to be **fast** and **robust** on large
data in particular and has specific optimizations for `numpy` arrays. It is
**BSD-licensed**.
==================== ===============================================
**Documentation:**   https://joblib.readthedocs.io
**Download:**        https://pypi.python.org/pypi/joblib#downloads
**Source code:**     https://github.com/joblib/joblib
**Report issues:**   https://github.com/joblib/joblib/issues
==================== ===============================================

Vision
------

The vision is to provide tools to easily achieve better performance and
reproducibility when working with long-running jobs.

* **Avoid computing the same thing twice**: code is often rerun again and
  again, for instance when prototyping computationally heavy jobs (as in
  scientific development), but hand-crafted solutions to alleviate this
  issue are error-prone and often lead to unreproducible results.

* **Persist to disk transparently**: efficiently persisting
  arbitrary objects containing large data is hard. Using
  joblib's caching mechanism avoids hand-written persistence and
  implicitly links the file on disk to the execution context of
  the original Python object. As a result, joblib's persistence is
  good for resuming an application status or computational job, e.g.
  after a crash.

Joblib addresses these problems while **leaving your code and your flow
control as unmodified as possible** (no framework, no new paradigms).

Main features
-------------

1) **Transparent and fast disk-caching of output value:** a memoize or
   make-like functionality for Python functions that works well for
   arbitrary Python objects, including very large numpy arrays. Separate
   persistence and flow-execution logic from domain logic or algorithmic
   code by writing the operations as a set of steps with well-defined
   inputs and outputs: Python functions. Joblib can save their
   computation to disk and rerun it only if necessary::

    >>> from joblib import Memory
    >>> location = 'your_cache_dir_goes_here'
    >>> mem = Memory(location, verbose=1)
    >>> import numpy as np
    >>> a = np.vander(np.arange(3)).astype(float)
    >>> square = mem.cache(np.square)
    >>> b = square(a)  # doctest: +ELLIPSIS
    ______________________________________________________________________...
    [Memory] Calling ...square...
    square(array([[0., 0., 1.],
           [1., 1., 1.],
           [4., 2., 1.]]))
    _________________________________________________...square - ...s, 0.0min

    >>> c = square(a)
    >>> # The above call did not trigger an evaluation

2) **Embarrassingly parallel helper:** to make it easy to write readable
   parallel code and debug it quickly::

    >>> from joblib import Parallel, delayed
    >>> from math import sqrt
    >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
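
   Setting ``n_jobs=-1`` asks joblib to use as many workers as there are
   CPUs (a minimal usage sketch; the result is identical, only computed
   in parallel)::

    >>> Parallel(n_jobs=-1)(delayed(sqrt)(i**2) for i in range(4))  # doctest: +SKIP
    [0.0, 1.0, 2.0, 3.0]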
3) **Fast compressed persistence**: a replacement for pickle to work
   efficiently on Python objects containing large data
   (*joblib.dump* & *joblib.load*).

..
    >>> import shutil ; shutil.rmtree(location)

"""

# PEP 440-compatible formatted version, see:
# https://www.python.org/dev/peps/pep-0440/
#
# Generic release markers:
# X.Y
# X.Y.Z # For bugfix releases
#
# Admissible pre-release markers:
# X.YaN # Alpha release
# X.YbN # Beta release
# X.YrcN # Release Candidate
# X.Y # Final release
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
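#
# For instance, '1.5.0' is a final release, '1.5.0rc1' a release
# candidate, and '1.5.dev0' an in-development version.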
#
__version__ = "1.5.0"
import os
from ._cloudpickle_wrapper import wrap_non_picklable_objects
from ._parallel_backends import ParallelBackendBase
from ._store_backends import StoreBackendBase
from .compressor import register_compressor
from .hashing import hash
from .logger import Logger, PrintTime
from .memory import MemorizedResult, Memory, expires_after, register_store_backend
from .numpy_pickle import dump, load
from .parallel import (
Parallel,
cpu_count,
delayed,
effective_n_jobs,
parallel_backend,
parallel_config,
register_parallel_backend,
)

__all__ = [
# On-disk result caching
"Memory",
"MemorizedResult",
"expires_after",
# Parallel code execution
"Parallel",
"delayed",
"cpu_count",
"effective_n_jobs",
"wrap_non_picklable_objects",
# Context to change the backend globally
"parallel_config",
"parallel_backend",
# Helpers to define and register store/parallel backends
"ParallelBackendBase",
"StoreBackendBase",
"register_compressor",
"register_parallel_backend",
"register_store_backend",
# Helpers kept for backward compatibility
"PrintTime",
"Logger",
"hash",
"dump",
"load",
]

# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
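# (setdefault leaves the variable untouched if the user already set it,
# so this workaround can be overridden from the environment)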
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")