"""Joblib is a set of tools to provide **lightweight pipelining in
Python**. In particular:

1. transparent disk-caching of functions and lazy re-evaluation
   (memoize pattern)

2. easy, simple parallel computing

Joblib is optimized to be **fast** and **robust**, in particular on large
data, and has specific optimizations for `numpy` arrays. It is
**BSD-licensed**.


    ==================== ===============================================
    **Documentation:**       https://joblib.readthedocs.io

    **Download:**            https://pypi.python.org/pypi/joblib#downloads

    **Source code:**         https://github.com/joblib/joblib

    **Report issues:**       https://github.com/joblib/joblib/issues
    ==================== ===============================================


Vision
--------

The vision is to provide tools to easily achieve better performance and
reproducibility when working with long-running jobs.

 *  **Avoid computing the same thing twice**: code is often run again and
    again, for instance when prototyping computation-heavy jobs (as in
    scientific development), but hand-crafted solutions to alleviate this
    issue are error-prone and often lead to unreproducible results.

 *  **Persist to disk transparently**: efficiently persisting
    arbitrary objects containing large data is hard. Using
    joblib's caching mechanism avoids hand-written persistence and
    implicitly links the file on disk to the execution context of
    the original Python object. As a result, joblib's persistence is
    well suited to resuming an application's state or a computational
    job, e.g. after a crash.

Joblib addresses these problems while **leaving your code and your flow
control as unmodified as possible** (no framework, no new paradigms).

Main features
------------------

1) **Transparent and fast disk-caching of output values:** a memoize or
   make-like functionality for Python functions that works well for
   arbitrary Python objects, including very large numpy arrays. Separate
   persistence and flow-execution logic from domain logic or algorithmic
   code by writing the operations as a set of steps with well-defined
   inputs and outputs: Python functions. Joblib can save their
   computation to disk and rerun it only if necessary::

      >>> from joblib import Memory
      >>> location = 'your_cache_dir_goes_here'
      >>> mem = Memory(location, verbose=1)
      >>> import numpy as np
      >>> a = np.vander(np.arange(3)).astype(float)
      >>> square = mem.cache(np.square)
      >>> b = square(a)                                   # doctest: +ELLIPSIS
      ______________________________________________________________________...
      [Memory] Calling ...square...
      square(array([[0., 0., 1.],
             [1., 1., 1.],
             [4., 2., 1.]]))
      _________________________________________________...square - ...s, 0.0min

      >>> c = square(a)
      >>> # The above call did not trigger an evaluation

2) **Embarrassingly parallel helper:** to make it easy to write readable
   parallel code and debug it quickly::

      >>> from joblib import Parallel, delayed
      >>> from math import sqrt
      >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
      [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
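
   With ``n_jobs=1`` the loop runs sequentially in the calling process,
   which makes it easy to step through with a debugger; passing
   ``n_jobs=-1`` uses all available CPUs.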


3) **Fast compressed persistence**: a replacement for pickle that works
   efficiently on Python objects containing large data
   (*joblib.dump* & *joblib.load*).
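
   A minimal sketch, reusing the ``location`` directory created by the
   caching example above (the file name ``data.joblib`` is illustrative)::

      >>> import numpy as np
      >>> from joblib import dump, load
      >>> data = np.arange(5)
      >>> _ = dump(data, location + '/data.joblib')  # returns written file names
      >>> load(location + '/data.joblib')
      array([0, 1, 2, 3, 4])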

..
    >>> import shutil ; shutil.rmtree(location)

"""

# PEP 440-compatible version string, see:
# https://www.python.org/dev/peps/pep-0440/
#
# Generic release markers:
# X.Y
# X.Y.Z # For bugfix releases
#
# Admissible pre-release markers:
# X.YaN # Alpha release
# X.YbN # Beta release
# X.YrcN # Release Candidate
# X.Y # Final release
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
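#
# Illustrative examples (values are hypothetical): '1.4.2' is a bugfix
# release, '1.5a1' an alpha, '1.5rc2' a release candidate and '1.6.dev0'
# a development version.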
#
__version__ = "1.5.0"


import os

from ._cloudpickle_wrapper import wrap_non_picklable_objects
from ._parallel_backends import ParallelBackendBase
from ._store_backends import StoreBackendBase
from .compressor import register_compressor
from .hashing import hash
from .logger import Logger, PrintTime
from .memory import MemorizedResult, Memory, expires_after, register_store_backend
from .numpy_pickle import dump, load
from .parallel import (
    Parallel,
    cpu_count,
    delayed,
    effective_n_jobs,
    parallel_backend,
    parallel_config,
    register_parallel_backend,
)

__all__ = [
    # On-disk result caching
    "Memory",
    "MemorizedResult",
    "expires_after",
    # Parallel code execution
    "Parallel",
    "delayed",
    "cpu_count",
    "effective_n_jobs",
    "wrap_non_picklable_objects",
    # Context to change the backend globally
    "parallel_config",
    "parallel_backend",
    # Helpers to define and register store/parallel backends
    "ParallelBackendBase",
    "StoreBackendBase",
    "register_compressor",
    "register_parallel_backend",
    "register_store_backend",
    # Helpers kept for backward compatibility
    "PrintTime",
    "Logger",
    "hash",
    "dump",
    "load",
]


# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")