"""Joblib is a set of tools to provide **lightweight pipelining in
Python**. In particular:
1. transparent disk-caching of functions and lazy re-evaluation
(memoize pattern)
2. easy simple parallel computing
Joblib is optimized to be **fast** and **robust** on large
data in particular and has specific optimizations for `numpy` arrays. It is
**BSD-licensed**.
==================== ===============================================
**Documentation:**   https://joblib.readthedocs.io
**Download:**        https://pypi.python.org/pypi/joblib#downloads
**Source code:**     https://github.com/joblib/joblib
**Report issues:**   https://github.com/joblib/joblib/issues
==================== ===============================================

Vision
------

The vision is to provide tools to easily achieve better performance and
reproducibility when working with long-running jobs.

* **Avoid computing the same thing twice**: code is often rerun again and
  again, for instance when prototyping computationally heavy jobs (as in
  scientific development), but hand-crafted solutions to alleviate this
  issue are error-prone and often lead to unreproducible results.

* **Persist to disk transparently**: efficiently persisting
  arbitrary objects containing large data is hard. Using
  joblib's caching mechanism avoids hand-written persistence and
  implicitly links the file on disk to the execution context of
  the original Python object. As a result, joblib's persistence is
  good for resuming an application status or computational job, e.g.
  after a crash.

Joblib addresses these problems while **leaving your code and your flow
control as unmodified as possible** (no framework, no new paradigms).

Main features
-------------

1) **Transparent and fast disk-caching of output value:** a memoize or
   make-like functionality for Python functions that works well for
   arbitrary Python objects, including very large numpy arrays. Separate
   persistence and flow-execution logic from domain logic or algorithmic
   code by writing the operations as a set of steps with well-defined
   inputs and outputs: Python functions. Joblib can save their
   computation to disk and rerun it only if necessary::

    >>> from joblib import Memory
    >>> location = 'your_cache_dir_goes_here'
    >>> mem = Memory(location, verbose=1)
    >>> import numpy as np
    >>> a = np.vander(np.arange(3)).astype(float)
    >>> square = mem.cache(np.square)
    >>> b = square(a)  # doctest: +ELLIPSIS
    ______________________________________________________________________...
    [Memory] Calling ...square...
    square(array([[0., 0., 1.],
           [1., 1., 1.],
           [4., 2., 1.]]))
    _________________________________________________...square - ...s, 0.0min

    >>> c = square(a)
    >>> # The above call did not trigger an evaluation

2) **Embarrassingly parallel helper:** to make it easy to write readable
   parallel code and debug it quickly::

    >>> from joblib import Parallel, delayed
    >>> from math import sqrt
    >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
    [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
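
   Setting ``n_jobs=-1`` asks joblib to use as many workers as there are
   CPUs (a minimal usage sketch; the result is identical, only computed
   in parallel)::

    >>> Parallel(n_jobs=-1)(delayed(sqrt)(i**2) for i in range(4))  # doctest: +SKIP
    [0.0, 1.0, 2.0, 3.0]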
3) **Fast compressed persistence**: a replacement for pickle to work
   efficiently on Python objects containing large data
   (*joblib.dump* & *joblib.load*).

..
    >>> import shutil ; shutil.rmtree(location)

"""

# PEP 440-compatible formatted version, see:
# https://www.python.org/dev/peps/pep-0440/
#
# Generic release markers:
# X.Y
# X.Y.Z # For bugfix releases
#
# Admissible pre-release markers:
# X.YaN # Alpha release
# X.YbN # Beta release
# X.YrcN # Release Candidate
# X.Y # Final release
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
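#
# For instance, '1.5.0' is a final release, '1.5.0rc1' a release
# candidate, and '1.5.dev0' an in-development version.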
#
__version__ = "1.5.0"
import os
from ._cloudpickle_wrapper import wrap_non_picklable_objects
from ._parallel_backends import ParallelBackendBase
from ._store_backends import StoreBackendBase
from .compressor import register_compressor
from .hashing import hash
from .logger import Logger, PrintTime
from .memory import MemorizedResult, Memory, expires_after, register_store_backend
from .numpy_pickle import dump, load
from .parallel import (
Parallel,
cpu_count,
delayed,
effective_n_jobs,
parallel_backend,
parallel_config,
register_parallel_backend,
)

__all__ = [
# On-disk result caching
"Memory",
"MemorizedResult",
"expires_after",
# Parallel code execution
"Parallel",
"delayed",
"cpu_count",
"effective_n_jobs",
"wrap_non_picklable_objects",
# Context to change the backend globally
"parallel_config",
"parallel_backend",
# Helpers to define and register store/parallel backends
"ParallelBackendBase",
"StoreBackendBase",
"register_compressor",
"register_parallel_backend",
"register_store_backend",
# Helpers kept for backward compatibility
"PrintTime",
"Logger",
"hash",
"dump",
"load",
]

# Workaround issue discovered in intel-openmp 2019.5:
# https://github.com/ContinuumIO/anaconda-issues/issues/11294
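# (setdefault leaves the variable untouched if the user already set it,
# so this workaround can be overridden from the environment)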
os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")