	"""Joblib is a set of tools to provide **lightweight pipelining in
	Python**. In particular:

	1. transparent disk-caching of functions and lazy re-evaluation
	(memoize pattern)

	2. simple parallel computing

	Joblib is optimized to be fast and robust, in particular on large
	data, and has specific optimizations for `numpy` arrays. It is
	BSD-licensed.


	==================== ===============================================
	Documentation:       https://joblib.readthedocs.io

	Download:            https://pypi.python.org/pypi/joblib#downloads

	Source code:         https://github.com/joblib/joblib

	Report issues:       https://github.com/joblib/joblib/issues
	==================== ===============================================


	Vision
	--------

	The vision is to provide tools to easily achieve better performance and
	reproducibility when working with long running jobs.

	* Avoid computing the same thing twice: code is often rerun again and
	  again, for instance when prototyping computationally heavy jobs (as in
	  scientific development), but hand-crafted solutions to alleviate this
	  issue are error-prone and often lead to unreproducible results.

	* Persist to disk transparently: efficiently persisting
	  arbitrary objects containing large data is hard. Using
	  joblib's caching mechanism avoids hand-written persistence and
	  implicitly links the file on disk to the execution context of
	  the original Python object. As a result, joblib's persistence is
	  good for resuming an application's state or a computational job, e.g.
	  after a crash.

	Joblib addresses these problems while **leaving your code and your flow
	control as unmodified as possible** (no framework, no new paradigms).

	Main features
	------------------

	1) Transparent and fast disk-caching of output values: a memoize or
	   make-like functionality for Python functions that works well for
	   arbitrary Python objects, including very large numpy arrays. Separate
	   persistence and flow-execution logic from domain logic or algorithmic
	   code by writing the operations as a set of steps with well-defined
	   inputs and outputs: Python functions. Joblib can save their
	   computation to disk and rerun it only if necessary::

	      >>> from joblib import Memory
	      >>> location = 'your_cache_dir_goes_here'
	      >>> mem = Memory(location, verbose=1)
	      >>> import numpy as np
	      >>> a = np.vander(np.arange(3)).astype(float)
	      >>> square = mem.cache(np.square)
	      >>> b = square(a)  # doctest: +ELLIPSIS
	      ______________________________________________________________________...
	      [Memory] Calling ...square...
	      square(array([[0., 0., 1.],
	             [1., 1., 1.],
	             [4., 2., 1.]]))
	      _________________________________________________...square - ...s, 0.0min

	      >>> c = square(a)
	      >>> # The above call did not trigger an evaluation

	2) Embarrassingly parallel helper: to make it easy to write readable
	   parallel code and debug it quickly::

	      >>> from joblib import Parallel, delayed
	      >>> from math import sqrt
	      >>> Parallel(n_jobs=1)(delayed(sqrt)(i**2) for i in range(10))
	      [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]


	3) Fast compressed persistence: a replacement for pickle to work
	   efficiently on Python objects containing large data
	   (`joblib.dump` and `joblib.load`).
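
As an illustration of the persistence helpers named above, the following is a
minimal sketch of a round trip through `joblib.dump` and `joblib.load`. The
object contents, the file name ``obj.joblib`` and the ``compress=3`` level are
arbitrary choices made for this example, not recommendations.

```python
import os
import tempfile

import numpy as np
from joblib import dump, load

# Hypothetical object mixing plain Python data with a large numpy array.
obj = {"name": "example", "data": np.arange(1_000_000, dtype=np.float64)}

with tempfile.TemporaryDirectory() as tmp_dir:
    # The file name is an assumption made for this sketch.
    path = os.path.join(tmp_dir, "obj.joblib")
    # compress=3 is an arbitrary trade-off between speed and file size.
    dump(obj, path, compress=3)
    # Without mmap_mode, load reads the whole object back into memory,
    # so the temporary directory can be removed afterwards.
    restored = load(path)

same_name = restored["name"] == obj["name"]
same_data = np.array_equal(restored["data"], obj["data"])
```

Like pickle, `joblib.load` can execute arbitrary code while reading a file, so
it should only be used on files from trusted sources.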

	..
	    >>> import shutil ; shutil.rmtree(location)

	"""

	# PEP 440 compatible version format, see:
	# https://www.python.org/dev/peps/pep-0440/
	#
	# Generic release markers:
	# X.Y
	# X.Y.Z # For bugfix releases
	#
	# Admissible pre-release markers:
	# X.YaN # Alpha release
	# X.YbN # Beta release
	# X.YrcN # Release Candidate
	# X.Y # Final release
	#
	# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
	# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
	#
	__version__ = "1.5.0"


	import os

	from ._cloudpickle_wrapper import wrap_non_picklable_objects
	from ._parallel_backends import ParallelBackendBase
	from ._store_backends import StoreBackendBase
	from .compressor import register_compressor
	from .hashing import hash
	from .logger import Logger, PrintTime
	from .memory import MemorizedResult, Memory, expires_after, register_store_backend
	from .numpy_pickle import dump, load
	from .parallel import (
	    Parallel,
	    cpu_count,
	    delayed,
	    effective_n_jobs,
	    parallel_backend,
	    parallel_config,
	    register_parallel_backend,
	)

	__all__ = [
	    # On-disk result caching
	    "Memory",
	    "MemorizedResult",
	    "expires_after",
	    # Parallel code execution
	    "Parallel",
	    "delayed",
	    "cpu_count",
	    "effective_n_jobs",
	    "wrap_non_picklable_objects",
	    # Context to change the backend globally
	    "parallel_config",
	    "parallel_backend",
	    # Helpers to define and register store/parallel backends
	    "ParallelBackendBase",
	    "StoreBackendBase",
	    "register_compressor",
	    "register_parallel_backend",
	    "register_store_backend",
	    # Helpers kept for backward compatibility
	    "PrintTime",
	    "Logger",
	    "hash",
	    "dump",
	    "load",
	]


	# Workaround issue discovered in intel-openmp 2019.5:
	# https://github.com/ContinuumIO/anaconda-issues/issues/11294
	os.environ.setdefault("KMP_INIT_AT_FORK", "FALSE")