|
<!--Copyright 2020 The HuggingFace Team. All rights reserved. |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
    http://www.apache.org/licenses/LICENSE-2.0
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
--> |
|
|
|
# Pipelines |
|
|
|
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of |
|
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity |
|
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the |
|
[task summary](../task_summary) for examples of use. |
|
|
|
There are two categories of pipeline abstractions to be aware of:
|
|
|
- The [`pipeline`], which is the most powerful object, encapsulating all the other pipelines.
|
- Task-specific pipelines are available for [audio](#audio), [computer vision](#computer-vision), [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks. |
|
|
|
## The pipeline abstraction |
|
|
|
The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but provides additional quality-of-life features.
|
|
|
Simple call on one item: |
|
|
|
```python |
|
>>> pipe = pipeline("text-classification") |
|
>>> pipe("This restaurant is awesome") |
|
[{'label': 'POSITIVE', 'score': 0.9998743534088135}] |
|
``` |
|
|
|
If you want to use a specific model from the [hub](https://huggingface.co), you can ignore the task argument if the model on
the hub already defines it:
|
|
|
```python |
|
>>> pipe = pipeline(model="roberta-large-mnli") |
|
>>> pipe("This restaurant is awesome") |
|
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}] |
|
``` |
|
|
|
To call a pipeline on many items, you can call it with a *list*. |
|
|
|
```python |
|
>>> pipe = pipeline("text-classification") |
|
>>> pipe(["This restaurant is awesome", "This restaurant is awful"]) |
|
[{'label': 'POSITIVE', 'score': 0.9998743534088135}, |
|
{'label': 'NEGATIVE', 'score': 0.9996669292449951}] |
|
``` |
|
|
|
To iterate over full datasets, it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do the batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.
|
|
|
```python |
|
import datasets |
|
from transformers import pipeline |
|
from transformers.pipelines.pt_utils import KeyDataset |
|
from tqdm.auto import tqdm |
|
|
|
pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0) |
|
dataset = datasets.load_dataset("superb", name="asr", split="test") |
|
|
|
# KeyDataset (only *pt*) will simply return the value for the given key from each dataset item,
# as we're not interested in the *target* part of the dataset.
# For sentence pairs, use KeyPairDataset (see the sketch after this code block).
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
|
``` |
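
The comment above mentions `KeyPairDataset` for sentence-pair tasks. As a rough sketch of how that could look (the dataset and its column names below are illustrative assumptions, not part of the original example):

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyPairDataset
from tqdm.auto import tqdm

pipe = pipeline(model="roberta-large-mnli", device=0)
# Assumed example dataset: GLUE MRPC, which has "sentence1"/"sentence2" columns
dataset = datasets.load_dataset("glue", "mrpc", split="validation")

# KeyPairDataset yields {"text": ..., "text_pair": ...} dicts built from the two columns
for out in tqdm(pipe(KeyPairDataset(dataset, "sentence1", "sentence2"))):
    print(out)
```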
|
|
|
For ease of use, passing a generator is also possible:
|
|
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
pipe = pipeline("text-classification") |
|
|
|
|
|
def data():
    while True:
        # This could come from a dataset, a database, a queue or an HTTP request
        # in a server.
        # Caveat: because this is iterative, you cannot use `num_workers > 1`
        # to use multiple threads to preprocess the data. You can still have one thread
        # doing the preprocessing while the main thread runs the big inference.
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # [{'label': 'POSITIVE', 'score': ...}]
    # [{'label': 'POSITIVE', 'score': ...}]
    # ....
|
``` |
|
|
|
[[autodoc]] pipeline |
|
|
|
## Pipeline batching |
|
|
|
All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists, a `Dataset`, or a `generator`).
|
|
|
```python |
|
from transformers import pipeline |
|
from transformers.pipelines.pt_utils import KeyDataset |
|
import datasets |
|
|
|
dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised") |
|
pipe = pipeline("text-classification", device=0) |
|
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
    # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
    # Exactly the same output as before, but the contents are passed
    # as batches to the model
|
``` |
|
|
|
<Tip warning={true}> |
|
|
|
However, this is not automatically a win for performance. It can be either a 10x speedup or a 5x slowdown depending
on the hardware, the data, and the actual model being used.
|
|
|
Example where it's mostly a speedup: |
|
|
|
</Tip> |
|
|
|
```python |
|
from transformers import pipeline |
|
from torch.utils.data import Dataset |
|
from tqdm.auto import tqdm |
|
|
|
pipe = pipeline("text-classification", device=0) |
|
|
|
|
|
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
|
``` |
|
|
|
``` |
|
# On GTX 970 |
|
------------------------------ |
|
Streaming no batching |
|
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s] |
|
------------------------------ |
|
Streaming batch_size=8 |
|
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s] |
|
------------------------------ |
|
Streaming batch_size=64 |
|
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s] |
|
------------------------------ |
|
Streaming batch_size=256 |
|
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s] |
|
(diminishing returns, saturated the GPU) |
|
``` |
|
|
|
Example where it's mostly a slowdown:
|
|
|
```python |
|
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
|
``` |
|
|
|
This dataset yields the occasional very long sentence compared to the others. In that case, the **whole** batch needs to be 400
tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to a large slowdown. Even worse, on
bigger batches, the program simply crashes.
|
|
|
|
|
``` |
|
------------------------------ |
|
Streaming no batching |
|
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s] |
|
------------------------------ |
|
Streaming batch_size=8 |
|
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s] |
|
------------------------------ |
|
Streaming batch_size=64 |
|
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s] |
|
------------------------------ |
|
Streaming batch_size=256 |
|
0%| | 0/1000 [00:00<?, ?it/s] |
|
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
|
``` |
|
|
|
There are no good (general) solutions for this problem, and your mileage may vary depending on your use case. For users, a
rule of thumb is:
|
|
|
- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data) on GPU, then:

  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch; measure and
    try tentatively to add it, and add OOM checks to recover when it fails (and it will fail at some point if you don't
    control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely (a sketch of one way to do that follows this list).
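
As a sketch of that last point (not an official recipe, just one possible approach, assuming a PyTorch GPU setup): catch the out-of-memory error and retry with a smaller batch size.

```python
import torch
from transformers import pipeline

pipe = pipeline("text-classification", device=0)


def run_with_oom_fallback(inputs, batch_size=64):
    # Retry with a smaller batch size whenever the GPU runs out of memory.
    while batch_size >= 1:
        try:
            return pipe(inputs, batch_size=batch_size)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("Out of memory even with batch_size=1")


results = run_with_oom_fallback(["This is a test"] * 1000)
```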
|
|
|
## Pipeline chunk batching |
|
|
|
`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

In order to circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of
regular `Pipeline`. In short:
|
|
|
|
|
```python |
|
preprocessed = pipe.preprocess(inputs) |
|
model_outputs = pipe.forward(preprocessed) |
|
outputs = pipe.postprocess(model_outputs) |
|
``` |
|
|
|
Now becomes: |
|
|
|
|
|
```python |
|
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
|
``` |
|
|
|
This should be very transparent to your code because the pipelines are used in |
|
the same way. |
|
|
|
This is a simplified view, since the pipeline handles the batching automatically. This means you don't have to care
about how many forward passes your inputs will actually trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
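
For example, a `question-answering` pipeline can be given a `batch_size` even for a single question; the batching then applies to the chunks generated internally. Treat this as an illustration: the default model and the exact output values are not fixed.

```python
from transformers import pipeline

qa_pipe = pipeline("question-answering")

# A long context may be split into several chunks internally; `batch_size`
# controls how many of those chunks are sent through the model at once.
out = qa_pipe(
    question="Where do I live?",
    context="My name is Wolfgang and I live in Berlin.",
    batch_size=4,
)
print(out)
# e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'Berlin'}
```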
|
|
|
## Pipeline custom code |
|
|
|
If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand; the goal of the
pipelines is to be easy to use and support most cases, so `transformers` might be able to support your use case directly.

If you simply want to try it out, you can:

- Subclass your pipeline of choice
|
|
|
```python |
|
class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here
        scores = super().postprocess(model_outputs, **kwargs)
        # And here
        return scores


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)
# or if you use the *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
|
``` |
|
|
|
That should enable you to do all the custom code you want. |
|
|
|
|
|
## Implementing a pipeline |
|
|
|
[Implementing a new pipeline](../add_new_pipeline) |
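
As a very rough sketch of what the linked guide covers: a new pipeline subclasses [`Pipeline`] and fills in four methods. The task logic below is purely illustrative; see the guide for the real requirements and for registering the pipeline.

```python
from transformers import Pipeline


class MyTaskPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # Route user kwargs to preprocess / forward / postprocess
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        # Turn raw inputs into model-ready tensors
        return self.tokenizer(inputs, return_tensors=self.framework)

    def _forward(self, model_inputs):
        # Run the model; keep pre/post-processing out of this method
        return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        # Turn raw model outputs into something user-friendly
        return model_outputs["logits"].softmax(-1)
```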
|
|
|
## Audio |
|
|
|
Pipelines available for audio tasks include the following. |
|
|
|
### AudioClassificationPipeline |
|
|
|
[[autodoc]] AudioClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### AutomaticSpeechRecognitionPipeline |
|
|
|
[[autodoc]] AutomaticSpeechRecognitionPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotAudioClassificationPipeline |
|
|
|
[[autodoc]] ZeroShotAudioClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
## Computer vision |
|
|
|
Pipelines available for computer vision tasks include the following. |
|
|
|
### DepthEstimationPipeline |
|
[[autodoc]] DepthEstimationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ImageClassificationPipeline |
|
|
|
[[autodoc]] ImageClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ImageSegmentationPipeline |
|
|
|
[[autodoc]] ImageSegmentationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ObjectDetectionPipeline |
|
|
|
[[autodoc]] ObjectDetectionPipeline |
|
- __call__ |
|
- all |
|
|
|
### VideoClassificationPipeline |
|
|
|
[[autodoc]] VideoClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotImageClassificationPipeline |
|
|
|
[[autodoc]] ZeroShotImageClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotObjectDetectionPipeline |
|
|
|
[[autodoc]] ZeroShotObjectDetectionPipeline |
|
- __call__ |
|
- all |
|
|
|
## Natural Language Processing |
|
|
|
Pipelines available for natural language processing tasks include the following. |
|
|
|
### ConversationalPipeline |
|
|
|
[[autodoc]] Conversation |
|
|
|
[[autodoc]] ConversationalPipeline |
|
- __call__ |
|
- all |
|
|
|
### FillMaskPipeline |
|
|
|
[[autodoc]] FillMaskPipeline |
|
- __call__ |
|
- all |
|
|
|
### NerPipeline |
|
|
|
[[autodoc]] NerPipeline |
|
|
|
See [`TokenClassificationPipeline`] for all details. |
|
|
|
### QuestionAnsweringPipeline |
|
|
|
[[autodoc]] QuestionAnsweringPipeline |
|
- __call__ |
|
- all |
|
|
|
### SummarizationPipeline |
|
|
|
[[autodoc]] SummarizationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TableQuestionAnsweringPipeline |
|
|
|
[[autodoc]] TableQuestionAnsweringPipeline |
|
- __call__ |
|
|
|
### TextClassificationPipeline |
|
|
|
[[autodoc]] TextClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TextGenerationPipeline |
|
|
|
[[autodoc]] TextGenerationPipeline |
|
- __call__ |
|
- all |
|
|
|
### Text2TextGenerationPipeline |
|
|
|
[[autodoc]] Text2TextGenerationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TokenClassificationPipeline |
|
|
|
[[autodoc]] TokenClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TranslationPipeline |
|
|
|
[[autodoc]] TranslationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotClassificationPipeline |
|
|
|
[[autodoc]] ZeroShotClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
## Multimodal |
|
|
|
Pipelines available for multimodal tasks include the following. |
|
|
|
### DocumentQuestionAnsweringPipeline |
|
|
|
[[autodoc]] DocumentQuestionAnsweringPipeline |
|
- __call__ |
|
- all |
|
|
|
### FeatureExtractionPipeline |
|
|
|
[[autodoc]] FeatureExtractionPipeline |
|
- __call__ |
|
- all |
|
|
|
### ImageToTextPipeline |
|
|
|
[[autodoc]] ImageToTextPipeline |
|
- __call__ |
|
- all |
|
|
|
### VisualQuestionAnsweringPipeline |
|
|
|
[[autodoc]] VisualQuestionAnsweringPipeline |
|
- __call__ |
|
- all |
|
|
|
## Parent class: `Pipeline` |
|
|
|
[[autodoc]] Pipeline |
|
|