|
<!--Copyright 2020 The HuggingFace Team. All rights reserved. |
|
|
|
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with |
|
the License. You may obtain a copy of the License at |
|
|
|
    http://www.apache.org/licenses/LICENSE-2.0
|
|
|
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on |
|
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the |
|
specific language governing permissions and limitations under the License. |
|
--> |
|
|
|
# Pipelines |
|
|
|
The pipelines are a great and easy way to use models for inference. These pipelines are objects that abstract most of |
|
the complex code from the library, offering a simple API dedicated to several tasks, including Named Entity |
|
Recognition, Masked Language Modeling, Sentiment Analysis, Feature Extraction and Question Answering. See the |
|
[task summary](../task_summary) for examples of use. |
|
|
|
There are two categories of pipeline abstractions to be aware of:
|
|
|
- The [`pipeline`], which is the most powerful object, encapsulating all the other pipelines.
|
- Task-specific pipelines are available for [audio](#audio), [computer vision](#computer-vision), [natural language processing](#natural-language-processing), and [multimodal](#multimodal) tasks. |
|
|
|
## The pipeline abstraction |
|
|
|
The *pipeline* abstraction is a wrapper around all the other available pipelines. It is instantiated like any other
pipeline but provides additional quality-of-life features.
|
|
|
Simple call on one item: |
|
|
|
```python |
|
>>> pipe = pipeline("text-classification") |
|
>>> pipe("This restaurant is awesome") |
|
[{'label': 'POSITIVE', 'score': 0.9998743534088135}] |
|
``` |
|
|
|
If you want to use a specific model from the [hub](https://huggingface.co), you can ignore the task argument if the model on
the hub already defines it:
|
|
|
```python |
|
>>> pipe = pipeline(model="roberta-large-mnli") |
|
>>> pipe("This restaurant is awesome") |
|
[{'label': 'NEUTRAL', 'score': 0.7313136458396912}] |
|
``` |
|
|
|
To call a pipeline on many items, you can call it with a *list*. |
|
|
|
```python |
|
>>> pipe = pipeline("text-classification") |
|
>>> pipe(["This restaurant is awesome", "This restaurant is awful"]) |
|
[{'label': 'POSITIVE', 'score': 0.9998743534088135}, |
|
{'label': 'NEGATIVE', 'score': 0.9996669292449951}] |
|
``` |
|
|
|
To iterate over full datasets, it is recommended to use a `dataset` directly. This means you don't need to allocate
the whole dataset at once, nor do you need to do the batching yourself. This should work just as fast as custom loops on
GPU. If it doesn't, don't hesitate to create an issue.
|
|
|
```python |
|
import datasets |
|
from transformers import pipeline |
|
from transformers.pipelines.pt_utils import KeyDataset |
|
from tqdm.auto import tqdm |
|
|
|
pipe = pipeline("automatic-speech-recognition", model="facebook/wav2vec2-base-960h", device=0) |
|
dataset = datasets.load_dataset("superb", name="asr", split="test") |
|
|
|
# KeyDataset (only *pt*) will simply return the value for the given key from each dataset item,
# as we're not interested in the *target* part of the dataset.
# For sentence pairs, use KeyPairDataset (see the sketch after this code block).
for out in tqdm(pipe(KeyDataset(dataset, "file"))):
    print(out)
    # {"text": "NUMBER TEN FRESH NELLY IS WAITING ON YOU GOOD NIGHT HUSBAND"}
    # {"text": ....}
    # ....
|
``` |
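
The comment above mentions `KeyPairDataset` for sentence-pair tasks. As a rough sketch of how that could look (the dataset and its column names below are illustrative assumptions, not part of the original example):

```python
import datasets
from transformers import pipeline
from transformers.pipelines.pt_utils import KeyPairDataset
from tqdm.auto import tqdm

pipe = pipeline(model="roberta-large-mnli", device=0)
# Assumed example dataset: GLUE MRPC, which has "sentence1"/"sentence2" columns
dataset = datasets.load_dataset("glue", "mrpc", split="validation")

# KeyPairDataset yields {"text": ..., "text_pair": ...} dicts built from the two columns
for out in tqdm(pipe(KeyPairDataset(dataset, "sentence1", "sentence2"))):
    print(out)
```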
|
|
|
For ease of use, passing a generator is also possible:
|
|
|
|
|
```python |
|
from transformers import pipeline |
|
|
|
pipe = pipeline("text-classification") |
|
|
|
|
|
def data():
    while True:
        # This could come from a dataset, a database, a queue or an HTTP request
        # in a server.
        # Caveat: because this is iterative, you cannot use `num_workers > 1`
        # to use multiple threads to preprocess the data. You can still have one thread
        # doing the preprocessing while the main thread runs the big inference.
        yield "This is a test"


for out in pipe(data()):
    print(out)
    # [{'label': 'POSITIVE', 'score': ...}]
    # [{'label': 'POSITIVE', 'score': ...}]
    # ....
|
``` |
|
|
|
[[autodoc]] pipeline |
|
|
|
## Pipeline batching |
|
|
|
All pipelines can use batching. This will work
whenever the pipeline uses its streaming ability (so when passing lists, a `Dataset`, or a `generator`).
|
|
|
```python |
|
from transformers import pipeline |
|
from transformers.pipelines.pt_utils import KeyDataset |
|
import datasets |
|
|
|
dataset = datasets.load_dataset("imdb", name="plain_text", split="unsupervised") |
|
pipe = pipeline("text-classification", device=0) |
|
for out in pipe(KeyDataset(dataset, "text"), batch_size=8, truncation="only_first"):
    print(out)
    # [{'label': 'POSITIVE', 'score': 0.9998743534088135}]
    # Exactly the same output as before, but the contents are passed
    # as batches to the model
|
``` |
|
|
|
<Tip warning={true}> |
|
|
|
However, this is not automatically a win for performance. It can be either a 10x speedup or a 5x slowdown depending
on the hardware, the data, and the actual model being used.
|
|
|
Example where it's mostly a speedup: |
|
|
|
</Tip> |
|
|
|
```python |
|
from transformers import pipeline |
|
from torch.utils.data import Dataset |
|
from tqdm.auto import tqdm |
|
|
|
pipe = pipeline("text-classification", device=0) |
|
|
|
|
|
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        return "This is a test"


dataset = MyDataset()

for batch_size in [1, 8, 64, 256]:
    print("-" * 30)
    print(f"Streaming batch_size={batch_size}")
    for out in tqdm(pipe(dataset, batch_size=batch_size), total=len(dataset)):
        pass
|
``` |
|
|
|
``` |
|
# On GTX 970 |
|
------------------------------ |
|
Streaming no batching |
|
100%|██████████████████████████████████████████████████████████████████████| 5000/5000 [00:26<00:00, 187.52it/s] |
|
------------------------------ |
|
Streaming batch_size=8 |
|
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:04<00:00, 1205.95it/s] |
|
------------------------------ |
|
Streaming batch_size=64 |
|
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:02<00:00, 2478.24it/s] |
|
------------------------------ |
|
Streaming batch_size=256 |
|
100%|█████████████████████████████████████████████████████████████████████| 5000/5000 [00:01<00:00, 2554.43it/s] |
|
(diminishing returns, saturated the GPU) |
|
``` |
|
|
|
Example where it's mostly a slowdown:
|
|
|
```python |
|
class MyDataset(Dataset):
    def __len__(self):
        return 5000

    def __getitem__(self, i):
        if i % 64 == 0:
            n = 100
        else:
            n = 1
        return "This is a test" * n
|
``` |
|
|
|
This dataset yields the occasional very long sentence compared to the others. In that case, the **whole** batch needs to be 400
tokens long, so the whole batch will be [64, 400] instead of [64, 4], leading to a large slowdown. Even worse, on
bigger batches, the program simply crashes.
|
|
|
|
|
``` |
|
------------------------------ |
|
Streaming no batching |
|
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:05<00:00, 183.69it/s] |
|
------------------------------ |
|
Streaming batch_size=8 |
|
100%|█████████████████████████████████████████████████████████████████████| 1000/1000 [00:03<00:00, 265.74it/s] |
|
------------------------------ |
|
Streaming batch_size=64 |
|
100%|██████████████████████████████████████████████████████████████████████| 1000/1000 [00:26<00:00, 37.80it/s] |
|
------------------------------ |
|
Streaming batch_size=256 |
|
0%| | 0/1000 [00:00<?, ?it/s] |
|
Traceback (most recent call last):
  File "/home/nicolas/src/transformers/test.py", line 42, in <module>
    for out in tqdm(pipe(dataset, batch_size=256), total=len(dataset)):
....
    q = q / math.sqrt(dim_per_head)  # (bs, n_heads, q_length, dim_per_head)
RuntimeError: CUDA out of memory. Tried to allocate 376.00 MiB (GPU 0; 3.95 GiB total capacity; 1.72 GiB already allocated; 354.88 MiB free; 2.46 GiB reserved in total by PyTorch)
|
``` |
|
|
|
There are no good (general) solutions for this problem, and your mileage may vary depending on your use case. For users, a
rule of thumb is:
|
|
|
- **Measure performance on your load, with your hardware. Measure, measure, and keep measuring. Real numbers are the
  only way to go.**
- If you are latency constrained (live product doing inference), don't batch.
- If you are using CPU, don't batch.
- If you are optimizing for throughput (you want to run your model on a bunch of static data) on GPU, then:

  - If you have no clue about the size of the sequence_length ("natural" data), by default don't batch; measure and
    try tentatively to add it, and add OOM checks to recover when it fails (and it will fail at some point if you don't
    control the sequence_length).
  - If your sequence_length is super regular, then batching is more likely to be VERY interesting; measure and push
    it until you get OOMs.
  - The larger the GPU, the more likely batching is to be interesting.
- As soon as you enable batching, make sure you can handle OOMs nicely (a sketch of one way to do that follows this list).
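
As a sketch of that last point (not an official recipe, just one possible approach, assuming a PyTorch GPU setup): catch the out-of-memory error and retry with a smaller batch size.

```python
import torch
from transformers import pipeline

pipe = pipeline("text-classification", device=0)


def run_with_oom_fallback(inputs, batch_size=64):
    # Retry with a smaller batch size whenever the GPU runs out of memory.
    while batch_size >= 1:
        try:
            return pipe(inputs, batch_size=batch_size)
        except RuntimeError as e:
            if "out of memory" not in str(e).lower():
                raise
            torch.cuda.empty_cache()
            batch_size //= 2
    raise RuntimeError("Out of memory even with batch_size=1")


results = run_with_oom_fallback(["This is a test"] * 1000)
```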
|
|
|
## Pipeline chunk batching |
|
|
|
`zero-shot-classification` and `question-answering` are slightly specific in the sense that a single input might yield
multiple forward passes of a model. Under normal circumstances, this would cause issues with the `batch_size` argument.

In order to circumvent this issue, both of these pipelines are a bit specific: they are `ChunkPipeline` instead of
regular `Pipeline`. In short:
|
|
|
|
|
```python |
|
preprocessed = pipe.preprocess(inputs) |
|
model_outputs = pipe.forward(preprocessed) |
|
outputs = pipe.postprocess(model_outputs) |
|
``` |
|
|
|
Now becomes: |
|
|
|
|
|
```python |
|
all_model_outputs = []
for preprocessed in pipe.preprocess(inputs):
    model_outputs = pipe.forward(preprocessed)
    all_model_outputs.append(model_outputs)
outputs = pipe.postprocess(all_model_outputs)
|
``` |
|
|
|
This should be very transparent to your code because the pipelines are used in |
|
the same way. |
|
|
|
This is a simplified view, since the pipeline handles the batching automatically. This means you don't have to care
about how many forward passes your inputs will actually trigger; you can optimize the `batch_size`
independently of the inputs. The caveats from the previous section still apply.
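
For example, a `question-answering` pipeline can be given a `batch_size` even for a single question; the batching then applies to the chunks generated internally. Treat this as an illustration: the default model and the exact output values are not fixed.

```python
from transformers import pipeline

qa_pipe = pipeline("question-answering")

# A long context may be split into several chunks internally; `batch_size`
# controls how many of those chunks are sent through the model at once.
out = qa_pipe(
    question="Where do I live?",
    context="My name is Wolfgang and I live in Berlin.",
    batch_size=4,
)
print(out)
# e.g. {'score': ..., 'start': ..., 'end': ..., 'answer': 'Berlin'}
```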
|
|
|
## Pipeline custom code |
|
|
|
If you want to override a specific pipeline, don't hesitate to create an issue for your task at hand; the goal of the
pipelines is to be easy to use and support most cases, so `transformers` might be able to support your use case directly.

If you simply want to try it out, you can:

- Subclass your pipeline of choice
|
|
|
```python |
|
class MyPipeline(TextClassificationPipeline):
    def postprocess(self, model_outputs, **kwargs):
        # Your code goes here
        scores = super().postprocess(model_outputs, **kwargs)
        # And here
        return scores


my_pipeline = MyPipeline(model=model, tokenizer=tokenizer)
# or if you use the *pipeline* function, then:
my_pipeline = pipeline(model="xxxx", pipeline_class=MyPipeline)
|
``` |
|
|
|
That should enable you to do all the custom code you want. |
|
|
|
|
|
## Implementing a pipeline |
|
|
|
[Implementing a new pipeline](../add_new_pipeline) |
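
As a very rough sketch of what the linked guide covers: a new pipeline subclasses [`Pipeline`] and fills in four methods. The task logic below is purely illustrative; see the guide for the real requirements and for registering the pipeline.

```python
from transformers import Pipeline


class MyTaskPipeline(Pipeline):
    def _sanitize_parameters(self, **kwargs):
        # Route user kwargs to preprocess / forward / postprocess
        preprocess_kwargs = {}
        if "maybe_arg" in kwargs:
            preprocess_kwargs["maybe_arg"] = kwargs["maybe_arg"]
        return preprocess_kwargs, {}, {}

    def preprocess(self, inputs, maybe_arg=2):
        # Turn raw inputs into model-ready tensors
        return self.tokenizer(inputs, return_tensors=self.framework)

    def _forward(self, model_inputs):
        # Run the model; keep pre/post-processing out of this method
        return self.model(**model_inputs)

    def postprocess(self, model_outputs):
        # Turn raw model outputs into something user-friendly
        return model_outputs["logits"].softmax(-1)
```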
|
|
|
## Audio |
|
|
|
Pipelines available for audio tasks include the following. |
|
|
|
### AudioClassificationPipeline |
|
|
|
[[autodoc]] AudioClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### AutomaticSpeechRecognitionPipeline |
|
|
|
[[autodoc]] AutomaticSpeechRecognitionPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotAudioClassificationPipeline |
|
|
|
[[autodoc]] ZeroShotAudioClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
## Computer vision |
|
|
|
Pipelines available for computer vision tasks include the following. |
|
|
|
### DepthEstimationPipeline |
|
[[autodoc]] DepthEstimationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ImageClassificationPipeline |
|
|
|
[[autodoc]] ImageClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ImageSegmentationPipeline |
|
|
|
[[autodoc]] ImageSegmentationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ObjectDetectionPipeline |
|
|
|
[[autodoc]] ObjectDetectionPipeline |
|
- __call__ |
|
- all |
|
|
|
### VideoClassificationPipeline |
|
|
|
[[autodoc]] VideoClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotImageClassificationPipeline |
|
|
|
[[autodoc]] ZeroShotImageClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotObjectDetectionPipeline |
|
|
|
[[autodoc]] ZeroShotObjectDetectionPipeline |
|
- __call__ |
|
- all |
|
|
|
## Natural Language Processing |
|
|
|
Pipelines available for natural language processing tasks include the following. |
|
|
|
### ConversationalPipeline |
|
|
|
[[autodoc]] Conversation |
|
|
|
[[autodoc]] ConversationalPipeline |
|
- __call__ |
|
- all |
|
|
|
### FillMaskPipeline |
|
|
|
[[autodoc]] FillMaskPipeline |
|
- __call__ |
|
- all |
|
|
|
### NerPipeline |
|
|
|
[[autodoc]] NerPipeline |
|
|
|
See [`TokenClassificationPipeline`] for all details. |
|
|
|
### QuestionAnsweringPipeline |
|
|
|
[[autodoc]] QuestionAnsweringPipeline |
|
- __call__ |
|
- all |
|
|
|
### SummarizationPipeline |
|
|
|
[[autodoc]] SummarizationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TableQuestionAnsweringPipeline |
|
|
|
[[autodoc]] TableQuestionAnsweringPipeline |
|
- __call__ |
|
|
|
### TextClassificationPipeline |
|
|
|
[[autodoc]] TextClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TextGenerationPipeline |
|
|
|
[[autodoc]] TextGenerationPipeline |
|
- __call__ |
|
- all |
|
|
|
### Text2TextGenerationPipeline |
|
|
|
[[autodoc]] Text2TextGenerationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TokenClassificationPipeline |
|
|
|
[[autodoc]] TokenClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
### TranslationPipeline |
|
|
|
[[autodoc]] TranslationPipeline |
|
- __call__ |
|
- all |
|
|
|
### ZeroShotClassificationPipeline |
|
|
|
[[autodoc]] ZeroShotClassificationPipeline |
|
- __call__ |
|
- all |
|
|
|
## Multimodal |
|
|
|
Pipelines available for multimodal tasks include the following. |
|
|
|
### DocumentQuestionAnsweringPipeline |
|
|
|
[[autodoc]] DocumentQuestionAnsweringPipeline |
|
- __call__ |
|
- all |
|
|
|
### FeatureExtractionPipeline |
|
|
|
[[autodoc]] FeatureExtractionPipeline |
|
- __call__ |
|
- all |
|
|
|
### ImageToTextPipeline |
|
|
|
[[autodoc]] ImageToTextPipeline |
|
- __call__ |
|
- all |
|
|
|
### VisualQuestionAnsweringPipeline |
|
|
|
[[autodoc]] VisualQuestionAnsweringPipeline |
|
- __call__ |
|
- all |
|
|
|
## Parent class: `Pipeline` |
|
|
|
[[autodoc]] Pipeline |
|
|