<!--Copyright 2023 The HuggingFace Team. All rights reserved.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.
⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.
-->
# XLA Integration for TensorFlow Models [[xla-integration-for-tensorflow-models]]
[[open-in-colab]]
XLA (Accelerated Linear Algebra) is a compiler for accelerating the runtime of TensorFlow models. From the [official documentation](https://www.tensorflow.org/xla):

XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.
Using XLA in TensorFlow is simple: it comes packaged inside the `tensorflow` library, and it can be triggered with the `jit_compile` argument in any graph-creating function such as [`tf.function`](https://www.tensorflow.org/guide/intro_to_graphs). When using Keras methods like `fit()` and `predict()`, you can enable XLA simply by passing the `jit_compile` argument to `model.compile()`. However, XLA is not limited to these methods; it can also be used to accelerate any arbitrary `tf.function`.
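For instance, here is a minimal sketch of enabling XLA through `model.compile()`; the toy model and random data below are purely illustrative:

```py
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(1, input_shape=(4,))])

# Passing `jit_compile=True` makes Keras methods such as `fit()`
# and `predict()` run their computation through XLA.
model.compile(optimizer="adam", loss="mse", jit_compile=True)

x = tf.random.normal((32, 4))
y = tf.random.normal((32, 1))
model.fit(x, y, epochs=1)
```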
Several TensorFlow methods in 🤗 Transformers have been rewritten to be XLA-compatible, including text generation for models such as [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2), [T5](https://huggingface.co/docs/transformers/model_doc/t5), and [OPT](https://huggingface.co/docs/transformers/model_doc/opt), as well as speech processing for models such as [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper).

While the exact amount of speed-up is very much model-dependent, we have observed speed-ups of up to 100x for the TensorFlow text generation models inside 🤗 Transformers. This document explains how you can use XLA with these models to get the maximum amount of performance, and provides links to additional resources on the benchmarks and the design philosophy behind the XLA integration.
## Running TF functions with XLA [[running-tf-functions-with-xla]]
Let us consider the following model in TensorFlow:
```py
import tensorflow as tf
model = tf.keras.Sequential(
[tf.keras.layers.Dense(10, input_shape=(10,), activation="relu"), tf.keras.layers.Dense(5, activation="softmax")]
)
```
The above model accepts inputs with a dimension of `(10, )`. We can use the model to run a forward pass like so:
```py
# Generate random inputs for the model.
batch_size = 16
input_vector_dim = 10
random_inputs = tf.random.normal((batch_size, input_vector_dim))
# Run a forward pass.
_ = model(random_inputs)
```
To run the forward pass with an XLA-compiled function, we would need to do the following:
```py
xla_fn = tf.function(model, jit_compile=True)
_ = xla_fn(random_inputs)
```
The default `call()` function of the `model` is used to compile the XLA graph. But if there is any other model function you want to compile with XLA, that is also possible, like so:
```py
my_xla_fn = tf.function(model.my_xla_fn, jit_compile=True)
```
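`my_xla_fn` above is a placeholder for whatever method you want to compile. As a minimal sketch, assuming a hypothetical subclassed Keras model with a custom method, this could look like:

```py
import tensorflow as tf


class MyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.dense = tf.keras.layers.Dense(5)

    def call(self, inputs):
        return self.dense(inputs)

    def my_xla_fn(self, inputs):
        # A hypothetical custom method; any model method can be
        # compiled with XLA, not just `call()`.
        return tf.nn.relu(self.dense(inputs))


model = MyModel()
my_xla_fn = tf.function(model.my_xla_fn, jit_compile=True)
_ = my_xla_fn(tf.random.normal((16, 10)))
```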
## Running a TF text generation model with XLA from 🤗 Transformers [[running-a-tf-text-generation-model-with-xla-from-transformers]]
To enable XLA-accelerated generation within 🤗 Transformers, you need to have a recent version of `transformers` installed. You can install it by running:
```bash
pip install transformers --upgrade
```
And then you can run the following code:
```py
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
# Will error if the minimal version of Transformers is not installed.
from transformers.utils import check_min_version
check_min_version("4.21.0")
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
input_string = ["TensorFlow is"]
# One line to create an XLA generation function
xla_generate = tf.function(model.generate, jit_compile=True)
tokenized_input = tokenizer(input_string, return_tensors="tf")
generated_tokens = xla_generate(**tokenized_input, num_beams=2)
decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Generated -- {decoded_text}")
# Generated -- TensorFlow is an open-source, open-source, distributed-source application # framework for the
```
As you can notice, enabling XLA on `generate()` takes just a single line of code. The rest of the code remains unchanged. However, there are a couple of gotchas in the above code snippet that are specific to XLA. You need to be aware of those to realize the speed-ups that XLA can bring. We discuss these in the following section.
## Gotchas to be aware of [[gotchas-to-be-aware-of]]
When you execute an XLA-enabled function (like `xla_generate()` above) for the first time, it will internally try to infer the computation graph, which is time-consuming. This process is known as ["tracing"](https://www.tensorflow.org/guide/intro_to_graphs#when_is_a_function_tracing).
You might notice that the generation time is not fast on that first call. Successive calls of `xla_generate()` (or any other XLA-enabled function) won't have to infer the computation graph, provided the inputs to the function follow the same shape with which the computation graph was initially built. While this is not a problem for modalities with fixed input shapes (e.g., images), you must pay attention if you are working with modalities that have variable input shapes (e.g., text).
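To see when tracing happens, here is a minimal standalone sketch; the `double` function below is purely illustrative and not part of 🤗 Transformers. The Python-level `print` executes only while the function is being traced:

```py
import tensorflow as tf


@tf.function(jit_compile=True)
def double(x):
    # This print runs only during tracing, not on every call.
    print(f"Tracing for input shape: {x.shape}")
    return x * 2


double(tf.ones((1, 8)))   # First call with this shape: traces.
double(tf.zeros((1, 8)))  # Same shape: reuses the cached graph, no tracing.
double(tf.ones((1, 16)))  # New shape: traces again.
```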
To ensure that `xla_generate()` always operates with the same input shapes, you can specify the `padding` arguments when calling the tokenizer.
```py
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
input_string = ["TensorFlow is"]
xla_generate = tf.function(model.generate, jit_compile=True)
# Here, we call the tokenizer with padding options.
tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors="tf")
generated_tokens = xla_generate(**tokenized_input, num_beams=2)
decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Generated -- {decoded_text}")
```
This way, you can ensure that the inputs to `xla_generate()` always have the shape it was traced with, which speeds up the generation time. You can verify this with the code below:
```py
import time
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
xla_generate = tf.function(model.generate, jit_compile=True)
for input_string in ["TensorFlow is", "TensorFlow is a", "TFLite is a"]:
    tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors="tf")
    start = time.time_ns()
    generated_tokens = xla_generate(**tokenized_input, num_beams=2)
    end = time.time_ns()
    print(f"Execution time -- {(end - start) / 1e6:.1f} ms\n")
```
On a Tesla T4 GPU, you can expect output like this:
```bash
Execution time -- 30819.6 ms
Execution time -- 79.0 ms
Execution time -- 78.9 ms
```
The first call to `xla_generate()` is time-consuming because of tracing, but the successive calls are orders of magnitude faster. Keep in mind that any change in the generation options at any point will trigger re-tracing and thus slow down the generation time.
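For example, continuing from the snippet above, changing a generation option such as `num_beams` alters the traced graph, so the first call with the new value pays the tracing cost again:

```py
generated_tokens = xla_generate(**tokenized_input, num_beams=2)  # Fast: this graph was already traced.
generated_tokens = xla_generate(**tokenized_input, num_beams=4)  # New option value: re-traces, slow on this call.
```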
We did not cover all the text generation options 🤗 Transformers provides in this document. We encourage you to read the documentation for advanced use cases.
## Additional resources [[additional-resources]]
Here, we leave you with some additional resources if you want to delve deeper into XLA in 🤗 Transformers and in general.
* [This Colab Notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/91_tf_xla_generate.ipynb) provides an interactive demonstration if you want to fiddle with the XLA-compatible encoder-decoder (like [T5](https://huggingface.co/docs/transformers/model_doc/t5)) and decoder-only (like [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)) text generation models.
* [This blog post](https://huggingface.co/blog/tf-xla-generate) provides an overview of comparison benchmarks for XLA-compatible models, along with a friendly introduction to XLA in TensorFlow.
* [This blog post](https://blog.tensorflow.org/2022/11/how-hugging-face-improved-text-generation-performance-with-xla.html) discusses the design philosophy behind adding XLA support to the TensorFlow models in 🤗 Transformers.
* Recommended posts for learning more about XLA and TensorFlow graphs in general:
    * [XLA: Optimizing Compiler for Machine Learning](https://www.tensorflow.org/xla)
    * [Introduction to graphs and tf.function](https://www.tensorflow.org/guide/intro_to_graphs)
    * [Better performance with tf.function](https://www.tensorflow.org/guide/function)