<!--Copyright 2023 The HuggingFace Team. All rights reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on
an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the
specific language governing permissions and limitations under the License.

⚠️ Note that this file is in Markdown but contains specific syntax for our doc-builder (similar to MDX) that may not be
rendered properly in your Markdown viewer.

-->
# XLA Integration for TensorFlow Models [[xla-integration-for-tensorflow-models]]

[[open-in-colab]]
XLA (Accelerated Linear Algebra) is a compiler for accelerating the runtime of TensorFlow models. From the [official documentation](https://www.tensorflow.org/xla):

> XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear algebra that can accelerate TensorFlow models with potentially no source code changes.
Using XLA in TensorFlow is simple. XLA comes packaged inside the `tensorflow` library, and it can be triggered with the `jit_compile` argument in any graph-creating function such as [`tf.function`](https://www.tensorflow.org/guide/intro_to_graphs). When using Keras methods like `fit()` and `predict()`, you can enable XLA simply by passing the `jit_compile` argument to `model.compile()`. However, XLA is not limited to these methods; it can also be used to accelerate any arbitrary `tf.function`.
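For instance, enabling XLA for the Keras `fit()` path is one extra argument to `compile()`. Here is a minimal sketch; the toy model and random data are placeholders:

```py
import tensorflow as tf

# A toy classifier; any Keras model works the same way.
model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, input_shape=(10,), activation="relu"), tf.keras.layers.Dense(5, activation="softmax")]
)

# jit_compile=True asks Keras to compile the train and predict steps with XLA.
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", jit_compile=True)

x = tf.random.normal((16, 10))
y = tf.random.uniform((16,), maxval=5, dtype=tf.int32)
model.fit(x, y, epochs=1)
```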
In 🤗 Transformers, several TensorFlow methods have been rewritten to be XLA-compatible, including text generation for models such as [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2), [T5](https://huggingface.co/docs/transformers/model_doc/t5), and [OPT](https://huggingface.co/docs/transformers/model_doc/opt), as well as speech processing for models such as [Whisper](https://huggingface.co/docs/transformers/model_doc/whisper).

While the exact amount of speed-up is very much model-dependent, we have observed a speed-up of up to 100x for the TensorFlow text generation models inside 🤗 Transformers. This document explains how to use XLA with these models to get the maximum amount of performance. It also provides links to additional resources on the benchmarks and the design philosophy behind the XLA integration.
## Running TF functions with XLA [[running-tf-functions-with-xla]]

Let's consider the following model in TensorFlow:
```py
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, input_shape=(10,), activation="relu"), tf.keras.layers.Dense(5, activation="softmax")]
)
```
The above model accepts inputs with a dimension of `(10, )`. We can use the model to run a forward pass like so:
```py
# Generate random inputs for the model.
batch_size = 16
input_vector_dim = 10
random_inputs = tf.random.normal((batch_size, input_vector_dim))

# Run a forward pass.
_ = model(random_inputs)
```
To run the forward pass with an XLA-compiled function instead, we'd need to do the following:
```py
xla_fn = tf.function(model, jit_compile=True)
_ = xla_fn(random_inputs)
```
The default `call()` function of `model` is used to compile the XLA graph. But if there is any other model function you want to compile into XLA, that is also possible:
```py
my_xla_fn = tf.function(model.my_xla_fn, jit_compile=True)
```
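As a concrete illustration, you could compile a custom helper built on top of the model's forward pass. This is a minimal sketch; `predict_classes` is an illustrative name, not a built-in Keras method:

```py
import tensorflow as tf

model = tf.keras.Sequential(
    [tf.keras.layers.Dense(10, input_shape=(10,), activation="relu"), tf.keras.layers.Dense(5, activation="softmax")]
)

# A custom function that wraps the model's forward pass.
def predict_classes(inputs):
    return tf.argmax(model(inputs), axis=-1)

# Compile the custom function with XLA, just like the model itself.
xla_predict_classes = tf.function(predict_classes, jit_compile=True)
_ = xla_predict_classes(tf.random.normal((16, 10)))
```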
## Running a TF text generation model with XLA from 🤗 Transformers [[running-a-tf-text-generation-model-with-xla-from-transformers]]

To enable XLA-accelerated generation within 🤗 Transformers, you need a recent version of `transformers` installed. You can install it by running:
```bash
pip install transformers --upgrade
```
Then you can run the following code:
```py
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

# Will error if the minimal version of Transformers is not installed.
from transformers.utils import check_min_version

check_min_version("4.21.0")

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
input_string = ["TensorFlow is"]

# One line to create an XLA generation function
xla_generate = tf.function(model.generate, jit_compile=True)

tokenized_input = tokenizer(input_string, return_tensors="tf")
generated_tokens = xla_generate(**tokenized_input, num_beams=2)

decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Generated -- {decoded_text}")
# Generated -- TensorFlow is an open-source, open-source, distributed-source application # framework for the
```
As you can notice, enabling XLA on `generate()` takes just a single line of code; the rest remains unchanged. However, the snippet above comes with a couple of gotchas that are specific to XLA, and you need to be aware of them to realize the speed-ups that XLA can bring. We discuss these in the following section.
## Gotchas to be aware of [[gotchas-to-be-aware-of]]

When you execute an XLA-enabled function (like `xla_generate()` above) for the first time, it internally tries to infer the computation graph, which is time-consuming. This process is known as ["tracing"](https://www.tensorflow.org/guide/intro_to_graphs#when_is_a_function_tracing).

You might notice that generation is not fast at first. Successive calls of `xla_generate()` (or any other XLA-enabled function) won't have to infer the computation graph, provided the inputs to the function follow the same shape with which the computation graph was initially built. While this is not a problem for modalities with fixed input shapes (e.g., images), you must pay attention if you are working with modalities that have variable input shapes (e.g., text).
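You can see tracing in action with a small standalone sketch (illustrative, not part of the snippets above): Python-level side effects such as `print()` run only while TensorFlow traces the function, so they reveal when a new graph is built.

```py
import tensorflow as tf

@tf.function(jit_compile=True)
def xla_sum(x):
    # This print runs during tracing only, not on every call.
    print("Tracing with shape:", x.shape)
    return tf.reduce_sum(x)

_ = xla_sum(tf.random.normal((8, 10)))  # first call: traces the graph and prints
_ = xla_sum(tf.random.normal((8, 10)))  # same shape: reuses the graph, no print
_ = xla_sum(tf.random.normal((4, 10)))  # new shape: re-traces and prints again
```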
To ensure `xla_generate()` always operates with the same input shapes, you can specify the `padding` arguments when calling the tokenizer:
```py
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")
input_string = ["TensorFlow is"]

xla_generate = tf.function(model.generate, jit_compile=True)

# Here, we call the tokenizer with padding options.
tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors="tf")

generated_tokens = xla_generate(**tokenized_input, num_beams=2)
decoded_text = tokenizer.decode(generated_tokens[0], skip_special_tokens=True)
print(f"Generated -- {decoded_text}")
```
This way, you can ensure that the inputs to `xla_generate()` always arrive in the shape it was traced with, leading to speed-ups in generation time. You can verify this with the code below:
```py
import time

import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2", padding_side="left", pad_token="</s>")
model = TFAutoModelForCausalLM.from_pretrained("gpt2")

xla_generate = tf.function(model.generate, jit_compile=True)

for input_string in ["TensorFlow is", "TensorFlow is a", "TFLite is a"]:
    tokenized_input = tokenizer(input_string, pad_to_multiple_of=8, padding=True, return_tensors="tf")
    start = time.time_ns()
    generated_tokens = xla_generate(**tokenized_input, num_beams=2)
    end = time.time_ns()
    print(f"Execution time -- {(end - start) / 1e6:.1f} ms\n")
```
On a Tesla T4 GPU, you can expect output like this:
```bash
Execution time -- 30819.6 ms
Execution time -- 79.0 ms
Execution time -- 78.9 ms
```
The first call to `xla_generate()` is time-consuming because of tracing, but the successive calls are orders of magnitude faster. Keep in mind that any change in the generation options at any point will trigger re-tracing and thus slow down generation.
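For example, changing a generation option such as `num_beams` changes the compiled graph, so the call below would trigger a fresh trace even though the input shape is unchanged (a minimal sketch reusing `xla_generate` and `tokenized_input` from the snippet above):

```py
# Same input shape, but a different generation option: this call re-traces.
generated_tokens = xla_generate(**tokenized_input, num_beams=4)
```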
This document does not cover all the text generation options 🤗 Transformers provides. We encourage you to read the documentation for advanced use cases.
## Additional Resources [[additional-resources]]

Here, we leave you with some additional resources if you want to delve deeper into XLA in 🤗 Transformers and in general.

* [This Colab Notebook](https://colab.research.google.com/github/huggingface/blog/blob/main/notebooks/91_tf_xla_generate.ipynb) provides an interactive demonstration if you want to fiddle with XLA-compatible encoder-decoder (like [T5](https://huggingface.co/docs/transformers/model_doc/t5)) and decoder-only (like [GPT2](https://huggingface.co/docs/transformers/model_doc/gpt2)) text generation models.
* [This blog post](https://huggingface.co/blog/tf-xla-generate) provides an overview of comparison benchmarks for XLA-compatible models, along with a friendly introduction to XLA in TensorFlow.
* [This blog post](https://blog.tensorflow.org/2022/11/how-hugging-face-improved-text-generation-performance-with-xla.html) discusses the design philosophy behind adding XLA support to the TensorFlow models in 🤗 Transformers.
* Recommended posts for learning more about XLA and TensorFlow graphs in general:
    * [XLA: Optimizing Compiler for Machine Learning](https://www.tensorflow.org/xla)
    * [Introduction to graphs and tf.function](https://www.tensorflow.org/guide/intro_to_graphs)
    * [Better performance with tf.function](https://www.tensorflow.org/guide/function)