Changelog
📢 Release v1.0.3
- 🚨 The `IndicProcessor` class has been rewritten in Cython for a faster implementation. This yields a throughput gain of at least `+10 lines/s`.
- A new `visualize` argument has been added to `preprocess_batch` to track the processing with a `tqdm` bar.
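
As a rough illustration of what a `visualize` flag of this kind typically does, the sketch below wraps a per-sentence loop in a `tqdm` progress bar only when requested. The function `preprocess_batch_sketch` and its `normalize` parameter are hypothetical stand-ins for illustration; this is not the toolkit's Cython implementation, and the real `preprocess_batch` signature may differ.

```python
from typing import Callable, Iterable, List

try:
    # tqdm is a third-party package; fall back to a no-op wrapper if it is missing.
    from tqdm import tqdm
except ImportError:
    def tqdm(iterable: Iterable, **kwargs):
        return iterable


def preprocess_batch_sketch(
    sentences: List[str],
    normalize: Callable[[str], str] = str.strip,
    visualize: bool = False,
) -> List[str]:
    """Hypothetical stand-in: apply `normalize` to every sentence.

    When `visualize` is True, the iteration is wrapped in a tqdm progress bar.
    """
    if visualize:
        iterator = tqdm(sentences, desc="Preprocessing", unit="line")
    else:
        iterator = sentences
    return [normalize(s) for s in iterator]
```

Calling `preprocess_batch_sketch(sentences, visualize=True)` returns the same list as the default call; the flag only changes whether progress is displayed.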
📢 Release v1.0.2
- The repository has been renamed to `IndicTransToolkit`.
- 🚨 The custom tokenizer has been removed from the repository. To keep using it, revert to a previous commit (v1.0.1), though this is strongly discouraged. The official (and only) tokenizer is available on HF along with the models.
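
Since the tokenizer is distributed on HF along with the models, loading it would typically go through the `transformers` library. The sketch below assumes the checkpoint name `ai4bharat/indictrans2-en-indic-1B` and the `trust_remote_code=True` flag, neither of which is stated in this changelog; check the model card for the exact values before relying on them.

```python
# Assumed checkpoint name; it is not given in this changelog.
MODEL_NAME = "ai4bharat/indictrans2-en-indic-1B"


def load_tokenizer(model_name: str = MODEL_NAME):
    """Load the tokenizer published alongside the model on the HF Hub.

    The import is done lazily so this module can be imported even when
    `transformers` is not installed.
    """
    from transformers import AutoTokenizer

    # trust_remote_code=True is an assumption: it is needed when the tokenizer
    # class ships with the model repository rather than with transformers.
    return AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
```

Calling `load_tokenizer()` downloads the tokenizer files on first use, so it needs network access and an installed `transformers` package.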
📢 Release v1.0.0
- The PreTrainedTokenizer for IndicTrans2 is now available on HF. Note that you still need the `IndicProcessor` to pre-process the sentences before tokenization.
- 🚨 In favor of the standard PreTrainedTokenizer, we deprecated the custom tokenizer. It will remain available here for backward compatibility, but no further updates or bug fixes will be provided.
- The `indic_evaluate` function is now consolidated into a concrete `IndicEvaluator` class.
- The data collation function for training is consolidated into a concrete `IndicDataCollator` class.
- A simple batching method is now available in the `IndicProcessor`.
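
To make the pre-process-then-tokenize ordering and the idea of simple batching concrete, here is a minimal sketch that chunks sentences into batches and runs a pre-processing step before tokenization on each batch. The names `iter_batches`, `run_pipeline`, `toy_preprocess`, and `toy_tokenize` are hypothetical stand-ins; they do not reflect the actual `IndicProcessor` or tokenizer signatures.

```python
from typing import Callable, Iterator, List, Sequence


def iter_batches(sentences: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `batch_size` sentences."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(sentences), batch_size):
        yield list(sentences[start:start + batch_size])


def run_pipeline(
    sentences: Sequence[str],
    preprocess: Callable[[List[str]], List[str]],
    tokenize: Callable[[List[str]], List[List[str]]],
    batch_size: int = 2,
) -> List[List[str]]:
    """Pre-process each batch first, then tokenize the pre-processed text."""
    tokens: List[List[str]] = []
    for batch in iter_batches(sentences, batch_size):
        cleaned = preprocess(batch)       # stand-in for the processor step
        tokens.extend(tokenize(cleaned))  # stand-in for the tokenizer step
    return tokens


# Toy stand-ins so the sketch runs without any external dependency.
def toy_preprocess(batch: List[str]) -> List[str]:
    return [s.strip().lower() for s in batch]


def toy_tokenize(batch: List[str]) -> List[List[str]]:
    return [s.split() for s in batch]
```

In real use, the two toy callables would be replaced by the processor's pre-processing call and the HF tokenizer; the batching loop itself stays the same.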