manu
/

lilt-camembert-base

Token Classification

liltrobertalike


manu committed on Mar 30, 2022

Commit

2c61db3

·

1 Parent(s): b8cf323

Create README.md

Files changed (1)

README.md +45 -0

README.md ADDED

	@@ -0,0 +1,45 @@

+---
+language:
+- fr
+tags:
+- token-classification
+- fill-mask
+license: mit
+datasets:
+- iit-cdip
+---
+This model combines the camembert-base model with the pretrained LiLT checkpoint from the paper "LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding".
+Original repository: https://github.com/jpWang/LiLT
+To use it, you need to fork the modeling and configuration files from the original repository and load the pretrained model with the corresponding classes (LiLTRobertaLikeConfig, LiLTRobertaLikeForRelationExtraction, LiLTRobertaLikeForTokenClassification, LiLTRobertaLikeModel).
+These classes can also be registered with the AutoConfig/AutoModel factories as follows:
+```python
+from transformers import AutoConfig, AutoModel, AutoModelForTokenClassification
+from path_to_custom_classes import (
+    LiLTRobertaLikeConfig,
+    LiLTRobertaLikeForRelationExtraction,
+    LiLTRobertaLikeForTokenClassification,
+    LiLTRobertaLikeModel,
+)
+
+def patch_transformers():
+    # Map the "liltrobertalike" model type to the custom config class
+    AutoConfig.register("liltrobertalike", LiLTRobertaLikeConfig)
+    # Map the custom config class to the matching model classes
+    AutoModel.register(LiLTRobertaLikeConfig, LiLTRobertaLikeModel)
+    AutoModelForTokenClassification.register(LiLTRobertaLikeConfig, LiLTRobertaLikeForTokenClassification)
+    # etc...
+```
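+To load the model, you can then use:
+```python
+from transformers import AutoModel, AutoModelForTokenClassification, AutoTokenizer
+
+# patch_transformers() must have been executed beforehand
+# Assumption: the original snippet read the tokenizer name from an object attribute;
+# "camembert-base" is used here as a placeholder for that value.
+tokenizer_name = "camembert-base"
+tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)  # pass an auth token only if the repo is private
+model = AutoModel.from_pretrained("manu/lilt-camembert-base")  # base model without a task head
+# or, to be fine-tuned on a token classification task:
+model = AutoModelForTokenClassification.from_pretrained("manu/lilt-camembert-base")
+```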
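The ordering requirement in the comment above (`patch_transformers()` before `from_pretrained`) follows from how a type registry works. The dependency-free sketch below illustrates the idea only; it is not the transformers implementation, and `ToyAutoFactory` and `ToyLiltConfig` are hypothetical stand-ins for the Auto* factories and `LiLTRobertaLikeConfig`.

```python
# Toy stand-in for the Auto* factories: NOT the transformers implementation.
class ToyAutoFactory:
    """Maps a model_type string to a class, like a very small registry."""

    def __init__(self):
        self._registry = {}

    def register(self, model_type, cls):
        # Refuse silent overwrites so a double registration is noticed.
        if model_type in self._registry:
            raise ValueError(f"{model_type!r} is already registered")
        self._registry[model_type] = cls

    def for_model_type(self, model_type):
        # An unregistered type fails here, which is why patching must come first.
        if model_type not in self._registry:
            raise KeyError(f"unknown model_type {model_type!r}; register it first")
        return self._registry[model_type]


class ToyLiltConfig:
    """Hypothetical placeholder for LiLTRobertaLikeConfig."""
    model_type = "liltrobertalike"


factory = ToyAutoFactory()

# Before registration, the lookup fails.
try:
    factory.for_model_type("liltrobertalike")
    failed_before = False
except KeyError:
    failed_before = True

# After registration, the same lookup resolves to the custom class.
factory.register(ToyLiltConfig.model_type, ToyLiltConfig)
resolved = factory.for_model_type("liltrobertalike")
```

In this toy version a second `register` call for the same type raises `ValueError`, so a patch function built on such a registry would typically be called once at start-up.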