---
language: uz
license: apache-2.0
tags:
- uzbek
- dependency-parsing
- universal-dependencies
- nlp
datasets:
- universal_dependencies
metrics:
- accuracy
- f1
---

# Uzbek Dependency Parser

This model predicts Universal Dependencies (UD) dependency relation labels for Uzbek text.

## Model details

The model was fine-tuned on a Universal Dependencies treebank containing approximately 600 annotated Uzbek sentences. It is based on the [XLM-RoBERTa base model](https://huggingface.co/xlm-roberta-base) and adapted for token classification, so dependency relation prediction is treated as per-word labeling.
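
Because parsing is framed as token classification, the label inventory lives in the model configuration. Here is a minimal sketch to inspect it, assuming the fine-tuned checkpoint stores the UD relation names in `id2label`:

```python
from transformers import AutoConfig

# Download only the config (no weights) and list the relation labels the model can emit
config = AutoConfig.from_pretrained("Arofat/uzbek-dependency-parser")
print(sorted(config.id2label.values()))
```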

## Usage

```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Arofat/uzbek-dependency-parser")
model = AutoModelForTokenClassification.from_pretrained("Arofat/uzbek-dependency-parser")

# Prepare text ("Men O'zbekistonda yashayman." = "I live in Uzbekistan.")
text = "Men O'zbekistonda yashayman."
tokens = text.split()  # naive whitespace tokenization

# Get predictions
inputs = tokenizer(tokens, is_split_into_words=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Pick the highest-scoring label for every sub-token
predictions = torch.argmax(outputs.logits, dim=2)
id2label = model.config.id2label

# Keep the label of the first sub-token of each word so tags align with the input words
dep_tags = []
word_ids = inputs.word_ids(batch_index=0)
prev_word_id = None
for idx, word_id in enumerate(word_ids):
    if word_id is None or word_id == prev_word_id:
        continue  # skip special tokens and word continuations
    dep_tags.append(id2label[predictions[0, idx].item()])
    prev_word_id = word_id

# Print one relation label per word
for token, tag in zip(tokens, dep_tags):
    print(f"{token}: {tag}")
```
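
This prints one relation label per word, e.g. lines of the form `Men: nsubj` (illustrative only; actual labels depend on the model's predictions). For quick experiments, the generic `token-classification` pipeline can handle the sub-token aggregation for you; a sketch, assuming the checkpoint works with the pipeline's default pre-tokenization (`aggregation_strategy="first"` keeps the first sub-token's label per word, mirroring the loop above):

```python
from transformers import pipeline

# Wrap the model in a token-classification pipeline
parser = pipeline(
    "token-classification",
    model="Arofat/uzbek-dependency-parser",
    aggregation_strategy="first",
)

# One dict per word; "entity_group" holds the predicted relation label
for word in parser("Men O'zbekistonda yashayman."):
    print(word["word"], word["entity_group"])
```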

## Limitations

This model was trained on a relatively small dataset (roughly 600 annotated sentences) and may not generalize well to all domains of Uzbek text. It also predicts only the dependency relation labels, not the tree structure (head indices), so a complete dependency parse requires an additional head-prediction step.
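
Until such a head predictor is added, one way to make the gap explicit is to serialize the predictions as partial CoNLL-U with the HEAD column left unfilled. A hypothetical helper (not part of this model), reusing `tokens` and `dep_tags` from the usage example above:

```python
def to_partial_conllu(tokens, dep_tags):
    """Format words and predicted relations as CoNLL-U rows with unknown heads."""
    lines = []
    for i, (token, tag) in enumerate(zip(tokens, dep_tags), start=1):
        # Columns: ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC
        # HEAD stays "_" because this model does not predict it.
        lines.append(f"{i}\t{token}\t_\t_\t_\t_\t_\t{tag}\t_\t_")
    return "\n".join(lines)

print(to_partial_conllu(tokens, dep_tags))
```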