---
license: apache-2.0
license_link: https://huggingface.co/skt/A.X-3.1/blob/main/LICENSE
language:
- en
- ko
pipeline_tag: text-classification
library_name: transformers
model_id: skt/A.X-Encoder-base
developers: SKT AI Model Lab
model-index:
- name: A.X-Encoder-base
  results:
  - task:
      type: text-classification
      name: kobest
    metrics:
    - type: KoBEST
      value: 85.50
  - task:
      type: text-classification
      name: klue
    metrics:
    - type: KLUE
      value: 86.10
---

# A.X Encoder

<div align="center">
  <img src="./assets/A.X_from_scratch_logo_ko_4x3.png" alt="A.X Logo" width="300"/>
</div>

## A.X Encoder Highlights

**A.X Encoder** (pronounced "A dot X") is SKT's document understanding model, optimized for Korean-language understanding and enterprise deployment.
This lightweight encoder was developed entirely in-house by SKT, covering model architecture, data curation, and training, all carried out on SKT’s proprietary supercomputing infrastructure, TITAN.
The model is built on the ModernBERT architecture, which supports Flash Attention and long-context processing.

- **Longer Context**: A.X Encoder supports long-context processing of up to **16,384** tokens.
- **Faster Inference**: A.X Encoder achieves up to 3x faster inference speed than earlier models.
- **Superior Korean Language Understanding**: A.X Encoder achieves superior performance on diverse Korean NLU tasks.
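
A quick way to confirm the 16,384-token limit above is to read it from the model config and tokenize a long document against it (a minimal sketch; the repeated sentence and repetition count below are arbitrary):

```python
from transformers import AutoConfig, AutoTokenizer

model_id = "skt/A.X-Encoder-base"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# ModernBERT-style configs expose the context window here.
print(config.max_position_embeddings)  # expected: 16384

# Long documents are truncated at the model limit during tokenization.
long_doc = "ν•œκ΅­μ˜ μˆ˜λ„λŠ” μ„œμšΈμ΄λ‹€. " * 4000
inputs = tokenizer(long_doc, return_tensors="pt", truncation=True, max_length=16384)
print(inputs["input_ids"].shape)  # at most (1, 16384)
```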


## Core Technologies

A.X Encoder is **an efficient long-document understanding model** for processing large-scale corpora, developed end-to-end by SKT.

This model plays a key role in **data curation for A.X LLM** by serving as a versatile document classifier, identifying features such as educational value, domain category, and difficulty level.
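
To illustrate this classifier role, the encoder can be paired with a standard sequence-classification head via `AutoModelForSequenceClassification` (a minimal sketch; the label set below is hypothetical, and the newly added head is randomly initialized, so it must be fine-tuned before its predictions mean anything):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "skt/A.X-Encoder-base"
# Hypothetical labels for a document-curation classifier; the actual
# A.X curation label sets are not published with this model card.
labels = ["low_quality", "medium_quality", "high_quality"]

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id,
    num_labels=len(labels),
    id2label=dict(enumerate(labels)),
    label2id={label: i for i, label in enumerate(labels)},
)  # the classification head is freshly initialized here

doc = "이 λ¬Έμ„œλŠ” μ€‘ν•™κ΅ μˆ˜ν•™ κ΅μœ‘μš© 자료의 μΌλΆ€μž…λ‹ˆλ‹€."
inputs = tokenizer(doc, return_tensors="pt", truncation=True, max_length=16384)
with torch.no_grad():
    logits = model(**inputs).logits
print(labels[logits.argmax(dim=-1).item()])  # meaningful only after fine-tuning
```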

## Benchmark Results

### Model Inference Speed (measured on an A100 GPU)
<div align="center">
  <img src="./assets/speed.png" alt="inference" width="500"/>
</div>
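
For reference, a simple latency measurement along these lines could be used to compare models (not the authors' benchmark script; the batch size, input text, warm-up, and iteration counts below are assumptions):

```python
import time
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "skt/A.X-Encoder-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn; drop to use default attention
).to("cuda").eval()

batch = tokenizer(["ν•œκ΅­μ˜ μˆ˜λ„λŠ” μ„œμšΈμ΄λ‹€."] * 32, return_tensors="pt", padding=True).to("cuda")

with torch.no_grad():
    for _ in range(3):  # warm-up iterations, excluded from timing
        model(**batch)
    torch.cuda.synchronize()
    start = time.perf_counter()
    for _ in range(10):
        model(**batch)
    torch.cuda.synchronize()
print(f"{(time.perf_counter() - start) / 10 * 1000:.2f} ms per batch")
```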

### Model Performance
<div align="center">
  <img src="./assets/performance.png" alt="performance" width="500"/>
</div>

| Method                        | BoolQ (f1) | COPA (f1) | Sentineg (f1) | WiC (f1) | **Avg. (KoBEST)** |
| ----------------------------- | ---------- | --------- | ------------- | -------- | ----------------- |
| **klue/roberta-base**         | 72.04      | 65.14     | 90.39         | 78.19    | 76.44             |
| **kakaobank/kf-deberta-base** | 81.30      | 76.50     | 94.70         | 80.50    | 83.25             |
| **skt/A.X-Encoder-base**      | 84.50      | 78.70     | 96.00         | 80.80    | **85.50**             |


| Method                        | NLI (acc) | STS (f1) | YNAT (acc) | **Avg. (KLUE)** |
| ----------------------------- | --------- | -------- | ---------- | --------------- |
| **klue/roberta-base**         | 84.53     | 84.57    | 86.48      | 85.19           |
| **kakaobank/kf-deberta-base** | 86.10     | 84.30    | 87.00      | 85.80           |
| **skt/A.X-Encoder-base**      | 87.00     | 84.80    | 86.50      | **86.10**           |


## πŸš€ Quickstart

### with HuggingFace Transformers

- `transformers>=4.51.0` is required to use `skt/A.X-Encoder-base`:
```bash
pip install "transformers>=4.51.0"
```

⚠️ If your GPU supports it, we recommend running A.X Encoder with Flash Attention 2 for the best efficiency. To do so, install Flash Attention as follows, then use the model as normal:

```bash
pip install flash-attn --no-build-isolation
```
#### Example Usage

Using AutoModelForMaskedLM:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_id = "skt/A.X-Encoder-base"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Flash Attention 2 only runs on CUDA devices, so place the model on GPU.
model = AutoModelForMaskedLM.from_pretrained(
    model_id,
    attn_implementation="flash_attention_2",
    torch_dtype=torch.bfloat16,
).to("cuda")

text = "ν•œκ΅­μ˜ μˆ˜λ„λŠ” <mask>."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model(**inputs)

# To get predictions for the mask, find its position and decode the top token:
masked_index = inputs["input_ids"][0].tolist().index(tokenizer.mask_token_id)
predicted_token_id = outputs.logits[0, masked_index].argmax(dim=-1)
predicted_token = tokenizer.decode(predicted_token_id)
print("Predicted token:", predicted_token)
# Predicted token: μ„œμšΈ
```

Using a pipeline:

```python
import torch
from transformers import pipeline
from pprint import pprint

pipe = pipeline(
    "fill-mask",
    model="skt/A.X-Encoder-base",
    torch_dtype=torch.bfloat16,
)

input_text = "ν•œκ΅­μ˜ μˆ˜λ„λŠ” <mask>."
results = pipe(input_text)
pprint(results)
# [{'score': 0.07568359375,
#  'sequence': 'ν•œκ΅­μ˜ μˆ˜λ„λŠ” μ„œμšΈ.',
#  'token': 31430,
#  'token_str': 'μ„œμšΈ'}, ...
```

## License

The `A.X Encoder` model is licensed under the `Apache License 2.0`.

## Citation
```
@article{SKTAdotXEncoder-base,
  title={A.X Encoder-base},
  author={SKT AI Model Lab},
  year={2025},
  url={https://huggingface.co/skt/A.X-Encoder-base}
}
```

## Contact

- Business & Partnership Contact: [[email protected]](mailto:[email protected])