Copy textual updates from embedding to base
README.md
CHANGED
@@ -6,9 +6,7 @@ language: code
 Model Card - ARM64BERT
 ----------
 
-
-_Who to contact:_ fbda [at] nfi [dot] nl \
-TODO: add link to github repo once known
+[GitHub repository](https://github.com/NetherlandsForensicInstitute/asmtransformers)
 
 ## General
 ### What is the purpose of the model
@@ -33,14 +31,11 @@ The model was then evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedi
 When the model has to pick the positive example out of a pool of 32, it almost always ranks it first. When
 the pool is significantly enlarged to 10.000 functions, it still ranks the positive example highest most of the time.
 
-
 | Model   | Pool size | MRR  | Recall@1 |
 |---------|-----------|------|----------|
 | ASMBert | 32        | 0.78 | 0.72     |
 | ASMBert | 10.000    | 0.58 | 0.56     |
 
-
-
 ## Purpose and use of the model
 
 ### For which problem has the model been designed?
@@ -51,20 +46,17 @@ The model can also be used to find similar ARM64 functions in a database of know
 
 ### To what problems is the model not applicable?
 Although the model performs reasonably well on the semantic search task, this model has NOT been finetuned on that task.
-For a finetuned
-model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding) we have published.
-
+For a finetuned ARM64BERT model, please refer to the [other model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding) published alongside this one.
 
 ## Data
 ### What data was used for training and evaluation?
-The dataset is created in the same way as Wang et al.
-[ArchLinux official repositories](https://
-All this code is split into functions that are compiled with different
-(O0
-in a maximum of 10 (5
-The dataset is split into a train and a test set. This
-either the train or the test set, not both. We have not performed any deduplication on the dataset for training.
-
+The dataset is created in the same way as Wang et al. created Binary Corp.
+A large set of binary code comes from the [ArchLinux official repositories](https://archlinux.org/packages/) and the [ArchLinux user repositories](https://aur.archlinux.org/packages/).
+All this code is split into functions that are compiled with different optimizations
+(`O0`, `O1`, `O2`, `O3` and `Os`) and security settings (fortify or no-fortify).
+This results in a maximum of 10 (5×2) different functions which are semantically similar, i.e. they represent the same functionality, but have different machine code.
+The dataset is split into a train and a test set. This is done on project level, so all binaries and functions belonging to one project are part of
+either the train or the test set, not both. We have not performed any deduplication on the dataset for training.
 
 | set   | # functions |
 |-------|------------:|
@@ -85,19 +77,3 @@ examples.
 After training this base model, we found out that something had gone wrong when compiling our dataset. Consequently,
 the last instruction of the previous function was included in the next. Due to the long training process, and the
 good performance of the model despite the mistake, we have decided not to retrain our model.
-
-
-
-## Fairness Metrics
-
-### Which metrics have been used to measure bias in the data/model and why?
-n.a.
-
-### What do those metrics show?
-n.a.
-
-### Any other notable issues?
-n.a.
-
-## Analyses (optional)
-n.a.
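The MRR and Recall@1 figures quoted in the card can be illustrated with a minimal sketch. The helper below is not code from the repository, and the example ranks are made up; it only shows how the two metrics are defined, given the 1-based rank of the positive example in each candidate pool.

```python
def mrr_and_recall_at_1(ranks):
    """Compute Mean Reciprocal Rank and Recall@1.

    `ranks` holds, for each query, the 1-based rank of the positive
    example among the candidate pool (hypothetical inputs, not from
    the released evaluation code).
    """
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    recall_at_1 = sum(1 for r in ranks if r == 1) / len(ranks)
    return mrr, recall_at_1


# Made-up example: positive ranked 1st, 1st, 2nd, 4th in four pools.
mrr, r1 = mrr_and_recall_at_1([1, 1, 2, 4])
# mrr = (1 + 1 + 0.5 + 0.25) / 4 = 0.6875, r1 = 2/4 = 0.5
```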