Fix title
README.md

---
license: eupl-1.1
language: code
---

ARM64BERT 🦾
------------

[GitHub repository](https://github.com/NetherlandsForensicInstitute/asmtransformers)

## General
### What is the purpose of the model?
The model is a BERT model for ARM64 assembly code. This specific model has NOT been finetuned for semantic similarity;
you most likely want to use our [other model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding).
The main purpose of ARM64BERT is to serve as a baseline to compare the finetuned model against.
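
As a hedged illustration (not from the model card itself), the recommended embedding model can be loaded with the
`transformers` library; the exact loading code may differ if the repository uses a custom configuration or tokenizer:

```python
# Hedged sketch: loading the finetuned embedding model recommended above.
# The repository id comes from the link in this card; everything else is a
# generic transformers pattern, not this project's documented API.
from transformers import AutoModel, AutoTokenizer

model_id = "NetherlandsForensicInstitute/ARM64bert-embedding"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)
```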

### What does the model architecture look like?
The model architecture is inspired by [jTrans](https://github.com/vul337/jTrans) (Wang et al., 2022). It is a BERT model
(Devlin et al., 2019), although the typical Next Sentence Prediction task has been replaced with Jump Target Prediction,
as proposed in Wang et al.
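
To make the Jump Target Prediction objective concrete, here is an illustrative sketch; the token naming and
preprocessing details are simplified assumptions based on the jTrans paper, not this repository's actual code:

```python
# Illustrative sketch of Jump Target Prediction (JTP), following the idea in
# jTrans (Wang et al., 2022). Token names and preprocessing are assumptions.
# In a tokenized function, a branch operand is replaced by a symbol standing
# for the position of its target instruction:
tokens = ["cmp", "x0", "#0", "b.eq", "JUMP_9", "add", "x0", "x0", "#1", "ret"]

# During pretraining, the model must predict the target position for the jump
# symbol (here: position 9, the "ret" token), instead of BERT's usual Next
# Sentence Prediction.
jump_index = tokens.index("JUMP_9")  # position of the jump operand token (4)
jtp_label = 9                        # position the model should predict
```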

### What is the output of the model?
The model is a BERT base model, whose outputs are not meant to be used directly.
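
As a hedged sketch of what those outputs look like (the repository id and input format below are assumptions; this is a
generic transformers pattern, not documented usage), the forward pass yields per-token hidden states rather than a
ready-made function embedding:

```python
# Hedged sketch: inspecting the raw outputs of the base model. The repository
# id and the assembly input format are assumptions for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "NetherlandsForensicInstitute/ARM64bert"  # assumed id of this model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

inputs = tokenizer("mov x0 , #0 ; ret", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Token-level hidden states: shape (batch_size, sequence_length, hidden_size).
print(outputs.last_hidden_state.shape)
```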
### How does the model perform?

[...] either the train or the test set, not both. We have not performed any deduplication [...]

The dataset was collected by our team. The annotation of similar/non-similar functions comes from the different
compilation levels, i.e. what we consider "similar functions" is in fact the same function that has been compiled in a
different way.

### Any remarks on data quality and bias?
The way we classify functions as similar may have implications. For example, sometimes two different ways of compiling
the same function do not result in a different piece of code. We did not remove duplicates from the data during
training, but we did implement checks in the evaluation stage, and it seems that the model has not suffered from the
simple training