TODO: add link to github repo once known

## General

### What is the purpose of the model

The model is a BERT model for ARM64 assembly code. This specific model has NOT been finetuned for semantic similarity; for that task you most likely want to use our [other model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of ARM64BERT is to be a baseline to compare the finetuned model against.
### What does the model architecture look like?

The model architecture is inspired by [jTrans](https://github.com/vul337/jTrans), although the typical Next Sentence Prediction has been replaced with Jump Target Prediction, as proposed in Wang et al.
### What is the output of the model?

The model is a BERT base model; its raw outputs are not meant to be used directly.
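To turn the base model's per-token outputs into a single function embedding, one typically applies masked mean pooling over the last hidden state. A minimal sketch of that pooling step, using random NumPy arrays in place of real BERT outputs (the shapes and the pooling recipe are illustrative assumptions, not this repository's code):

```python
import numpy as np

def mean_pool(token_embeddings: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average the token vectors of each sequence, ignoring padded positions."""
    # Expand the mask so it broadcasts over the hidden dimension.
    mask = attention_mask[:, :, None].astype(token_embeddings.dtype)
    summed = (token_embeddings * mask).sum(axis=1)   # (batch, hidden)
    counts = mask.sum(axis=1).clip(min=1e-9)         # avoid division by zero
    return summed / counts

# Dummy stand-ins for the model's last_hidden_state and attention mask.
rng = np.random.default_rng(0)
hidden = rng.normal(size=(2, 4, 8))              # (batch, seq_len, hidden)
mask = np.array([[1, 1, 1, 0], [1, 1, 0, 0]])    # 1 = real token, 0 = padding
pooled = mean_pool(hidden, mask)
print(pooled.shape)  # (2, 8)
```

This is the same pooling strategy SentenceTransformers applies by default when wrapping a plain BERT encoder.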
### How does the model perform?

We have compared this model against the model specifically finetuned for semantic similarity. To do this, we initialised this base model as a SentenceTransformer model.
The model was then evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) and [Recall@1](https://en.wikipedia.org/wiki/Precision_and_recall).
When the model has to pick the positive example out of a pool of 32, it almost always ranks it first. When the pool is significantly enlarged to 10,000 functions, it still ranks the positive example highest most of the time.
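Both metrics follow directly from the rank the model assigns to the positive example in each candidate pool. A small self-contained sketch of how they are computed (the example ranks are made up for illustration):

```python
def mrr_and_recall_at_1(ranks):
    """ranks: the 1-based rank of the positive example in each query's pool."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)           # mean of 1/rank
    recall_at_1 = sum(1 for r in ranks if r == 1) / len(ranks)  # fraction ranked first
    return mrr, recall_at_1

# Example: in 3 of 4 queries the positive function is ranked first.
mrr, r1 = mrr_and_recall_at_1([1, 1, 2, 1])
print(mrr, r1)  # 0.875 0.75
```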
| Model | Pool size | MRR | Recall@1 |
|-------|-----------|-----|----------|

The model has been designed to act as a base model for the ARM64 language.
### What else could the model be used for?

The model can also be used to find similar ARM64 functions in a database of known ARM64 functions when initialised as a SentenceTransformer model.
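Once the functions in the database have been embedded, lookup reduces to a cosine-similarity nearest-neighbour search. A minimal sketch with random vectors standing in for real function embeddings (the dimensions and data are assumptions for illustration):

```python
import numpy as np

def top_match(query: np.ndarray, database: np.ndarray) -> int:
    """Return the index of the database embedding most similar to the query."""
    q = query / np.linalg.norm(query)
    db = database / np.linalg.norm(database, axis=1, keepdims=True)
    return int(np.argmax(db @ q))  # cosine similarity = dot product of unit vectors

rng = np.random.default_rng(1)
db = rng.normal(size=(100, 16))               # embeddings of 100 known functions
query = db[42] + 0.01 * rng.normal(size=16)   # a near-duplicate of function 42
print(top_match(query, db))  # 42
```

For large databases, an approximate nearest-neighbour index (e.g. FAISS) would replace the brute-force matrix product.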
### To what problems is the model not applicable?

Although the model performs reasonably well on the semantic search task, this model has NOT been finetuned on that task.
n.a.

## Analyses (optional)

n.a.