Judithvdw committed
Commit f935770 · verified · 1 Parent(s): 5096bfb

Update README.md

Files changed (1): README.md +9 -9
README.md CHANGED
@@ -12,10 +12,9 @@ TODO: add link to github repo once known
 
 ## General
 ### What is the purpose of the model
-The model is a semantic search BERT model of ARM64 assembly code that can be used to find similar ARM64 functions to a
-given ARM4 function. This specific model has NOT been specifically finetuned for semantic similarity, you most likely want
+The model is a BERT model for ARM64 assembly code. This specific model has NOT been finetuned for semantic similarity; you most likely want
 to use our [other
-model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of this model is to be a baseline
+model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of ARM64BERT is to be a baseline
 to compare the finetuned model against.
 
 ### What does the model architecture look like?
@@ -24,14 +23,15 @@ The model architecture is inspired by [jTrans](https://github.com/vul337/jTrans)
 although the typical Next Sentence Prediction has been replaced with Jump Target Prediction, as proposed in Wang et al.
 
 ### What is the output of the model?
-The model returns a vector of 768 dimensions for each function. These vectors can be compared to
-get an indication of which functions are similar to each other.
+The model is a BERT base model, of which the outputs are not meant to be used directly.
 
 ### How does the model perform?
-The model has been evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) and
+We have compared this model against the model specifically finetuned for semantic similarity. To do this, we initialised this base model
+as a SentenceTransformer model.
+The model was then evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) and
 [Recall@1](https://en.wikipedia.org/wiki/Precision_and_recall).
 When the model has to pick the positive example out of a pool of 32, it almost always ranks it first. When
-the pool is significantly enlarged to 10.000 functions, it still ranks the positive example highest most of the time.
+the pool is significantly enlarged to 10,000 functions, it still ranks the positive example highest most of the time.
 
 
 | Model | Pool size | MRR | Recall@1 |
@@ -47,7 +47,7 @@ the pool is significantly enlarged to 10.000 functions, it still ranks the posit
 The model has been designed to act as a base model for the ARM64 language.
 
 ### What else could the model be used for?
-The model can also be used to find similar ARM64 functions in a database of known ARM64 functions.
+The model can also be used to find similar ARM64 functions in a database of known ARM64 functions when initialised as a SentenceTransformer model.
 
 ### To what problems is the model not applicable?
 Although the model performs reasonably well on the semantic search task, this model has NOT been finetuned on that task.
@@ -100,4 +100,4 @@ n.a.
 n.a.
 
 ## Analyses (optional)
-n.a.
+n.a.
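The earlier revision of the README notes that the model emits a 768-dimensional vector per function and that these vectors are compared to judge similarity. A minimal sketch of that comparison step, with random vectors standing in for real embeddings (in practice they would come from the model loaded as a SentenceTransformer; the vectors and the perturbation here are invented for illustration):

```python
import numpy as np

def cos_sim(u, v):
    # Cosine similarity: dot product of the two embeddings,
    # normalised by their lengths; values near 1 mean "similar".
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=768)               # stand-in function embedding
near = anchor + 0.1 * rng.normal(size=768)  # slightly perturbed copy
unrelated = rng.normal(size=768)            # independent embedding

print(cos_sim(anchor, near), cos_sim(anchor, unrelated))
```

With the fixed seed, the perturbed copy scores far higher than the unrelated vector, which is the signal a semantic-search index over function embeddings would rank on.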
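The diff evaluates with Mean Reciprocal Rank and Recall@1. For reference, a small self-contained sketch of how these two metrics are computed from the rank the positive example achieved in each pool (the example ranks below are invented):

```python
def mrr_and_recall_at_1(ranks):
    """ranks[i] is the 1-based rank of the true match for query i."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)         # mean of 1/rank
    recall_at_1 = sum(r == 1 for r in ranks) / len(ranks)  # fraction ranked first
    return mrr, recall_at_1

# Four queries whose positive example ranked 1st, 1st, 2nd and 4th:
mrr, r1 = mrr_and_recall_at_1([1, 1, 2, 4])
print(mrr, r1)  # 0.6875 0.5
```

Both metrics reach 1.0 only when every query ranks its positive example first, which matches the README's claim that near-perfect scores in a pool of 32 are a strong result.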