Judithvdw committed
Commit f935770 · verified · 1 Parent(s): 5096bfb

Update README.md

Files changed (1): README.md +9 -9
README.md CHANGED
@@ -12,10 +12,9 @@ TODO: add link to github repo once known
 
 ## General
 ### What is the purpose of the model
-The model is a semantic search BERT model of ARM64 assembly code that can be used to find similar ARM64 functions to a
-given ARM4 function. This specific model has NOT been specifically finetuned for semantic similarity, you most likely want
+The model is a BERT model for ARM64 assembly code. This specific model has NOT been finetuned for semantic similarity; you most likely want
 to use our [other
-model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of this model is to be a baseline
+model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of ARM64BERT is to be a baseline
 to compare the finetuned model against.
 
 ### What does the model architecture look like?
@@ -24,14 +23,15 @@ The model architecture is inspired by [jTrans](https://github.com/vul337/jTrans)
 although the typical Next Sentence Prediction has been replaced with Jump Target Prediction, as proposed in Wang et al.
 
 ### What is the output of the model?
-The model returns a vector of 768 dimensions for each function. These vectors can be compared to
-get an indication of which functions are similar to each other.
+The model is a BERT base model, of which the outputs are not meant to be used directly.
 
 ### How does the model perform?
-The model has been evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) and
+We have compared this model against the model specifically finetuned for semantic similarity. To do this, we initialised this base model
+as a SentenceTransformer model.
+The model was then evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank) and
 [Recall@1](https://en.wikipedia.org/wiki/Precision_and_recall).
 When the model has to pick the positive example out of a pool of 32, it almost always ranks it first. When
-the pool is significantly enlarged to 10.000 functions, it still ranks the positive example highest most of the time.
+the pool is significantly enlarged to 10,000 functions, it still ranks the positive example highest most of the time.
 
 
 | Model | Pool size | MRR | Recall@1 |
@@ -47,7 +47,7 @@ the pool is significantly enlarged to 10.000 functions, it still ranks the posit
 The model has been designed to act as a base model for the ARM64 language.
 
 ### What else could the model be used for?
-The model can also be used to find similar ARM64 functions in a database of known ARM64 functions.
+The model can also be used to find similar ARM64 functions in a database of known ARM64 functions when initialised as a SentenceTransformer model.
 
 ### To what problems is the model not applicable?
 Although the model performs reasonably well on the semantic search task, this model has NOT been finetuned on that task.
@@ -100,4 +100,4 @@ n.a.
 n.a.
 
 ## Analyses (optional)
-n.a.
+n.a.
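The earlier revision of the README notes that the model emits a 768-dimensional vector per function and that these vectors are compared to judge similarity. A minimal sketch of that comparison step, with random vectors standing in for real embeddings (in practice they would come from the model loaded as a SentenceTransformer; the vectors and the perturbation here are invented for illustration):

```python
import numpy as np

def cos_sim(u, v):
    # Cosine similarity: dot product of the two embeddings,
    # normalised by their lengths; values near 1 mean "similar".
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

rng = np.random.default_rng(0)
anchor = rng.normal(size=768)               # stand-in function embedding
near = anchor + 0.1 * rng.normal(size=768)  # slightly perturbed copy
unrelated = rng.normal(size=768)            # independent embedding

print(cos_sim(anchor, near), cos_sim(anchor, unrelated))
```

With the fixed seed, the perturbed copy scores far higher than the unrelated vector, which is the signal a semantic-search index over function embeddings would rank on.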
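The diff evaluates with Mean Reciprocal Rank and Recall@1. For reference, a small self-contained sketch of how these two metrics are computed from the rank the positive example achieved in each pool (the example ranks below are invented):

```python
def mrr_and_recall_at_1(ranks):
    """ranks[i] is the 1-based rank of the true match for query i."""
    mrr = sum(1.0 / r for r in ranks) / len(ranks)         # mean of 1/rank
    recall_at_1 = sum(r == 1 for r in ranks) / len(ranks)  # fraction ranked first
    return mrr, recall_at_1

# Four queries whose positive example ranked 1st, 1st, 2nd and 4th:
mrr, r1 = mrr_and_recall_at_1([1, 1, 2, 4])
print(mrr, r1)  # 0.6875 0.5
```

Both metrics reach 1.0 only when every query ranks its positive example first, which matches the README's claim that near-perfect scores in a pool of 32 are a strong result.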