akaIDIOT committed
Commit 91d79aa · verified · 1 Parent(s): 5059ac7

Files changed (1):
  1. README.md +6 -8

README.md CHANGED
@@ -3,24 +3,23 @@ license: eupl-1.1
 language: code
 ---
 
-Model Card - ARM64BERT
-----------
+ARM64BERT 🦾
+------------
 
 [GitHub repository](https://github.com/NetherlandsForensicInstitute/asmtransformers)
 
 ## General
 ### What is the purpose of the model
 The model is a BERT model for ARM64 assembly code. This specific model has NOT been finetuned for semantic similarity; you most likely want
-to use our [other
-model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of the ARM64BERT is to be a baseline
+to use our [other model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert-embedding). The main purpose of the ARM64BERT is to be a baseline
 to compare the finetuned model against.
 
 ### What does the model architecture look like?
 The model architecture is inspired by [jTrans](https://github.com/vul337/jTrans) (Wang et al., 2022). It is a BERT model
 (Devlin et al., 2019),
 although the typical Next Sentence Prediction has been replaced with Jump Target Prediction, as proposed in Wang et al.
 
 ### What is the output of the model?
 The model is a BERT base model, of which the outputs are not meant to be used directly.
 
 ### How does the model perform?
@@ -67,8 +66,7 @@ either the train or the test set, not both. We have not performed any deduplicat
 The dataset was collected by our team. The annotation of similar/non-similar functions comes from the different compilation
 levels, i.e. what we consider "similar functions" is in fact the same function that has been compiled in a different way.
 
-
-### Any remarks on data quality and bias?
+### Any remarks on data quality and bias?
 The way we classify functions as similar may have implications. For example, sometimes, two different ways of compiling
 the same function do not result in a different piece of code. We did not remove duplicates from the data during training,
 but we did implement checks in the evaluation stage and it seems that the model has not suffered from the simple training
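
---

The model card above mentions that Next Sentence Prediction is replaced with Jump Target Prediction, following jTrans. As background, here is a minimal, hypothetical sketch of the preprocessing idea that objective relies on: concrete branch-target addresses inside a function are rewritten as position-bound tokens, so the model can be trained to predict which instruction a jump lands on. The function name, input layout, and `JUMP_<i>` token scheme are my assumptions for illustration, not the repository's actual implementation.

```python
# Illustrative sketch (NOT the asmtransformers code): rewrite ARM64 branch
# targets that point inside the function as JUMP_<i> tokens, where <i> is the
# index of the target instruction. This is the jTrans-style setup that makes
# a Jump Target Prediction objective possible.
import re

def tokenize_function(asm_lines):
    """asm_lines: list of (address, instruction-text) pairs for one function."""
    # Map each instruction address to its position within the function.
    addr_to_pos = {addr: i for i, (addr, _) in enumerate(asm_lines)}
    tokens = []
    for _, text in asm_lines:
        for tok in text.replace(",", " ").split():
            # Hex operands that resolve to an in-function address become JUMP_<pos>.
            if re.fullmatch(r"0x[0-9a-f]+", tok) and int(tok, 16) in addr_to_pos:
                tokens.append(f"JUMP_{addr_to_pos[int(tok, 16)]}")
            else:
                tokens.append(tok)
    return tokens

# Toy function: a compare-and-branch loop.
func = [
    (0x1000, "mov x0 #0"),
    (0x1004, "cmp x0 #10"),
    (0x1008, "b.lt 0x1004"),
    (0x100c, "ret"),
]
print(tokenize_function(func))
# → ['mov', 'x0', '#0', 'cmp', 'x0', '#10', 'b.lt', 'JUMP_1', 'ret']
```

With targets expressed relative to instruction positions rather than absolute addresses, the same loop compiled at a different base address yields identical tokens, which is what makes the "same function, different compilation" pairs comparable.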