Use first level header for "Model Card", remove extraneous whitespace (#2)
(commit bf703f814681b289c4663fa0fff699b605c402e0)

README.md CHANGED
@@ -6,7 +6,7 @@ base_model:
library_name: sentence-transformers
---
Model Card
==========

_Who to contact:_ fbda [at] nfi [dot] nl \
TODO: add link to github repo
@@ -22,7 +22,6 @@ The model architecture is inspired by [jTrans](https://github.com/vul337/jTrans)
although the typical Next Sentence Prediction has been replaced with Jump Target Prediction, as proposed in Wang et al.
This architecture has subsequently been finetuned for semantic search purposes. We have followed the procedure proposed by [S-BERT](https://www.sbert.net/examples/applications/semantic-search/README.html).

### What is the output of the model?
The model returns a vector of 768 dimensions for each function that it's given. These vectors can be compared to
get an indication of which functions are similar to each other.
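A minimal sketch of how such output vectors can be compared, assuming cosine similarity as in S-BERT-style semantic search (the random vectors below merely stand in for real model outputs):

```python
import math
import random

def cosine_similarity(a, b):
    # Cosine similarity between two equal-length embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for the 768-dimensional vectors the model returns;
# in practice they would come from the model's encode() call.
random.seed(0)
emb_a = [random.gauss(0, 1) for _ in range(768)]
emb_b = [x + random.gauss(0, 0.1) for x in emb_a]  # a near-identical function
emb_c = [random.gauss(0, 1) for _ in range(768)]   # an unrelated function

similar = cosine_similarity(emb_a, emb_b)
unrelated = cosine_similarity(emb_a, emb_c)
assert similar > unrelated  # similar functions score higher
```

A higher cosine score indicates more similar functions; ranking a pool of candidates by this score is what the semantic search evaluation below measures.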
@@ -33,13 +32,11 @@ The model has been evaluated on [Mean Reciprocal Rank (MRR)](https://en.wikipedia.org/wiki/Mean_reciprocal_rank)
When the model has to pick the positive example out of a pool of 32, it ranks the positive example highest most of the time.
When the pool is significantly enlarged to 10,000 functions, it still ranks the positive example first or second in most cases.

| Model   | Pool size | MRR  | Recall@1 |
|---------|-----------|------|----------|
| ASMBert | 32        | 0.99 | 0.99     |
| ASMBert | 10,000    | 0.87 | 0.83     |
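As a rough illustration of what these metrics mean (a sketch, not the actual evaluation code): MRR averages the reciprocal rank of the positive example over all queries, and Recall@1 is the fraction of queries where the positive example is ranked first.

```python
def mrr_and_recall_at_1(ranks):
    # `ranks` holds, for each query, the 1-based rank of the positive
    # example among the candidate pool.
    mrr = sum(1.0 / r for r in ranks) / len(ranks)
    recall_at_1 = sum(1 for r in ranks if r == 1) / len(ranks)
    return mrr, recall_at_1

# Hypothetical outcome for four queries: positive ranked 1st, 1st, 2nd, 1st.
mrr, recall1 = mrr_and_recall_at_1([1, 1, 2, 1])
assert (mrr, recall1) == (0.875, 0.75)
```

An MRR of 0.87 at pool size 10,000 thus corresponds to the positive example typically landing in the top one or two ranks.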
## Purpose and use of the model

### For which problem has the model been designed?
@@ -51,8 +48,6 @@ We do not see other applications for this model.
### To what problems is the model not applicable?
This model has been finetuned on the semantic search task; for a generic ARM64-BERT model, please refer to the [other
model](https://huggingface.co/NetherlandsForensicInstitute/ARM64bert) we have published.
## Data
### What data was used for training and evaluation?
@@ -64,7 +59,6 @@ in a maximum of 10 (5*2) different functions which are semantically similar i.e.
The dataset is split into a train and a test set. This is done on project level, so all binaries and functions belonging to one project are part of
either the train or the test set, not both. We have not performed any deduplication on the dataset for training.

| set   | # functions |
|-------|------------:|
| train | 18,083,285  |
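The project-level split described above can be sketched as follows (hypothetical project and function names; the actual pipeline may differ):

```python
import random

def project_level_split(functions_by_project, test_fraction, seed=0):
    # Assign whole projects to train or test, so every binary and
    # function of a project lands in exactly one of the two sets.
    projects = sorted(functions_by_project)
    random.Random(seed).shuffle(projects)
    n_test = max(1, int(len(projects) * test_fraction))
    test_projects = set(projects[:n_test])
    train = [fn for p in projects if p not in test_projects
             for fn in functions_by_project[p]]
    test = [fn for p in test_projects for fn in functions_by_project[p]]
    return train, test

# Hypothetical corpus: three projects with a few functions each.
corpus = {"zlib": ["inflate", "deflate"], "curl": ["url_parse"],
          "lua": ["lua_newstate", "lua_close"]}
train_fns, test_fns = project_level_split(corpus, test_fraction=1 / 3)
assert not set(train_fns) & set(test_fns)  # no function in both sets
```

Splitting at project level rather than function level prevents near-identical functions from the same project leaking between train and test.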
@@ -78,14 +72,12 @@ After training our models, we found out that something had gone wrong when compiling
the last line (instruction) of the previous function was included in the next. This has been fixed for the finetuning, but due to the long training process, and the
good performance of the model despite the mistake, we have decided not to retrain the base model.

## Fairness Metrics

### Which metrics have been used to measure bias in the data/model and why?
n.a.
### What do those metrics show?
n.a.

### Any other notable issues?