---
base_model: mistralai/Mistral-7B-Instruct-v0.2
library_name: peft
datasets:
- tumeteor/Security-TTP-Mapping
language:
- en
---

# Model Card for mistralai-7B-attack2ttp

This model is fine-tuned from Mistral-7B-Instruct-v0.2. It takes a description of an attack scenario as input and outputs the techniques used by the attacker.




## Model Details

### Model Description

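Given a free-text attack scenario, the model generates the tactics, techniques, and procedures (TTPs) the attacker used. It is distributed as a LoRA adapter, loaded with PEFT on top of mistralai/Mistral-7B-Instruct-v0.2.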



- **Developed by:** Harish Santhanalakshmi Ganesan
- **Funded by [optional]:** None
- **Shared by [optional]:** None
- **Model type:** Causal language model (LoRA adapter for PEFT)
- **Language(s) (NLP):** English
- **License:** Apache 2.0
- **Finetuned from model [optional]:** mistralai/Mistral-7B-Instruct-v0.2

### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]
- **Demo [optional]:** [More Information Needed]




## Bias, Risks, and Limitations

<!-- This section is meant to convey both technical and sociotechnical limitations. -->

[More Information Needed]

### Recommendations

<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

## How to Get Started with the Model

```

import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

peft_model_id = "rootxhacker/mistralai-7B-attack2ttp"
config = PeftConfig.from_pretrained(peft_model_id)

# Load the base model in 4-bit precision, together with its tokenizer
model = AutoModelForCausalLM.from_pretrained(config.base_model_name_or_path, return_dict=True, load_in_4bit=True, device_map='auto')
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)

# Load the LoRA adapter on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)

def get_completion(query: str, model, tokenizer) -> str:
  # Prompt wording is left exactly as originally published
  prompt_template = """
  here is intruction you need to map Attack scenario with TTPs
  ### Question:
  {query}

  ### Answer:
  """
  prompt = prompt_template.format(query=query)

  # Tokenize the prompt and move the tensors to the model's device
  encodeds = tokenizer(prompt, return_tensors="pt", add_special_tokens=True)
  model_inputs = encodeds.to(model.device)

  # Sample up to 1000 new tokens; the decoded string contains the prompt followed by the answer
  generated_ids = model.generate(**model_inputs, max_new_tokens=1000, do_sample=True, pad_token_id=tokenizer.eos_token_id)
  decoded = tokenizer.batch_decode(generated_ids)
  return decoded[0]

```
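
The string returned by `get_completion` contains the whole prompt followed by the generated answer, along with the tokenizer's special tokens. The sketch below shows one possible way to post-process it. The helper names `extract_answer` and `extract_technique_ids` are hypothetical and not part of this repository. The ID extraction also assumes the model writes MITRE ATT&CK technique IDs such as `T1059` or `T1059.001` in its answer, which this card does not confirm.

```python
import re

ANSWER_MARKER = "### Answer:"
# MITRE ATT&CK technique IDs look like T1059; sub-techniques look like T1059.001.
TECHNIQUE_ID = re.compile(r"\bT\d{4}(?:\.\d{3})?\b")


def extract_answer(decoded: str) -> str:
    """Return only the text that follows the first answer marker."""
    _, found, answer = decoded.partition(ANSWER_MARKER)
    if not found:
        # Marker missing: fall back to the full decoded text.
        answer = decoded
    # Drop the Mistral tokenizer's begin/end-of-sequence tokens.
    for token in ("<s>", "</s>"):
        answer = answer.replace(token, "")
    return answer.strip()


def extract_technique_ids(answer: str) -> list[str]:
    """Collect technique IDs in order of first appearance, without duplicates.

    Assumption: the answer mentions ATT&CK IDs at all; if it only names
    techniques in prose, this returns an empty list.
    """
    ids: list[str] = []
    for match in TECHNIQUE_ID.finditer(answer):
        if match.group() not in ids:
            ids.append(match.group())
    return ids
```

With the model loaded, a call would look like `extract_answer(get_completion(query, model, tokenizer))`. Passing `skip_special_tokens=True` to `tokenizer.batch_decode` is an alternative way to drop the special tokens before splitting on the marker.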


## Training Details

### Training Data

The model was fine-tuned on the [tumeteor/Security-TTP-Mapping](https://huggingface.co/datasets/tumeteor/Security-TTP-Mapping) dataset.



## Citation [optional]

```
@inproceedings{nguyen-srndic-neth-ttpm,
    title = "Noise Contrastive Estimation-based Matching Framework for Low-resource Security Attack Pattern Recognition",
    author = "Nguyen, Tu and Šrndić, Nedim and Neth, Alexander",
    booktitle = "Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics",
    month = mar,
    year = "2024",
    publisher = "Association for Computational Linguistics",
    abstract = "Tactics, Techniques and Procedures (TTPs) represent sophisticated attack patterns in the cybersecurity domain, described encyclopedically in textual knowledge bases. Identifying TTPs in cybersecurity writing, often called TTP mapping, is an important and challenging task. Conventional learning approaches often target the problem in the classical multi-class or multilabel classification setting. This setting hinders the learning ability of the model due to a large number of classes (i.e., TTPs), the inevitable skewness of the label distribution and the complex hierarchical structure of the label space. We formulate the problem in a different learning paradigm, where the assignment of a text to a TTP label is decided by the direct semantic similarity between the two, thus reducing the complexity of competing solely over the large labeling space. To that end, we propose a neural matching architecture with an effective sampling-based learn-to-compare mechanism, facilitating the learning process of the matching model despite constrained resources.",
}
```