Update README.md
Binary classification model for ad detection in QA systems.

## Sample usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load the tokenizer and classifier from the Hugging Face Hub.
classifier_model_path = "jmvcoelho/ad-classifier-v0.1"
tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
model.eval()

# Run on GPU when one is available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def classify(passages):
    # Tokenize the batch, padding to the longest passage and truncating at 512 tokens.
    inputs = tokenizer(
        passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        # Pick the highest-scoring class index for each passage.
        predictions = torch.argmax(logits, dim=-1)
    return predictions.cpu().tolist()

preds = classify(["sample_text_1", "sample_text_2"])
```
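
For larger collections, passing every passage to `classify` in a single call may exhaust memory. The following is a minimal sketch of one way to batch the input. The helpers `iter_batches` and `classify_in_batches` and the default batch size of 32 are assumptions made for illustration; they are not part of the released model.

```python
def iter_batches(items, batch_size=32):
    """Yield consecutive slices of `items` with at most `batch_size` elements.

    Hypothetical helper for illustration; not part of the released code.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def classify_in_batches(passages, classify_fn, batch_size=32):
    """Apply `classify_fn` (e.g. `classify` above) batch by batch and
    concatenate the per-passage predictions in input order."""
    predictions = []
    for batch in iter_batches(passages, batch_size):
        predictions.extend(classify_fn(batch))
    return predictions
```

With the model loaded as above, a call could look like `classify_in_batches(all_passages, classify, batch_size=16)`, where `all_passages` is any list of strings.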

## Version

- v0.0: Trained on the official data from Webis Generated Native Ads 2024.
- v0.1: Trained on the v0.0 data plus new synthetic data.

## New Synthetic Data

Objective: Given a (query, answer) pair, generate a `new_answer` that contains an advertisement.

### Prompt:

```python
prompt = f"""
Task: You are tasked with subtly integrating an advertisement into a search query response. The goal is to make the advertisement feel natural and helpful within the context of the response, not disruptive or overtly promotional.
First, you should define the item to advertise. You should keep in mind the context of the query and original response. Consider the following advertisement qualities when choosing the product:

* Product/Service Category.
* Specific Advertisement Theme.
* Key Selling Points/Qualities to Emphasize.

Instructions for Integration:

1. Contextual Relevance is Key: The advertisement must feel relevant to the search query and the existing response text. Think about how the advertised product/service genuinely relates to the user's needs implied by the query.

2. Subtle and Natural Language: Use language that blends seamlessly with the original response. Avoid overly promotional or sales-oriented phrasing. Focus on informative and helpful language.

3. Focus on Benefits, Not Just Features: Instead of directly listing qualities, rephrase them as benefits the user might gain in relation to the search query and response context.

4. Strategic Placement: Choose the most natural and impactful location(s) within the response to subtly introduce the advertisement. This might involve:
* Briefly modifying an existing sentence to subtly incorporate the advertisement.
* Adding a short, relevant phrase or clause to an existing sentence.
* In rare cases, adding a very short, contextual sentence (only if it feels truly natural).

5. Maintain Original Meaning: Ensure the core meaning and factual accuracy of the original response remain unchanged. The advertisement should enhance, not distort, the original information.

6. Review for Subtlety: Before returning the response, critically evaluate if the advertisement feels genuinely subtle and integrated. If it feels forced or obvious, refine your approach.

Output: Return **only** the modified response with the subtly integrated advertisement.

---

Search Query: {query}
Original Response:

{answer}

Modified Response:
"""
```
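
The template above has two placeholders, `{query}` and `{answer}`. The sketch below shows one way to fill it for a single pair. The helper `build_prompt` and the abbreviated `PROMPT_TEMPLATE` are stand-ins introduced for illustration; in practice the full prompt text above would take the template's place.

```python
# Abbreviated stand-in for the full prompt; only the two placeholders matter here.
PROMPT_TEMPLATE = (
    "Task: (full instructions as in the prompt above)\n"
    "---\n"
    "Search Query: {query}\n"
    "Original Response:\n\n"
    "{answer}\n\n"
    "Modified Response:\n"
)


def build_prompt(query, answer, template=PROMPT_TEMPLATE):
    """Fill the template for one (query, answer) pair (hypothetical helper)."""
    # Strip surrounding whitespace so the fields sit cleanly in the template.
    return template.format(query=query.strip(), answer=answer.strip())
```

The returned string would then be sent to whichever generation model is assigned to that pair.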

### Initial Data:

- queries: A 150K subset of queries from the MS-MARCO V2.1 QA task that are associated with a "well formed answer".
- answer: Generated with Qwen2.5-7B-Instruct using RAG with 10 passages (from our model).

### Models used for generation

Each model generated new answers for one quarter of the (query, answer) pairs:

- Gemma-2-9b-it
- LLaMA-3.1-8B-Instruct
- Mistral-7B-Instruct
- Qwen2.5-7B-Instruct
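
How the pairs were assigned to the four models is not specified here. One simple way to obtain an even four-way split is a seeded shuffle followed by round-robin assignment. The sketch below assumes that scheme; the function name `assign_generators`, the seed, and the exact model-name strings are illustrative.

```python
import random

GENERATORS = [
    "Gemma-2-9b-it",
    "LLaMA-3.1-8B-Instruct",
    "Mistral-7B-Instruct",
    "Qwen2.5-7B-Instruct",
]


def assign_generators(pairs, generators=GENERATORS, seed=0):
    """Map each generator name to a roughly equal share of `pairs`.

    Assumed scheme: shuffle with a fixed seed, then deal round-robin.
    """
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    shares = {name: [] for name in generators}
    for index, pair in enumerate(shuffled):
        shares[generators[index % len(generators)]].append(pair)
    return shares
```

Fixing the seed keeps the assignment reproducible, so the same pair always goes to the same model across runs.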