jmvcoelho committed · Commit c5a8450 · verified · 1 Parent(s): e429421

Update README.md

Files changed (1): README.md (+95, -7)
README.md CHANGED
@@ -1,9 +1,97 @@
- Binary classification

 
- Trained with synthetic data:
- - Gpt4 (original dataset)
- - gemma-2-9b-it
- - llama-3.1-8B-Instruct
- - Mistral-7B-Intruct-v0.3
- - Qwen2.5-7B-Instruct
+ Binary classification model for ad detection in QA systems.

+ ## Sample usage

+ ```python
+ import torch
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+ classifier_model_path = "jmvcoelho/ad-classifier-v0.1"
+ tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
+ model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
+ model.eval()
+
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+ model.to(device)
+
+ def classify(passages):
+     # Tokenize the batch and move the tensors to the model's device.
+     inputs = tokenizer(
+         passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
+     )
+     inputs = {k: v.to(device) for k, v in inputs.items()}
+     with torch.no_grad():
+         outputs = model(**inputs)
+     logits = outputs.logits
+     predictions = torch.argmax(logits, dim=-1)
+     return predictions.cpu().tolist()
+
+ preds = classify(["sample_text_1", "sample_text_2"])
+ ```
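+
+ The returned values are class indices. The mapping below (0 = no advertisement, 1 = contains an advertisement) is an assumption for illustration; check `model.config.id2label` for the authoritative mapping.
+
+ ```python
+ # Hypothetical post-processing of `preds` from the snippet above.
+ # The 0/1 meaning is assumed here, not read from the model config.
+ id2label = {0: "no-ad", 1: "ad"}
+ for text, pred in zip(["sample_text_1", "sample_text_2"], preds):
+     print(f"{id2label[pred]}\t{text}")
+ ```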
+
+
+ ## Version
+
+ - v0.0: Trained with the official data from Webis Generated Native Ads 2024
+ - v0.1: Trained with v0.0 data + new synthetic data
+
+
+ ## New Synthetic Data
+
+ Objective: Given a (query, answer) pair, generate a new_answer that contains an advertisement.
+
+ ### Prompt:
+
+ ```python
+ prompt = f"""
+ Task: You are tasked with subtly integrating an advertisement into a search query response. The goal is to make the advertisement feel natural and helpful within the context of the response, not disruptive or overtly promotional.
+ First, you should define the item to advertise. You should keep in mind the context of the query and original response. Consider the following advertisement qualities when choosing the product:
+
+ * Product/Service Category.
+ * Specific Advertisement Theme.
+ * Key Selling Points/Qualities to Emphasize.
+
+ Instructions for Integration:
+
+ 1. Contextual Relevance is Key: The advertisement must feel relevant to the search query and the existing response text. Think about how the advertised product/service genuinely relates to the user's needs implied by the query.
+
+ 2. Subtle and Natural Language: Use language that blends seamlessly with the original response. Avoid overly promotional or sales-oriented phrasing. Focus on informative and helpful language.
+
+ 3. Focus on Benefits, Not Just Features: Instead of directly listing qualities, rephrase them as benefits the user might gain in relation to the search query and response context.
+
+ 4. Strategic Placement: Choose the most natural and impactful location(s) within the response to subtly introduce the advertisement. This might involve:
+ * Briefly modifying an existing sentence to subtly incorporate the advertisement.
+ * Adding a short, relevant phrase or clause to an existing sentence.
+ * In rare cases, adding a very short, contextual sentence (only if it feels truly natural).
+
+ 5. Maintain Original Meaning: Ensure the core meaning and factual accuracy of the original response remain unchanged. The advertisement should enhance, not distort, the original information.
+
+ 6. Review for Subtlety: Before returning the response, critically evaluate if the advertisement feels genuinely subtle and integrated. If it feels forced or obvious, refine your approach.
+
+ Output: Return **only** the modified response with the subtly integrated advertisement.
+
+ ---
+
+ Search Query: {query}
+ Original Response:
+
+ {answer}
+
+ Modified Response:
+ """
+ ```
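+
+ The exact generation script is not included in this card; the sketch below shows one way the prompt above could be filled and sent to an instruction-tuned model from the list further down. The repo ID, decoding settings, and the `PROMPT_TEMPLATE` constant are illustrative assumptions.
+
+ ```python
+ # Sketch only: fill the prompt template above and generate an ad-injected answer.
+ from transformers import pipeline
+
+ generator = pipeline(
+     "text-generation",
+     model="Qwen/Qwen2.5-7B-Instruct",  # any of the four generator models could be used
+     device_map="auto",
+ )
+
+ # Paste the full prompt text from the block above here, keeping the {query} and
+ # {answer} placeholders (shortened for brevity in this sketch).
+ PROMPT_TEMPLATE = "Task: ...\n\nSearch Query: {query}\nOriginal Response:\n\n{answer}\n\nModified Response:"
+
+ def inject_ad(query: str, answer: str) -> str:
+     prompt = PROMPT_TEMPLATE.format(query=query, answer=answer)
+     messages = [{"role": "user", "content": prompt}]
+     out = generator(messages, max_new_tokens=1024, do_sample=False)
+     # The pipeline returns the chat history; the last message is the model's reply.
+     return out[0]["generated_text"][-1]["content"]
+
+ new_answer = inject_ad("how to train for a 5k", "Start with three short runs per week...")
+ ```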
+
+ ### Initial Data:
+
+ - queries: Obtained from the MS-MARCO V2.1 QA task; a 150K subset of queries that are associated with a "well formed answer".
+ - answer: Generated with Qwen2.5-7B-Instruct using RAG with 10 passages retrieved by our model.
+
+ ### Models used for generation
+
+ Each model generated ad-injected answers for one quarter of the (query, answer) pairs (a sketch of one possible split follows the list):
+ - Gemma-2-9b-it
+ - LLaMA-3.1-8B-Instruct
+ - Mistral-7B-Instruct
+ - Qwen2.5-7B-Instruct
+
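+ The exact assignment of pairs to models is not documented; the sketch below shows a simple round-robin split by example index that yields the 1/4 shares (Hugging Face repo IDs are shown for illustration).
+
+ ```python
+ # Assumed round-robin sharding of (query, answer) pairs across the four generators.
+ GENERATORS = [
+     "google/gemma-2-9b-it",
+     "meta-llama/Llama-3.1-8B-Instruct",
+     "mistralai/Mistral-7B-Instruct-v0.3",
+     "Qwen/Qwen2.5-7B-Instruct",
+ ]
+
+ def assign_generator(pair_index: int) -> str:
+     """Return the repo ID of the model responsible for this pair."""
+     return GENERATORS[pair_index % len(GENERATORS)]
+
+ # Each pair would then be run through a generation function such as `inject_ad`
+ # from the sketch above, using its assigned model.
+ ```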