---
language:
- en
base_model:
- microsoft/deberta-v3-base
pipeline_tag: text-classification
---
Binary classification model for detecting ads in answers generated by QA systems.

## Sample usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

classifier_model_path = "jmvcoelho/ad-classifier-v0.0"
tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
model.eval()  # inference mode (disables dropout)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def classify(passages):
    # Tokenize a batch of passages, truncating to the model's 512-token limit.
    inputs = tokenizer(
        passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
    )
    inputs = {k: v.to(device) for k, v in inputs.items()}
    with torch.no_grad():
        outputs = model(**inputs)
    logits = outputs.logits
    # 0 = answer without an ad, 1 = answer with an ad
    predictions = torch.argmax(logits, dim=-1)
    return predictions.cpu().tolist()

preds = classify(["sample_text_1", "sample_text_2"])
```
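The `classify` function above returns hard argmax labels. If a confidence score is also useful, the two logits per passage can be turned into a probability for the ad class with a softmax. The sketch below is plain Python and assumes the logits are ordered `[label 0, label 1]`, with label 1 meaning "contains an ad" as described in the data summary; the helper names `ad_probability` and `label_passages` and the default 0.5 threshold are hypothetical, not part of this repository.

```python
import math

def ad_probability(logits):
    """Softmax over a pair of logits [no_ad, ad]; returns P(ad).

    Assumption: index 1 corresponds to label 1 (answer with an ad).
    """
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    return exps[1] / sum(exps)

def label_passages(batch_logits, threshold=0.5):
    """Hypothetical helper: map per-passage logits to (label, P(ad)) tuples."""
    results = []
    for logits in batch_logits:
        p_ad = ad_probability(logits)
        label = "ad" if p_ad >= threshold else "no_ad"
        results.append((label, p_ad))
    return results

# Inside classify(), one could pass logits.cpu().tolist() to label_passages.
print(label_passages([[2.0, -1.0], [-1.0, 3.0]]))
```

With two classes and a threshold of 0.5, the labels agree with `torch.argmax` except on exact ties; raising the threshold trades recall on ads for fewer false positives.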

## Version

- **v0.0:** Trained on the official data from Webis Generated Native Ads 2024.
- v0.1: Trained on the v0.0 data plus new synthetic data.

## Webis Generated Native Ads 2024

**Paper:** [Detecting Generated Native Ads in Conversational Search](https://dl.acm.org/doi/10.1145/3589335.3651489)

**Data summary:**
- YouChat and Microsoft Copilot were used to generate answers for competitive keyword queries.
- GPT-4 Turbo was used to insert one advertisement into the answer.
- This yields triples of the form (query, answer_with_ad, answer_without_ad).
- The classifier in this repository was trained to assign label 0 to answer_without_ad and label 1 to answer_with_ad.
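The labeling scheme above can be sketched as a conversion from triples to binary training examples. This is an illustration under assumptions, not the authors' training code: the `Triple` type and `triples_to_examples` function are hypothetical, and only the answer text is used as model input (the query is dropped), mirroring the sample usage, which classifies passages on their own.

```python
from typing import List, NamedTuple, Tuple

class Triple(NamedTuple):
    """Hypothetical container for one (query, answer_with_ad, answer_without_ad) triple."""
    query: str
    answer_with_ad: str
    answer_without_ad: str

def triples_to_examples(triples: List[Triple]) -> List[Tuple[str, int]]:
    """Turn each triple into two (text, label) pairs: 0 = no ad, 1 = ad.

    Assumption: the query is not part of the classifier input.
    """
    examples = []
    for t in triples:
        examples.append((t.answer_without_ad, 0))
        examples.append((t.answer_with_ad, 1))
    return examples

data = [Triple("best running shoes", "Answer mentioning BrandX...", "Neutral answer...")]
print(triples_to_examples(data))
```

Emitting both answers of every triple keeps the two classes exactly balanced, and each positive example has a matched negative that differs only by the inserted ad.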