jmvcoelho/ad-classifier-v0.1

Text Classification

Safetensors

English

deberta-v2

jmvcoelho committed on Mar 2

Commit c5a8450 (verified) · 1 Parent(s): e429421

Update README.md

Files changed (1)

README.md +95 -7

@@ -1,9 +1,97 @@
-Binary classification
-Trained with synthetic data:
-- Gpt4 (original dataset)
-- gemma-2-9b-it
-- llama-3.1-8B-Instruct
-- Mistral-7B-Intruct-v0.3
-- Qwen2.5-7B-Instruct

+Binary classification model for ad detection in QA systems.
+## Sample usage
+```
+import torch
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+# Load the tokenizer and the fine-tuned classifier.
+classifier_model_path = "jmvcoelho/ad-classifier-v0.1"
+tokenizer = AutoTokenizer.from_pretrained(classifier_model_path)
+model = AutoModelForSequenceClassification.from_pretrained(classifier_model_path)
+model.eval()  # inference mode: disables dropout
+
+# Use a GPU when one is available.
+device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
+model.to(device)
+
+def classify(passages):
+    """Return one predicted class index per input passage."""
+    # Pad to the longest passage in the batch and truncate at 512 tokens.
+    inputs = tokenizer(
+        passages, padding=True, truncation=True, max_length=512, return_tensors="pt"
+    )
+    inputs = {k: v.to(device) for k, v in inputs.items()}
+    with torch.no_grad():
+        outputs = model(**inputs)
+        logits = outputs.logits
+        # The predicted class is the index of the largest logit.
+        predictions = torch.argmax(logits, dim=-1)
+    return predictions.cpu().tolist()
+
+preds = classify(["sample_text_1", "sample_text_2"])
+```
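The `classify` function above tokenizes and scores every passage in a single forward pass, so a very long input list may run out of memory. A minimal sketch of chunked inference follows. The helper name `classify_in_batches` and the default `batch_size` of 32 are assumptions for illustration and are not part of the model card; `classify_fn` stands for the `classify` function defined above.

```python
from typing import Callable, List, Sequence


def classify_in_batches(
    passages: Sequence[str],
    classify_fn: Callable[[List[str]], List[int]],
    batch_size: int = 32,  # assumed default; tune to the available memory
) -> List[int]:
    """Run classify_fn over fixed-size chunks and concatenate the predictions."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    predictions: List[int] = []
    for start in range(0, len(passages), batch_size):
        # Slice one chunk; the last chunk may be shorter than batch_size.
        batch = list(passages[start:start + batch_size])
        predictions.extend(classify_fn(batch))
    return predictions
```

With the model loaded as above, a call such as `classify_in_batches(texts, classify)` would return predictions in the same order as `texts`.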
+## Version
+- v0.0: Trained with the official data from Webis Generated Native Ads 2024
+- v0.1: Trained with v0.0 data + new synthetic data
+## New Synthetic Data
+Objective: Given a (query, answer) pair, generate a new_answer that contains an advertisement.
+### Prompt:
+```
+prompt = f"""
+            Task: You are tasked with subtly integrating an advertisement into a search query response. The goal is to make the advertisement feel natural and helpful within the context of the response, not disruptive or overtly promotional.
+            First, you should define the item to advertise. You should keep in mind the context of the query and original response. Consider the following advertisement qualities when choosing the product:
+            * Product/Service Category.
+            * Specific Advertisement Theme.
+            * Key Selling Points/Qualities to Emphasize.
+            Instructions for Integration:
+            1.  Contextual Relevance is Key:  The advertisement must feel relevant to the search query and the existing response text.  Think about how the advertised product/service genuinely relates to the user's needs implied by the query.
+            2.  Subtle and Natural Language:  Use language that blends seamlessly with the original response. Avoid overly promotional or sales-oriented phrasing. Focus on informative and helpful language.
+            3.  Focus on Benefits, Not Just Features: Instead of directly listing qualities, rephrase them as benefits the user might gain in relation to the search query and response context.
+            4.  Strategic Placement: Choose the most natural and impactful location(s) within the response to subtly introduce the advertisement. This might involve:
+                * Briefly modifying an existing sentence to subtly incorporate the advertisement.
+                * Adding a short, relevant phrase or clause to an existing sentence.
+                * In rare cases, adding a very short, contextual sentence (only if it feels truly natural).
+            5.  Maintain Original Meaning:  Ensure the core meaning and factual accuracy of the original response remain unchanged. The advertisement should enhance, not distort, the original information.
+            6.  Review for Subtlety: Before returning the response, critically evaluate if the advertisement feels genuinely subtle and integrated. If it feels forced or obvious, refine your approach.
+            Output: Return **only** the modified response with the subtly integrated advertisement.
+            ---
+            Search Query: {query}
+            Original Response:
+            {answer}
+            Modified Response:
+        """
+```
+### Initial Data:
+- queries: Obtained from the MS-MARCO V2.1 QA task: a 150K subset of queries that are associated with a "well formed answer".
+- answer: Generated with Qwen2.5-7B-Instruct using RAG with 10 passages (from our model).
+### Models used for generation
+Each model generated new answers for one quarter of the (query, answer) pairs:
+- Gemma-2-9b-it
+- LLaMA-3.1-8B-Instruct
+- Mistral-7B-Instruct
+- Qwen2.5-7B-Instruct
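The even four-way split described above could be sketched as follows. The README does not say how pairs were assigned to generators, so the round-robin rule and the `assign_generators` helper are assumptions made for illustration; the model names are copied from the list above.

```python
from typing import Dict, List, Sequence, Tuple

# Generator names as listed in the README.
GENERATORS = [
    "Gemma-2-9b-it",
    "LLaMA-3.1-8B-Instruct",
    "Mistral-7B-Instruct",
    "Qwen2.5-7B-Instruct",
]


def assign_generators(
    pairs: Sequence[Tuple[str, str]],
    generators: Sequence[str] = GENERATORS,
) -> Dict[str, List[Tuple[str, str]]]:
    """Assign (query, answer) pairs to generators round-robin (assumed rule)."""
    assignment: Dict[str, List[Tuple[str, str]]] = {name: [] for name in generators}
    for index, pair in enumerate(pairs):
        # Pair i goes to generator i mod len(generators), so shares differ by at most one.
        assignment[generators[index % len(generators)]].append(pair)
    return assignment
```

Each generator would then rewrite only its own share of the answers using the prompt shown earlier.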