Add an example of running local hallucination detection
README.md
We also validated the model on arm64 with [vLLM](https://github.com/vllm-project/vllm) on an Nvidia GH200, with max outputs of up to 64k tokens:

```bash
docker run \
  --gpus all \
  --ipc=host \
  ...
  --enable_prefix_caching
```

Detect hallucinations from context; this example uses the [HaluBench](https://huggingface.co/datasets/PatronusAI/HaluBench) dataset:

```python
decompose_system_instruction = """
<TASK>
You are a fair judge that detects hallucinations and unjustified assumptions from question-document-answer triplets provided by the user.
Always follow the instructions below and provide your reasoning and verdict in the format specified.
</TASK>

<INSTRUCTIONS>
#1. Identify key elements in the question.
#2. List all relevant facts provided in the document.
#3. Break down the answer into its component claims.
#4. For each claim in the answer:
#a. Is it explicitly supported by the document? If yes, quote the relevant part.
#b. Is it a reasonable inference from the document? If yes, explain the reasoning.
#c. Is it unsupported or contradicted by the document? If yes, explain why.
#5. Check for any information in the answer that's present in the question but not in the document.
#6. Verify that no additional information is introduced in the answer that isn't in the document or question.
#7. Assess if the answer makes any unjustified connections or assumptions.
</INSTRUCTIONS>

<OUTPUT_EXAMPLE>
{"REASONING": "Your reasoning here where you cite the instruction step by number and provide your reasoning", "VERDICT": "PASS" or "FAIL"}
</OUTPUT_EXAMPLE>
"""

decompose_prompt = """
<QUESTION>: {question} </QUESTION>
<DOCUMENT>: {document} </DOCUMENT>
<ANSWER>: {answer} </ANSWER>
""".strip()

import pandas as pd
from openai import OpenAI
from pprint import pprint
from pydantic import BaseModel

testset_df = pd.read_parquet("hf://datasets/PatronusAI/HaluBench/data/test-00000-of-00001.parquet")
testset_df = testset_df.sample(frac=1).reset_index(drop=True)
example_row = testset_df.iloc[0]  # judge one randomly drawn example

class DecomposeResponse(BaseModel):
    REASONING: str
    VERDICT: str

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # a local vLLM server accepts any key; point base_url at e.g. sglang, openrouter, etc. instead
­
response = client.beta.chat.completions.parse(
    model="root-signals/RootSignals-Judge-Llama-70B",  # or `RootJudge` if you are using the RootSignals API
    messages=[
        {"role": "system", "content": decompose_system_instruction},
        {"role": "user", "content": decompose_prompt.format(
            question=example_row["question"],
            document=example_row["passage"],
            answer=example_row["answer"])},
    ],
    response_format=DecomposeResponse,
).choices[0].message.parsed

pprint(response.REASONING)
pprint(response.VERDICT)
```

```
> ('Following the instructions: #1, the key element in the question is the '
   "nationality of the magazines. #2, the document states that 'The Woman's "
   "Viewpoint was a woman's magazine founded in Texas in 1923' and 'Pick Me Up! "
   "is a British weekly women's magazine'. #3, the answer claims both magazines "
   'are British. #4, checking each claim in the answer: a) The document does not '
   "support the claim that The Woman's Viewpoint is British, instead, it says "
   "the magazine was founded in Texas. b) There's no reasonable inference from "
   "the document that would suggest The Woman's Viewpoint is British. c) The "
   "claim about The Woman's Viewpoint is contradicted by the document. #5, the "
   'answer introduces information (both being British) not supported by the '
   'document. #6, additional information about both magazines being British is '
   'introduced in the answer without being present in the document or question. '
   '#7, the answer makes an unjustified assumption by stating both magazines are '
   "British despite the document clearly stating The Woman's Viewpoint was "
   'founded in Texas, implying it is not British. Therefore, the answer fails to '
   'accurately reflect the information provided in the document and makes '
   'unjustified assumptions based on the information given in the question and '
   "document.")
'FAIL'
```
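To evaluate more than one example, the per-row verdicts can be aggregated against HaluBench's ground-truth labels. Below is a minimal sketch of that aggregation, assuming the dataset exposes a `label` column holding `"PASS"`/`"FAIL"` strings; the model call above is stubbed out with a hypothetical `toy_judge` so the sketch runs standalone:

```python
from typing import Callable, Iterable

def judge_accuracy(rows: Iterable[dict], judge: Callable[[dict], str]) -> float:
    """Fraction of rows where the judge's verdict matches the ground-truth label.

    `judge` maps a row with "question", "passage" and "answer" keys to "PASS"
    or "FAIL"; in practice it would wrap the `client` call above. The "label"
    key is an assumption about the HaluBench schema.
    """
    rows = list(rows)
    if not rows:
        return 0.0
    return sum(judge(row) == row["label"] for row in rows) / len(rows)

# Hypothetical stand-in for the model, for illustration only: flag answers
# that introduce "British" when the passage never mentions it.
def toy_judge(row: dict) -> str:
    return "FAIL" if "British" in row["answer"] and "British" not in row["passage"] else "PASS"

rows = [
    {"question": "q1", "passage": "Founded in Texas in 1923.", "answer": "Both are British.", "label": "FAIL"},
    {"question": "q2", "passage": "A British weekly magazine.", "answer": "It is British.", "label": "PASS"},
]
print(judge_accuracy(rows, toy_judge))  # → 1.0
```

Swapping `toy_judge` for a function that sends each row through the judge endpoint gives a quick accuracy number over a HaluBench sample.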

# 4. Model Details

## 4.1 Overview