---
library_name: transformers
model_name: Llama-3.3-Argunaut-1-70B-SPIN
pipeline_tag: text-generation
base_model: DebateLabKIT/Llama-3.3-Argunaut-1-70B-SFT
datasets:
- DebateLabKIT/argdown_line-by-line
- DebateLabKIT/argument_mapping_dpo_pairs
- allenai/llama-3.1-tulu-3-70b-preference-mixture
tags:
- logic
- argumentation
- critical-thinking
- argument-mapping
- generated_from_trainer
- trl
- dpo
- spin
license: llama3.1
---

# Model Card for Llama-3.3-Argunaut-1-70B-SPIN

This model is a fine-tuned version of [DebateLabKIT/Llama-3.3-Argunaut-1-70B-SFT](https://huggingface.co/DebateLabKIT/Llama-3.3-Argunaut-1-70B-SFT). It has been trained using [TRL](https://github.com/huggingface/trl) and [vLLM](https://docs.vllm.ai/).

It is released as part of the [Argunauts Project](https://huggingface.co/blog/ggbetz/argunauts-intro).

## Quick start

```python
from transformers import pipeline

question = "If you had a time machine, but could only go to the past or the future once and never return, which would you choose and why?"
generator = pipeline(
    "text-generation",
    model="DebateLabKIT/Llama-3.3-Argunaut-1-70B-SPIN",
    device_map="auto",   # a 70B model does not fit on one GPU; shard it across all visible devices (requires `accelerate`)
    torch_dtype="auto",  # use the dtype specified in the model config
)
output = generator([{"role": "user", "content": question}], max_new_tokens=128, return_full_text=False)[0]
print(output["generated_text"])
```

## Training procedure

This model was trained with Self-Play Fine-Tuning (SPIN), a method introduced in [Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models](https://huggingface.co/papers/2401.01335). A general description of the training procedure can be found in [this blog post](https://huggingface.co/blog/ggbetz/argunauts-phase-2).
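
To make the SPIN objective concrete, here is a minimal sketch of the per-example loss in plain Python, assuming the summed token log-probabilities of each response have already been computed. In SPIN, the "chosen" response is the ground-truth response from the SFT data, the "rejected" response is sampled from the model of the previous iteration, and that previous-iteration model also serves as the reference model. With the logistic loss, the objective has the same form as the sigmoid DPO loss listed in the training parameters. The function names and the default `beta=0.1` are illustrative assumptions (the card does not state the beta used), and this is not TRL's implementation.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def spin_sigmoid_loss(
    policy_logp_real: float,
    policy_logp_synth: float,
    ref_logp_real: float,
    ref_logp_synth: float,
    beta: float = 0.1,  # assumed value, not taken from this card
) -> float:
    """SPIN loss for a single prompt, using the logistic (sigmoid) loss.

    real  = ground-truth SFT response (plays the role of "chosen")
    synth = response generated by the previous-iteration model ("rejected")
    ref_* = log-probs under the previous-iteration model, used as reference
    All inputs are summed token log-probabilities of the full response.
    """
    real_ratio = policy_logp_real - ref_logp_real
    synth_ratio = policy_logp_synth - ref_logp_synth
    margin = beta * (real_ratio - synth_ratio)
    return -log_sigmoid(margin)


# If the policy still equals the reference model, the margin is 0 and the loss is log(2).
print(spin_sigmoid_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the likelihood of the real response (relative to the reference) lowers the loss.
print(spin_sigmoid_loss(-10.0, -12.0, -11.0, -11.0))
```

Training pushes the model to assign relatively more probability to the ground-truth responses than to its own earlier generations; once the two become indistinguishable, the margin stops growing.
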

For *SPIN training* we use

* Hugging Face's [TRL](https://github.com/huggingface/trl)
* Microsoft's [DeepSpeed](https://github.com/deepspeedai/DeepSpeed)
* Cognitive Computations' [Spectrum](https://github.com/cognitivecomputations/spectrum)
* LinkedIn's [Liger Kernels](https://github.com/linkedin/Liger-Kernel)

Training parameters and setup (identical for each epoch):

|Parameter|Value|
|:--|--:|
|DPO loss| sigmoid |
|learning rate| 2e-7 |
|LR scheduler| linear |
|effective batch size| 64|
|DeepSpeed ZeRO| stage 3|
|DeepSpeed CPU offload | optimizer & parameters |
|Spectrum: top % of layers by SNR targeted | 30% |

Training _one epoch_ takes roughly 4 hours on 4 H100 GPUs.

### Framework versions

- TRL: 0.14.0
- Transformers: 4.46.3
- PyTorch: 2.4.1
- Datasets: 3.1.0
- Tokenizers: 0.20.3

## Chat Example




👤 **You**  
Hi, are you familiar with Argdown?

🤖 **Assistant**  
Yes, I am familiar with Argdown. It is a markup language specifically designed for representing and analyzing argumentative texts in a structured format. Argdown allows users to create detailed, formalized reconstructions of arguments by breaking them down into premises and conclusions. This makes it easier to understand the logical flow of an argument, identify any gaps or flaws in reasoning, and facilitate discussions or debates about the topic.

In addition to its markup language features, Argdown is often integrated with tools like argument mapping software, which can visually represent the structure of arguments. These visualizations are helpful for spotting relationships between different parts of an argument and can be useful both for teaching critical thinking skills and for analyzing complex arguments in various fields such as law, philosophy, and social sciences.

Are you working on a project that involves using Argdown? I'd be happy to provide more information or help with specific questions!


👤 **You**  
Can you please map the following text?

School uniforms deter crime and increase student safety.
In Long Beach, California, after two years of a district-wide K-8 mandatory uniform policy, reports of assault and battery in the district’s schools decreased by 34%, assault with a deadly weapon dropped by 50%, fighting incidents went down by 51%, sex offenses were cut by 74%, robbery dropped by 65%, possession of weapons (or weapon “look-alikes”) decreased by 52%, possession of drugs went down by 69%, and vandalism was lowered by 18%. [64]

One year after Sparks Middle School in Nevada instituted a uniform policy, school police data showed a 63% drop in police log reports, and decreases were also noted in gang activity, student fights, graffiti, property damage, and battery. A peer-reviewed study found that schools with uniform policies had 12% fewer firearm-related incidents and 15% fewer drug-related incidents than schools without uniforms. [25][69]

School uniforms also prevent students from concealing weapons under baggy clothing, make it easier to keep track of students on field trips, and make intruders on campus more visible. Frank Quatrone, superintendent in the Lodi school district of New Jersey, states, “When you have students dressed alike, you make them safer. If someone were to come into a building, the intruder could easily be recognized.” [6][38]

Further, school uniforms create a level playing field among students, reducing peer pressure and bullying. When all students are dressed alike, competition between students over clothing choices and the teasing of those who are dressed in less expensive or less fashionable outfits can be eliminated. Research by the Schoolwear Association found that 83% of teachers thought “a good school uniform…could prevent bullying based on appearance or economic background.” Arminta Jacobson, founder and director of the Center for Parent Education at the University of North Texas, states that uniforms put “all kids on the same playing field in terms of their appearance. I think it probably gives them a sense of belonging and a feeling of being socially accepted.” [5][91]

🤖 **Assistant**  
The argument presented can be broken down into several key points and mapped as follows:

```argdown
School uniforms deter crime and increase student safety.
  <+ The implementation of school uniform policies has been shown to significantly decrease incidents of violence and crime in schools, including Long Beach, California, and Sparks Middle School in Nevada.
    <- There may not be a direct correlation between the adoption of school uniform policies and decreases in violence and crime, as other factors could also have contributed to these reductions.
      <+ Changes in local law enforcement strategies or increased police presence might have coincided with the implementation of uniform policies and could independently account for some of the reported declines.
    <- The decrease in violent incidents may be attributed to a broader trend rather than being directly caused by school uniform policies.
  <+ By eliminating the opportunity for students to wear baggy clothing that can conceal weapons, school uniforms increase student safety.
    <+ School uniforms help keep track of students on field trips and make intruders more visible on campus.
      <- The ability to recognize an intruder based on their attire assumes a static environment where no one changes clothes, which is not practical in real scenarios.
  <+ By making all students dress alike, school uniforms eliminate competition among students over clothing choices, thereby reducing peer pressure and bullying.
    <+ Reducing social conflicts helps create an environment that encourages academic focus and productivity.
      <- Some level of competition can foster healthy rivalry and motivate students to work harder academically, which could be beneficial in the right context.
  <+ School uniforms promote a sense of equality among students from different economic backgrounds, reducing discrimination based on clothing choices.
    <- Uniforms do not eliminate social conflicts that arise from non-visual differences such as personality traits or cultural background.
      <- The ability to recognize an intruder based on their attire assumes a static environment where no one changes clothes, which is not practical in real scenarios.
  <+ By creating a level playing field among students, school uniforms promote a sense of belonging and social acceptance, reducing bullying based on appearance or economic background.
    <+ Schools with uniform policies have fewer incidents related to firearms and drugs compared to those without such policies.
      <- The comparison between schools may be flawed due to differences in socio-economic status, geographic location, and existing safety measures.
```

> 🛑 Poor answer
>
> The model added an objection early on that is not in the text, got confused, and started to repeat itself.

👤 **You**  
Please summarize, in your own words, the main argument presented in the text. No more than 2-3 sentences, please.

🤖 **Assistant**  
School uniforms significantly contribute to reducing violence, crime, and bullying in schools by eliminating competition over clothing choices and creating a sense of equality among students from different economic backgrounds. Additionally, by making all students dress alike, school uniforms prevent the concealment of weapons under baggy clothing, make it easier to keep track of students on field trips, and increase visibility for intruders on campus. This leads to an overall safer learning environment.


👤 **You**  
Let's focus on this argument. Just for the record, what's the conclusion?

🤖 **Assistant**  
School uniforms deter crime and increase student safety.


👤 **You**  
Can you reconstruct this argument as a premise-conclusion structure, using argdown syntax?

🤖 **Assistant**  
```argdown
(1) The implementation of school uniform policies has been shown to significantly decrease incidents of violence and crime in schools, including Long Beach, California, and Sparks Middle School in Nevada.
-- with FOL from (1) --
(2) Schools that implement uniform policies tend to have fewer violent incidents.
(3) By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
(4) Competition among students over clothing choices is a source of peer pressure and bullying.
(5) Reducing peer pressure and bullying contributes to increased student safety.
-- with FOL from (2) (3) (4) (5) --
(6) School uniforms can help reduce violence and crime in schools by eliminating competition over clothing choices.
(7) By eliminating the opportunity for students to wear baggy clothing that can conceal weapons, school uniforms increase student safety.
(8) School uniforms help keep track of students on field trips and make intruders more visible on campus.
-- with FOL from (7) (8) --
(9) By making all students dress alike, school uniforms prevent the concealment of weapons under baggy clothing, make it easier to keep track of students on field trips, and increase visibility for intruders on campus.
-- with FOL from (6) (9) --
(10) School uniforms deter crime and increase student safety.
```

👤 **You**  
That's way too complicated. Just start with 2 or 3 premises.

🤖 **Assistant**  
```argdown
(1) The implementation of school uniform policies has been shown to significantly decrease incidents of violence and crime in schools, including Long Beach, California, and Sparks Middle School in Nevada.
-- with FOL from (1) --
(2) Schools that implement uniform policies tend to have fewer violent incidents.
(3) By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
(4) Competition among students over clothing choices is a source of peer pressure and bullying.
(5) Reducing peer pressure and bullying contributes to increased student safety.
-- with FOL from (2) (3) (4) (5) --
(6) School uniforms can help reduce violence and crime in schools by eliminating competition over clothing choices.
(7) By eliminating the opportunity for students to wear baggy clothing that can conceal weapons, school uniforms increase student safety.
(8) School uniforms help keep track of students on field trips and make intruders more visible on campus.
-- with FOL from (7) (8) --
(9) By making all students dress alike, school uniforms prevent the concealment of weapons under baggy clothing, make it easier to keep track of students on field trips, and increase visibility for intruders on campus.
-- with FOL from (6) (9) --
(10) School uniforms deter crime and increase student safety.
```

> 🛑 Poor answer
>
> The model just repeats its previous reconstruction verbatim instead of simplifying it.



👤 **You**  
No. Please simplify the argument radically, and render the gist in your own words.

🤖 **Assistant**  
The argument's gist is:

```argdown
: By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
    +> School uniforms deter crime and increase student safety.
```


👤 **You**  
Render this as a premise-conclusion structure in argdown, please

🤖 **Assistant**  
The argument can be reconstructed as follows:

```argdown
(1) By making all students dress alike, school uniforms eliminate competition among students over clothing choices.
(2) Competition among students over clothing choices is a source of peer pressure and bullying.
(3) Reducing peer pressure and bullying contributes to increased student safety.
-----
(4) School uniforms deter crime and increase student safety.
```


👤 **You**  
To formalize this argument, would you rather use propositional logic (with propositional constants p, q, r, etc.) or predicate logic (with predicate constants F, G, etc.)?

🤖 **Assistant**  
This is a good question. I will first try to provide an answer using argdown syntax.


👤 **You**  
Ok, so please try to formalize the argument above.

🤖 **Assistant**  
The premises can be formalized as follows:  
Premise (1): p -> q  
Premise (2): r & s  
Premise (3): t -> not u  


👤 **You**  
What do the propositional constants p, q, etc. stand for? Please provide the keys.

🤖 **Assistant**  

```yaml
[["p", "school uniforms are worn by all students"], ["q", "competition among students over clothing choices is eliminated"], ["r", "students are competing with each other about their clothing choices"], ["s", "peer pressure and bullying are a problem in schools"], ["t", "peer pressure and bullying are reduced"], ["u", "student safety is not increased"]]
```

👤 **You**  
And how would you formalize the conclusion?

🤖 **Assistant**  
The argument's conclusion can be formalized as: w

👤 **You**  
Can you reconsider this and try to use the previously introduced placeholders?

🤖 **Assistant**  
Formalization of conclusion (4): p -> not u


👤 **You**  
OK, please integrate your formalization as inline data into the argdown argument reconstruction.

🤖 **Assistant**  

```argdown
(1) If school uniforms are worn by all students, then competition among students over clothing choices is eliminated. {"formalization": "p -> q", "keys": {"p": "school uniforms are worn by all students", "q": "competition among students over clothing choices is eliminated"}}
(2) Students are competing with each other about their clothing choices and peer pressure and bullying are a problem in schools. {"formalization": "r & s", "keys": {"r": "students are competing with each other about their clothing choices", "s": "peer pressure and bullying are a problem in schools"}}
(3) If peer pressure and bullying are reduced, then student safety is not increased. {"formalization": "t -> not u", "keys": {"t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}}
-----
(4) If school uniforms are worn by all students, then student safety is not increased. {"formalization": "p -> not u", "keys": {"p": "school uniforms are worn by all students", "t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}, "follows_from": [1, 2, 3]}
```

👤 **You**  
Wonderful. Now, please write a Z3 program that checks whether the inference is valid.

🤖 **Assistant**  
```Z3
(declare-sort Universal)
(declare-fun p () Bool) ;; school uniforms are worn by all students
(declare-fun q () Bool) ;; competition among students over clothing choices is eliminated
(declare-fun r () Bool) ;; students are competing with each other about their clothing choices
(declare-fun s () Bool) ;; peer pressure and bullying are a problem in schools
(declare-fun t () Bool) ;; peer pressure and bullying are reduced
(declare-fun u () Bool) ;; student safety is not increased
(define-fun premise1 () Bool (=> p q))
(define-fun premise2 () Bool (and r s))
(define-fun premise3 () Bool (=> t (not u)))
(define-fun conclusion4 () Bool (=> p (not u)))
(define-fun argument () Bool (=> (and premise1 premise2 premise3) conclusion4))
(assert (not argument))
(check-sat)
```

👤 **You**  
Thanks so much. I just pasted that into the Z3 playground, according to which the argument is NOT valid. Can you explain why, and suggest how to improve your reconstruction?

🤖 **Assistant**  
The argument's conclusion does not logically follow from its premises.

To fix this, we should revisit our initial analysis of the text.

👤 **You**  
But before going back to the text, can you suggest a modification of the argument we've just checked? I repeat it here for clarity:

```argdown
(1) If school uniforms are worn by all students, then competition among students over clothing choices is eliminated. {"formalization": "p -> q", "keys": {"p": "school uniforms are worn by all students", "q": "competition among students over clothing choices is eliminated"}}
(2) Students are competing with each other about their clothing choices and peer pressure and bullying are a problem in schools. {"formalization": "r & s", "keys": {"r": "students are competing with each other about their clothing choices", "s": "peer pressure and bullying are a problem in schools"}}
(3) If peer pressure and bullying are reduced, then student safety is not increased. {"formalization": "t -> not u", "keys": {"t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}}
-----
(4) If school uniforms are worn by all students, then student safety is not increased. {"formalization": "p -> not u", "keys": {"p": "school uniforms are worn by all students", "t": "peer pressure and bullying are reduced", "u": "student safety is not increased"}, "follows_from": [1, 2, 3]}
```
What could we change here to render the argument valid? Just point out potential modifications.


🤖 **Assistant**  
To make this an inferentially valid argument, I would suggest revising it as follows:

Replace (4) with: not p.
Here's the revised argument:

```argdown
(1) If school uniforms are worn by all students, then competition among students over clothing choices is eliminated. {"formalization": "p -> q", "keys": {}}
(2) Students are competing with each other about their clothing choices and peer pressure and bullying are a problem in schools. {"formalization": "r & s", "keys": {}}
(3) If peer pressure and bullying are reduced, then student safety is not increased. {"formalization": "t -> not u", "keys": {}}
-----
(4) School uniforms are not worn by all students. {"formalization": "not p", "keys": {}, "follows_from": [1, 2, 3]}
```
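
The validity check from the conversation can also be reproduced without Z3. An inference is valid exactly when no truth assignment makes all premises true and the conclusion false; the SMT program above tests this by asserting the negated argument, so a `sat` result means "invalid". The plain-Python sketch below (not part of the conversation) enumerates all 2^6 = 64 assignments for the assistant's formalization and finds a countermodel both for the checked conclusion `p -> not u` and for the revised conclusion `not p`. The extra premise `q -> t` is a hypothetical bridge added here for illustration only; with it, `p -> not u` does follow.

```python
from itertools import product

VARS = ["p", "q", "r", "s", "t", "u"]


def implies(a: bool, b: bool) -> bool:
    """Material conditional a -> b."""
    return (not a) or b


# Premises (1)-(3) as formalized by the assistant.
PREMISES = [
    lambda v: implies(v["p"], v["q"]),     # (1) p -> q
    lambda v: v["r"] and v["s"],           # (2) r & s
    lambda v: implies(v["t"], not v["u"]), # (3) t -> not u
]

conclusion_original = lambda v: implies(v["p"], not v["u"])  # (4) p -> not u
conclusion_revised = lambda v: not v["p"]                     # revised (4): not p


def countermodel(premises, conclusion):
    """Return an assignment making every premise true and the conclusion false,
    or None if no such assignment exists (i.e. the inference is valid)."""
    for values in product([False, True], repeat=len(VARS)):
        v = dict(zip(VARS, values))
        if all(prem(v) for prem in premises) and not conclusion(v):
            return v
    return None


original = countermodel(PREMISES, conclusion_original)
revised = countermodel(PREMISES, conclusion_revised)

# Hypothetical bridging premise (not from the conversation): q -> t, linking
# "competition is eliminated" to "peer pressure and bullying are reduced".
bridge = lambda v: implies(v["q"], v["t"])
bridged = countermodel(PREMISES + [bridge], conclusion_original)

for name, cm in [("original", original), ("revised", revised), ("with q -> t", bridged)]:
    print(f"{name}: " + ("valid" if cm is None else f"invalid, countermodel {cm}"))
```

The gap in the original reconstruction is that nothing connects `q` (competition eliminated) to `t` (peer pressure and bullying reduced), so the chain from `p` to `not u` is broken; premise (2) plays no role in any derivation.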



## Citations

Cite SPIN as:

```bibtex
@misc{chen2024selfplayfinetuningconvertsweak,
      title={Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models}, 
      author={Zixiang Chen and Yihe Deng and Huizhuo Yuan and Kaixuan Ji and Quanquan Gu},
      year={2024},
      eprint={2401.01335},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2401.01335}, 
}
```

Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallouédec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```