---
license: llama3.2
datasets:
- jjzha/sefl
language:
- en
base_model:
- meta-llama/Llama-3.2-3B-Instruct
pipeline_tag: text-generation
tags:
- educational
- feedback
---

# Model Card for Llama-3.2-3B-Instruct-SEFL

This is a `meta-llama/Llama-3.2-3B-Instruct` model **fine-tuned on** the `jjzha/sefl` dataset using the **SEFL** approach (Synthetic Educational Feedback Loops).

---

## Model Details

### Model Description

- **Developed by:** Mike Zhang
- **Funded by:** Villum Fonden (VIL57392)
- **Model type:** Autoregressive language model
- **Language(s) (NLP):** English
- **License:** Llama 3.2 Community License (llama3.2)
- **Finetuned from model:** meta-llama/Llama-3.2-3B-Instruct

### Quick Summary (SEFL Approach)

SEFL (**S**ynthetic **E**ducational **F**eedback **L**oops) is a framework designed to generate on-demand, concise, and targeted feedback for educational settings. Instead of relying on real-world student data, which often raises privacy and consent issues, SEFL simulates a teacher–student feedback loop using Large Language Models (LLMs). In particular:

1. **Synthetic Data Generation**

   Two LLM "agents" (a Teacher-Agent and a Student-Agent) produce assignment–answer pairs. The Student-Agent introduces deliberate errors, and the Teacher-Agent provides specific, formative feedback on each error (a minimal sketch of this loop follows the list).

2. **Fine-tuning on Synthetic Data**

   Smaller or mid-sized models (such as the Llama-3.2-3B-Instruct model this card describes, or Qwen2.5-14B-Instruct) are then fine-tuned on the teacher–student interaction data. This allows them to provide high-quality, contextually relevant, and concise feedback on new educational tasks.

3. **Efficiency and Scalability**

   Because the data is fully synthetic, fine-tuning can be done at scale without the usual bottlenecks of data acquisition and anonymization.
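
The two-agent loop above might look roughly like the sketch below. This is an illustration only: `call_llm` is a placeholder for whatever LLM backend is used, and the prompts are simplified stand-ins, not the actual prompts from the SEFL paper.

```python
# Illustrative sketch of the SEFL-style two-agent data generation loop.
def call_llm(system_prompt: str, user_prompt: str) -> str:
    """Placeholder: send a prompt to an LLM of your choice and return its reply."""
    raise NotImplementedError("Plug in your own LLM client here.")


def generate_synthetic_example(topic: str) -> dict:
    # 1. Teacher-Agent drafts a short assignment on the given topic.
    assignment = call_llm(
        "You are a teacher designing a short assignment.",
        f"Write a brief assignment about: {topic}",
    )
    # 2. Student-Agent answers it, deliberately introducing a few errors.
    answer = call_llm(
        "You are a student. Answer the assignment, but include a few plausible mistakes.",
        assignment,
    )
    # 3. Teacher-Agent gives concise, formative feedback on each error.
    feedback = call_llm(
        "You are a teacher. Give concise, specific, formative feedback on each error in the student's answer.",
        f"Assignment:\n{assignment}\n\nStudent answer:\n{answer}",
    )
    # One synthetic fine-tuning example: assignment, flawed answer, and feedback.
    return {"assignment": assignment, "answer": answer, "feedback": feedback}
```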

---

### Model Sources

- **Repository:** [https://github.com/jjzha/sefl](https://github.com/jjzha/sefl)
- **Paper:** _SEFL: Harnessing Large Language Model Agents to Improve Educational Feedback Systems_ (preprint)

---

## Uses

This model is intended to provide **high-quality, concise feedback** on educational assignments. By combining instruction tuning with a specialized SEFL dataset, it is designed to address common pitfalls in automated feedback systems (e.g., vagueness, excessive verbosity, lack of specificity).

### Direct Use

- **Formative Feedback:** Instructors or students can prompt the model with an assignment and a student response, and receive structured comments pinpointing strengths, weaknesses, and actionable improvement steps.
- **Assignment Testing:** Course creators might use the model to generate feedback for sample student responses during test-design phases.

### Downstream Use

- **Integration into LMS:** The model's concise feedback approach can be embedded within a learning management system (e.g., Moodle, Canvas) for large-scale, automated or semi-automated feedback generation (a minimal serving sketch follows this list).
- **Pedagogical Research:** Educational researchers can experiment with the model's feedback style to gauge student outcomes and assess the impact of immediate feedback loops.
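
As a rough illustration of the LMS scenario, the model could sit behind a small HTTP service that a plugin calls with an assignment and a student answer. The route name, JSON fields, and prompt formatting below are assumptions for this sketch, not a documented interface.

```python
# Illustrative only: a tiny Flask service an LMS plugin could call.
from flask import Flask, jsonify, request
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "jjzha/Llama-3.2-3B-Instruct-SEFL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

app = Flask(__name__)


@app.route("/feedback", methods=["POST"])
def feedback():
    data = request.get_json()
    # Combine the assignment and the student answer into one user message;
    # the exact formatting is an assumption, not a fixed input contract.
    messages = [{
        "role": "user",
        "content": f"Assignment:\n{data['assignment']}\n\nStudent answer:\n{data['answer']}",
    }]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    )
    outputs = model.generate(input_ids, max_new_tokens=512)
    # Decode only the newly generated tokens, i.e., the feedback itself.
    reply = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
    return jsonify({"feedback": reply})


if __name__ == "__main__":
    app.run(port=8000)
```

An LMS plugin could then send `{"assignment": ..., "answer": ...}` to `/feedback` and display the returned text next to the submission.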

### Out-of-Scope Use

- **Personalized Tutoring/Chat:** SEFL specifically focuses on single-turn or short feedback loops for tasks, rather than ongoing multi-turn or deeply personalized tutoring.
- **Sensitive or High-Stakes Assessments:** This model should not be the **sole** determinant of success in high-stakes exams or certifications, as it does not guarantee error-free or unbiased feedback.

---

## Bias, Risks, and Limitations

### Known Limitations

- **Synthetic Data Alignment:** The dataset is entirely synthetic. While this avoids privacy concerns, it may not capture the full diversity of real-world classroom submissions.
- **Domain-Specific Depth:** If the assignment is too specialized or requires deep domain expertise, the model may provide incomplete or overly general feedback.
- **Verbosity vs. Brevity:** LLMs can default to verbose explanations. While SEFL aims for concise feedback, some prompts or queries might still elicit lengthy responses.

### Recommendations

- **Human Oversight:** Educators should review automated feedback for correctness, especially for specialized or high-stakes tasks.
- **Transparency:** Inform students that feedback is AI-generated and may not fully reflect instructor judgment.
- **Refinement via Real Data:** Over time, augmenting synthetic data with real anonymized examples (if ethically collected) could improve domain coverage.

---

## How to Get Started with the Model

You can use the code below to get started:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "jjzha/Llama-3.2-3B-Instruct-SEFL"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Put the assignment and the student answer into a single user message.
prompt = """<Insert assignment and student answer here>"""
messages = [{"role": "user", "content": prompt}]

# The model is instruction-tuned, so apply the chat template before generating.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)

# max_new_tokens bounds the length of the generated feedback itself.
outputs = model.generate(input_ids, max_new_tokens=512)

# Decode only the newly generated tokens, i.e., the feedback.
response = tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(response)
```
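
For the placeholder prompt, a single user message containing the full assignment text followed by the student's answer is a reasonable starting point; the exact input format is not strictly fixed by this card, and `max_new_tokens` can be adjusted if longer or shorter feedback is desired.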