ChineseSafe-Benchmark / changelog.md

CHANGELOG

2024-07-16

version: v1.0.0

changed:
- [1]feat: upload the first version

2024-10-26

version: v1.0.1

changed:
- [1]feat: add citation

2024-11-18

version: v1.0.2

changed:
- [1]feat: add three models: Qwen2.5-72B, Qwen2.5-32B, Qwen2-72B
- [2]feat: add subclass: Discrimination

2024-11-24

version: v1.0.3

changed:
- [1]feat: add three Qwen instruct models
- [2]feat: remove Qwen base models
- [3]feat: update some models' names

2024-12-28

version: v1.0.4

changed:
- [1]feat: update 9 models from the December todo list:
    - QwQ-32B-Preview
    - Llama-3.1-70B-Instruct
    - Llama-3.3-70B-Instruct
    - Mistral-Nemo-Instruct-2407
    - Ministral-8B-Instruct-2410
    - Phi-3-small-8k-instruct
    - Phi-3-small-128k-instruct
    - Phi-3-medium-4k-instruct
    - Phi-3-medium-128k-instruct

2025-04-13

version: v1.0.5

changed:
- [1]feat: update 4 models from the February todo list:
    - phi-4
    - DeepSeek-R1-Distill-Llama-70B
    - Mistral-Small-24B-Instruct-2501
    - Moonlight-16B-A3B-Instruct
- [2]feat: release a test set of 20,000 samples