Sharathhebbar24/ssh_1.8B

Text Generation

text-generation-inference

Inference Endpoints


Sharathhebbar24/ssh_1.8B is a 1.8B-parameter text-generation model.

The model is a modified version of qnguyen3/quan-1.8b-chat.

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 4
total_train_batch_size: 32
total_eval_batch_size: 8
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_steps: 100
num_epochs: 4
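
The totals in the list above follow from the per-device values, and the learning-rate settings imply a warmup phase followed by cosine decay. The sketch below reproduces that arithmetic. It assumes the common linear-warmup plus cosine-decay-to-zero shape; the card does not state the exact schedule implementation or the total number of optimizer steps, so `lr_at` and its `total_steps` argument are hypothetical and for illustration only.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 4
GRAD_ACCUM_STEPS = 4
BASE_LR = 2e-5
WARMUP_STEPS = 100


def total_train_batch_size(per_device, num_devices, grad_accum):
    """Samples contributing to one optimizer step across all devices."""
    return per_device * num_devices * grad_accum


def total_eval_batch_size(per_device, num_devices):
    """Samples evaluated per step across all devices (no accumulation)."""
    return per_device * num_devices


def lr_at(step, total_steps, base_lr=BASE_LR, warmup_steps=WARMUP_STEPS):
    """Assumed schedule: linear warmup, then cosine decay to zero.

    total_steps is hypothetical; the card does not report it.
    """
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))


if __name__ == "__main__":
    print(total_train_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    print(total_eval_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))
    # Example with a made-up run length of 1000 optimizer steps.
    for step in (0, 50, 100, 550, 1000):
        print(step, lr_at(step, total_steps=1000))
```

The two batch-size functions return 32 and 8 for the values listed, matching total_train_batch_size and total_eval_batch_size.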

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	45.91
AI2 Reasoning Challenge (25-Shot)	39.08
HellaSwag (10-Shot)	62.37
MMLU (5-Shot)	44.09
TruthfulQA (0-shot)	43.15
Winogrande (5-shot)	59.27
GSM8k (5-shot)	27.52
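
The Avg. row appears to be the unweighted mean of the six benchmark scores. A minimal check, assuming equal weighting (the dictionary name `scores` is only illustrative):

```python
# Benchmark scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 39.08,
    "HellaSwag (10-Shot)": 62.37,
    "MMLU (5-Shot)": 44.09,
    "TruthfulQA (0-shot)": 43.15,
    "Winogrande (5-shot)": 59.27,
    "GSM8k (5-shot)": 27.52,
}

# Unweighted mean across the six benchmarks.
average = sum(scores.values()) / len(scores)

if __name__ == "__main__":
    print(f"{average:.2f}")  # 45.91
```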


Format: Safetensors
Model size: 1.84B params
Tensor type: FP16
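
The parameter count and tensor type give a rough lower bound on the memory needed just to hold the weights. The sketch below assumes 2 bytes per FP16 parameter and counts weights only; activations, KV cache, and framework overhead are not included, so actual usage will be higher.

```python
# Figures from the card: 1.84B parameters stored as FP16.
NUM_PARAMS = 1.84e9
BYTES_PER_FP16 = 2  # FP16 uses 16 bits per value

weight_bytes = NUM_PARAMS * BYTES_PER_FP16
weight_gb = weight_bytes / 1e9       # decimal gigabytes
weight_gib = weight_bytes / 2**30    # binary gibibytes

if __name__ == "__main__":
    print(f"{weight_gb:.2f} GB")    # 3.68 GB
    print(f"{weight_gib:.2f} GiB")  # 3.43 GiB
```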



Evaluation results

Source: Open LLM Leaderboard

Benchmark	Metric	Split	Value
AI2 Reasoning Challenge (25-Shot)	normalized accuracy	test	39.080
HellaSwag (10-Shot)	normalized accuracy	validation	62.370
MMLU (5-Shot)	accuracy	test	44.090
TruthfulQA (0-shot)	mc2	validation	43.150
Winogrande (5-shot)	accuracy	validation	59.270
GSM8k (5-shot)	accuracy	test	27.520
