avemio-digital committed on
Commit
da56574
verified
1 Parent(s): 59785ac

Update README.md

Files changed (1)
  1. README.md +14 -14
README.md CHANGED
@@ -116,20 +116,20 @@ For training data details, please see the [GRAG-SFT-Dataset](https://huggingface
  ### Architecture
  
  
- | | **GRAG-PHI-SFT** |
- |------------------------|-------------------|---------------------|--------------------|--------------------|------------------|
- | d_model | 3072 |
- | num heads | 32 |
- | num layers | 32 |
- | MLP ratio | 2.66 |
- | LayerNorm type | RMSNorm |
- | pos embeddings | RoPE |
- | attention variant | Standard Multi-Head Self Attention with sliding-window of 2047 |
- | biases | none |
- | block type | sequential |
- | activation | SiLU |
- | sequence length | 131072 | |
- | weight tying | bfloat16 |
+ | Parameter | GRAG-PHI-SFT |
+ |-----------------------|-----------------------------------------------------------------------------------------------|
+ | **d_model** | 3072 |
+ | **num heads** | 32 |
+ | **num layers** | 32 |
+ | **MLP ratio** | 2.66 |
+ | **LayerNorm type** | RMSNorm |
+ | **pos embeddings** | RoPE |
+ | **attention variant**| Standard Multi-Head Self Attention with sliding-window of 2047 |
+ | **biases** | none |
+ | **block type** | sequential |
+ | **activation** | SiLU |
+ | **sequence length** | 131072 |
+ | **weight tying** | bfloat16
  
  ### Hyperparameters
  
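
Reviewer note: the values in the Architecture table changed by this commit can be sanity-checked with a little arithmetic. The sketch below is only illustrative. The constant and function names are hypothetical, the numbers are copied from the table, and because the table rounds the MLP ratio to 2.66 the computed hidden width is an approximation; the README does not state the exact intermediate size.

```python
# Values copied from the Architecture table in the diff above.
D_MODEL = 3072
NUM_HEADS = 32
MLP_RATIO = 2.66  # rounded in the table, so results derived from it are approximate


def head_dim(d_model: int, num_heads: int) -> int:
    """Per-head width for multi-head attention (d_model split evenly across heads)."""
    if d_model % num_heads != 0:
        raise ValueError("d_model must be divisible by num_heads")
    return d_model // num_heads


def approx_mlp_hidden(d_model: int, mlp_ratio: float) -> float:
    """Approximate MLP hidden width implied by the (rounded) MLP ratio."""
    return d_model * mlp_ratio


print("head dim:", head_dim(D_MODEL, NUM_HEADS))  # head dim: 96
print("approx. MLP hidden width:", round(approx_mlp_hidden(D_MODEL, MLP_RATIO), 2))
```

The divisibility check is the useful part when editing such a table by hand: a `d_model` that does not divide evenly by `num heads` would indicate a transcription error.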