pcunwa/BS-Roformer-Inst-EXP-Value-Residual

	⚠Warning⚠ This is an experimental weight. It may not perform well enough for practical use.<br>
	Also, to use this weight, you must manually rewrite or replace the model file.<br>

	The model file is available here.<br>
	https://github.com/lucidrains/BS-RoFormer

	The BS-Roformer architecture has been updated for the first time in a while.<br>
	The 0.5.x update introduced a mechanism called "Value Residual Learning" (https://arxiv.org/abs/2410.17897).<br>
	The paper argues that this mechanism reduces the over-concentration of attention and further mitigates the vanishing gradient problem.
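
To make the idea concrete, here is a minimal pure-Python sketch of value residual mixing: the first attention layer's values are kept, and each later layer blends its own values with them before attention is applied. This is an illustration only, not the code from the repository above. The names `attention`, `mix_values`, and `layer`, the identity projections (q = k = v = x), and the fixed scalar weight `lam` are all assumptions made for this example; a real implementation would use learned projections and may compute the mixing weight differently.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Single-head scaled dot-product attention over lists of vectors."""
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for qi in q:
        scores = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        weights = softmax(scores)
        out.append([
            sum(w * vj[d] for w, vj in zip(weights, v))
            for d in range(len(v[0]))
        ])
    return out


def mix_values(v, first_v, lam):
    """Blend this layer's values with the first layer's values.

    lam is a hypothetical fixed mixing weight: lam=1.0 ignores the
    residual, lam=0.0 uses only the first layer's values.
    """
    return [
        [lam * a + (1.0 - lam) * b for a, b in zip(row, first_row)]
        for row, first_row in zip(v, first_v)
    ]


def layer(x, first_v=None, lam=0.5):
    """Toy attention layer with identity projections (q = k = v = x)."""
    v = x
    if first_v is None:
        first_v = v  # first layer: remember its values for later layers
    else:
        v = mix_values(v, first_v, lam)  # later layers: value residual
    return attention(x, x, v), first_v


x = [[1.0, 0.0], [0.0, 1.0]]
h1, first_v = layer(x)                   # layer 1 stores its values
h2, first_v = layer(h1, first_v, lam=0.5)  # layer 2 mixes them back in
```

Because every later layer receives the first layer's values directly, token-level information from the input has a short path to deep layers, which is the intuition behind the claims summarized above.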