---
license: openrail
pipeline_tag: reinforcement-learning
---
|
|
|
# EfficientZero Remastered
|
|
|
This repo contains pre-trained models for EfficientZero Remastered, a
Gigglebit Studios project to stabilize the training process for the
state-of-the-art EfficientZero model.
|
|
|
* [Training source code](https://github.com/steventrouble/EfficientZero)
* [About the project](https://www.gigglebit.net/blog/efficientzero.html)
* [About EfficientZero](https://arxiv.org/abs/2111.00210)
* [About Gigglebit](https://www.gigglebit.net/)
|
|
|
Huge thanks to [Stability AI](https://stability.ai/) for providing the compute
for this project!
|
|
|
---
|
|
|
## How to use these files
|
|
|
Download the model you want to evaluate, then run `test.py` against it.
|
|
|
_Note: We've only productionized the training process. If you want to use these
for inference in production, you'll need to write your own inference logic.
If you do, send us a PR and we'll add it to the repo!_
|
|
|
Files are labeled as follows:
|
|
|
```
{gym_env}-s{seed}-e{env_steps}-t{train_steps}
```
|
|
|
Where:

* `gym_env`: The string ID of the gym environment this model was trained on,
  e.g. `Breakout-v5`.
* `seed`: The seed that was used to train this model. Usually 0.
* `env_steps`: The total number of steps in the environment that this model
  observed, usually 100k.
* `train_steps`: The total number of training epochs the model underwent.
|
|
|
Note that `env_steps` can differ from `train_steps` because the model can
continue fine-tuning from its replay buffer. In the paper, the last 20k
epochs are done this way. This isn't necessary outside of benchmarks, and
in theory, better performance should be attainable by gathering more samples
from the environment.
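The naming scheme can be split apart mechanically. Below is a minimal sketch of a parser for it; the `parse_checkpoint_name` helper is our own illustration (it's not part of the repo), and the handling of a trailing `k` suffix (e.g. `100k`) is an assumption about how the step counts appear in filenames.

```python
import re

# Matches {gym_env}-s{seed}-e{env_steps}-t{train_steps}.
# gym_env itself may contain hyphens (e.g. "Breakout-v5"), so the regex
# relies on the -s/-e/-t markers to find the field boundaries.
FILENAME_RE = re.compile(
    r"^(?P<gym_env>.+)-s(?P<seed>\d+)-e(?P<env_steps>\d+k?)-t(?P<train_steps>\d+k?)$"
)

def parse_checkpoint_name(name: str) -> dict:
    """Split a checkpoint filename into its labeled fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized checkpoint name: {name}")
    fields = match.groupdict()
    # Expand a trailing 'k' suffix (assumed shorthand: '100k' -> 100000).
    for key in ("env_steps", "train_steps"):
        value = fields[key]
        fields[key] = int(value[:-1]) * 1000 if value.endswith("k") else int(value)
    fields["seed"] = int(fields["seed"])
    return fields

# Prints the parsed fields for a hypothetical checkpoint name.
print(parse_checkpoint_name("Breakout-v5-s0-e100k-t120k"))
```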
|
|
|
---
|
|
|
## Findings
|
|
|
Our primary goal in this project was to test EfficientZero and assess its
capabilities. We were amazed by the model overall, especially on Breakout,
where it far outperformed the human baseline. The overall cost was only about
$50 per fully trained model, compared to the hundreds of thousands of dollars
needed to train MuZero.
|
|
|
Though the trained models achieved impressive scores in Atari, they didn't
reach the stellar scores demonstrated in the paper. This could be because we
used different hardware and dependencies, or because ML research papers tend
to cherry-pick models and environments that showcase good results.
|
|
|
Additionally, the models tended to hit a performance wall between 75k and 100k
steps. While we don't have enough data to know why or how often this happens,
it's not surprising: the model was tuned specifically for data efficiency and
hasn't been tested at larger scales. A model like MuZero might be more
appropriate if you have a large budget.
|
|
|
Training times were also longer than those reported in the EfficientZero
paper. The paper states that a model can be trained to completion in 7 hours;
in practice, we found that it takes an A100 with 32 cores one to two days.
This is likely because the training process is more CPU-intensive than other
models' and therefore performs poorly on the low-frequency, many-core CPUs
found in GPU clusters.