steventrouble/EfficientZeroRemastered

Reinforcement Learning

steventrouble committed on Jul 15, 2023

Commit 5e04531 · 1 Parent(s): 8c8a945

Update README.md

Update README with project info

Files changed (1)

README.md +45 -0

@@ -1,3 +1,48 @@
 ---
 license: openrail
+pipeline_tag: reinforcement-learning
 ---
+# EfficientZero Remastered
+This repo contains the pre-trained models for the EfficientZero Remastered
+project from Gigglebit Studios, a project to stabilize the training process
+for the state-of-the-art EfficientZero model.
+* [Training source code](https://github.com/steventrouble/EfficientZero)
+* [About the project](https://www.gigglebit.net/blog/efficientzero.html)
+* [About EfficientZero](https://arxiv.org/abs/2111.00210)
+* [About Gigglebit](https://www.gigglebit.net/)
+Huge thanks to [Stability AI](https://stability.ai/) for providing the compute
+for this project!
+---
+## How to use these files
+Download the model you want to test, then run `test.py` to evaluate it.
+_Note: We've only productionized the training process. If you want to use these
+for inference in production, you'll need to write your own inference logic.
+If you do, send us a PR and we'll add it to the repo!_
+Files are labeled as follows:
+```
+{gym_env}-s{seed}-e{env_steps}-t{train_steps}
+```
+Where:
+*   `gym_env`: The string ID of the gym environment this model was trained on,
+    e.g. `Breakout-v5`.
+*   `seed`: The seed that was used to train this model. Usually 0.
+*   `env_steps`: The total number of steps in the environment that this model
+    observed, usually 100k.
+*   `train_steps`: The total number of training epochs the model underwent.
+Note that `env_steps` can differ from `train_steps` because the model can
+continue fine-tuning using its replay buffer. In the paper, the last 20k
+epochs are done in this manner. This isn't necessary outside of benchmarks,
+and in theory better performance should be attainable by collecting more
+samples from the env.
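
The naming scheme described in the README can be split back into its fields with a short helper. The following is a minimal sketch and is not part of this commit or the repo: the function name `parse_model_name`, the `ModelName` type, and the sample name `Breakout-v5-s0-e100k-t120k` are made up for illustration. It also assumes you pass the name without any file extension, and it keeps the step counts as strings because the README does not say whether they are written as `100k` or `100000`. The environment ID can itself contain hyphens, so the pattern lets `gym_env` absorb everything before the final `-s…-e…-t…` suffix.

```python
import re
from typing import NamedTuple


class ModelName(NamedTuple):
    """Fields of a model name (hypothetical helper type, not from the repo)."""
    gym_env: str
    seed: int
    env_steps: str    # kept as text: exact number format is an assumption
    train_steps: str  # kept as text for the same reason


# {gym_env}-s{seed}-e{env_steps}-t{train_steps}
# gym_env is matched greedily so IDs such as "Breakout-v5" keep their hyphen.
_NAME_PATTERN = re.compile(
    r"(?P<gym_env>.+)-s(?P<seed>\d+)-e(?P<env_steps>\w+)-t(?P<train_steps>\w+)"
)


def parse_model_name(name: str) -> ModelName:
    """Split a model name (without file extension) into its labeled parts."""
    match = _NAME_PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"name does not follow the documented pattern: {name!r}")
    return ModelName(
        gym_env=match["gym_env"],
        seed=int(match["seed"]),
        env_steps=match["env_steps"],
        train_steps=match["train_steps"],
    )


if __name__ == "__main__":
    # Hypothetical name; check the repo's file list for the real ones.
    print(parse_model_name("Breakout-v5-s0-e100k-t120k"))
```

A helper like this is handy for filtering downloaded checkpoints by environment or seed before passing one to `test.py`.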