steventrouble committed on
Commit 5e04531 · 1 Parent(s): 8c8a945

Update README.md


Update README with project info

Files changed (1)
  1. README.md +45 -0
README.md CHANGED
@@ -1,3 +1,48 @@
---
license: openrail
pipeline_tag: reinforcement-learning
---

# EfficientZero Remastered

This repo contains the pre-trained models for EfficientZero Remastered, a
Gigglebit Studios project to stabilize the training process for the
state-of-the-art EfficientZero model.

* [Training source code](https://github.com/steventrouble/EfficientZero)
* [About the project](https://www.gigglebit.net/blog/efficientzero.html)
* [About EfficientZero](https://arxiv.org/abs/2111.00210)
* [About Gigglebit](https://www.gigglebit.net/)

Huge thanks to [Stability AI](https://stability.ai/) for providing the compute
for this project!

---

## How to use these files

Download the model you want to test, then run `test.py` against it.

_Note: We've only productionized the training process. If you want to use these
for inference in production, you'll need to write your own inference logic.
If you do, send us a PR and we'll add it to the repo!_
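
If you prefer to fetch a checkpoint programmatically rather than through the
web UI, a minimal sketch using the `huggingface_hub` library might look like
the following. The `repo_id` and `filename` here are placeholders, not real
entries; substitute the values from this repo's file list, which follow the
naming convention described below.

```python
from huggingface_hub import hf_hub_download

# Placeholder repo id and checkpoint name -- replace both with the actual
# values from this repo's file list before running.
checkpoint_path = hf_hub_download(
    repo_id="steventrouble/EfficientZero-Remastered",
    filename="Breakout-v5-s0-e100000-t120000",
)
print(f"Downloaded checkpoint to {checkpoint_path}")
```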

Files are labeled as follows:

```
{gym_env}-s{seed}-e{env_steps}-t{train_steps}
```

Where:
* `gym_env`: The string ID of the gym environment this model was trained on,
  e.g. `Breakout-v5`.
* `seed`: The seed that was used to train this model. Usually 0.
* `env_steps`: The total number of steps in the environment that this model
  observed, usually 100k.
* `train_steps`: The total number of training epochs the model underwent.

Note that `env_steps` can differ from `train_steps` because the model can
continue fine-tuning using its replay buffer. In the paper, the last 20k
epochs are done in this manner. This isn't necessary outside of benchmarks,
and in theory better performance should be attainable by collecting more
samples from the environment.
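
If you need to select checkpoints in scripts, the fields can be recovered from
a file name with a short parser. The sketch below is not part of the training
code; it simply applies the naming convention above, and the helper names and
the example file name are hypothetical (it also assumes the numeric fields are
plain integers).

```python
import re
from typing import NamedTuple


class CheckpointName(NamedTuple):
    gym_env: str
    seed: int
    env_steps: int
    train_steps: int


# Matches names following the {gym_env}-s{seed}-e{env_steps}-t{train_steps}
# convention, e.g. a hypothetical "Breakout-v5-s0-e100000-t120000".
_PATTERN = re.compile(
    r"^(?P<gym_env>.+)-s(?P<seed>\d+)-e(?P<env_steps>\d+)-t(?P<train_steps>\d+)$"
)


def parse_checkpoint_name(name: str) -> CheckpointName:
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"Unrecognized checkpoint name: {name}")
    return CheckpointName(
        gym_env=match["gym_env"],
        seed=int(match["seed"]),
        env_steps=int(match["env_steps"]),
        train_steps=int(match["train_steps"]),
    )


# Example:
# parse_checkpoint_name("Breakout-v5-s0-e100000-t120000")
# -> CheckpointName(gym_env='Breakout-v5', seed=0, env_steps=100000, train_steps=120000)
```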