Update README.md
README.md
verified: false
---

# SAC + HER Agent for FetchPickAndPlace-v4

## Model Overview

This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick and place objects using sparse or dense rewards, and is suitable for robotic manipulation research.

- **Algorithm:** Soft Actor-Critic (SAC)
- **Replay Buffer:** Hindsight Experience Replay (HER)
- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
- **Framework:** Stable Baselines3
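
The Fetch environments expose a goal-conditioned `Dict` observation (`observation`, `achieved_goal`, `desired_goal`), which is why the agent uses a `MultiInputPolicy` and why HER can relabel goals. A minimal sketch for inspecting the observation structure (illustrative only, not part of the training code):

```python
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments

env = gym.make("FetchPickAndPlace-v4")
obs, info = env.reset(seed=42)

# HER relabels `desired_goal` with goals actually achieved later in the episode.
print(obs["observation"].shape, obs["achieved_goal"].shape, obs["desired_goal"].shape)
env.close()
```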

## Training Details

- **Total Timesteps:** 500,000
- **Evaluation Frequency:** Every 2,000 steps (15 episodes per eval)
- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
- **Seed:** 42
- **Dense Shaping:** `False` (can be enabled with a wrapper; see the sketch after this list)
- **Device:** CUDA if available, otherwise auto
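
The reported run used the default sparse reward (`Dense Shaping: False`), and the dense-shaping wrapper itself is not shipped in this repository. The sketch below only illustrates what such a wrapper could look like; the class name `DenseShapingWrapper` and the distance-based shaping are assumptions, not the original training code:

```python
import numpy as np
import gymnasium as gym


class DenseShapingWrapper(gym.Wrapper):
    """Hypothetical wrapper: replace the sparse goal reward with the negative
    Euclidean distance between the achieved and desired goal."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        dense_reward = -float(np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"]))
        return obs, dense_reward, terminated, truncated, info
```

Wrapping the environment with something like this before passing it to SAC would switch training to the dense signal.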

### Hyperparameters

| Parameter               | Value                |
|-------------------------|----------------------|
| Algorithm               | SAC                  |
| Policy                  | MultiInputPolicy     |
| Replay Buffer           | HER                  |
| n_sampled_goal          | 4                    |
| goal_selection_strategy | future               |
| Batch Size              | 512                  |
| Buffer Size             | 1,000,000            |
| Learning Rate           | 1e-3                 |
| Gamma                   | 0.95                 |
| Tau                     | 0.05                 |
| Entropy Coefficient     | auto                 |
| Train Frequency         | 1 step               |
| Gradient Steps          | 1                    |
| Tensorboard Log         | logs_pnp_sac_her/tb  |
| Seed                    | 42                   |
| Device                  | CUDA/Auto            |
| Dense Shaping           | False (default)      |
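
The training script itself is not included in this card. The sketch below shows how the configuration in this table, together with the evaluation and checkpoint frequencies above, could be assembled with Stable Baselines3; the evaluation environment, save path, and checkpoint name prefix are assumptions inferred from the logged paths and file names:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

env = gym.make("FetchPickAndPlace-v4")
eval_env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
)

callbacks = [
    EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15),
    CheckpointCallback(
        save_freq=50_000,
        save_path="logs_pnp_sac_her",  # assumed output directory
        name_prefix="ckpt_sac_her",    # yields ckpt_sac_her_<steps>_steps.zip
        save_replay_buffer=True,
    ),
]

model.learn(total_timesteps=500_000, callback=callbacks)
model.save("sac_her_pnp")
model.save_replay_buffer("replay_buffer.pkl")
```

With this name prefix, `CheckpointCallback` produces files such as `ckpt_sac_her_250000_steps.zip`, matching the checkpoint listed in the Files section below.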

## Files

- `sac_her_pnp.zip`: Final trained SAC model
- `ckpt_sac_her_250000_steps.zip`: Latest checkpoint
- `replay_buffer.pkl`: Replay buffer for continued training
- `replay.mp4`: Replay video of agent performance (manual generation recommended)
- `README.md`: This model card

## Usage

To load and use the model for inference:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

obs, info = env.reset()
done = False
truncated = False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    env.render()
env.close()
```
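
Note that with the default sparse reward, `reward` is `-1.0` on every step until the object is at the goal and `0.0` once it is; the per-step `info` dictionary also carries an `is_success` flag, which is usually the more meaningful metric for this task.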

## Evaluation

To evaluate the agent over multiple episodes:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

num_episodes = 10
for ep in range(num_episodes):
    obs, info = env.reset()
    done = False
    truncated = False
    episode_reward = 0
    while not (done or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, info = env.step(action)
        env.render()
        episode_reward += reward
    print(f"Episode {ep+1} reward: {episode_reward}")
env.close()
```
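
As an alternative to the manual loop, Stable Baselines3 ships an `evaluate_policy` helper that reports the mean and standard deviation of the episode return. A short sketch, reusing `model` and `env` from the snippet above:

```python
from stable_baselines3.common.evaluation import evaluate_policy

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=15, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
```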

## Replay Video

If `replay.mp4` is not present, you can manually generate it:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
import moviepy.editor as mpy  # moviepy < 2.0 API; in 2.x, use "from moviepy import ImageSequenceClip"

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

frames = []
obs, info = env.reset()
done = False
truncated = False
step = 0
max_steps = 1000

while not (done or truncated) and step < max_steps:
    frame = env.render()
    frames.append(frame)
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    step += 1

env.close()
clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile("replay.mp4", codec="libx264")
```

## Continued Training

To continue training from a checkpoint:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```
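
Because SAC is off-policy, it is usually worth restoring the saved replay buffer as well so HER does not resume from empty transitions. `load_replay_buffer` is the standard Stable Baselines3 method for this; the path below is illustrative and `model` is reused from the snippet above:

```python
# Restore saved transitions before resuming training.
model.load_replay_buffer("path/to/replay_buffer.pkl")
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```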

## Citation

If you use this model, please cite:

```
@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
  title={SAC + HER Agent for FetchPickAndPlace-v4},
  author={IntelliGrow},
  year={2025},
  howpublished={Hugging Face Hub},
}
```

## License

MIT License

---

**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4).