---
library_name: stable-baselines3
tags:
- FetchPickAndPlace-v4
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: FetchPickAndPlace-v4
      type: FetchPickAndPlace-v4
    metrics:
    - type: mean_reward
      value: -9.70 +/- 4.17
      name: mean_reward
      verified: false
---
# SAC + HER Agent for FetchPickAndPlace-v4
## Model Overview
This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick up an object and place it at a target location; the reported run uses the default sparse reward (dense shaping can be enabled via a wrapper), and the model is intended for robotic manipulation research.
- **Algorithm:** Soft Actor-Critic (SAC)
- **Replay Buffer:** Hindsight Experience Replay (HER)
- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
- **Framework:** Stable Baselines3
## Training Details
- **Total Timesteps:** 500,000
- **Evaluation Frequency:** Every 2,000 steps (15 episodes per eval)
- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
- **Seed:** 42
- **Dense Shaping:** `False` (can be enabled with a wrapper; see the sketch after this list)
- **Device:** `auto` (CUDA if available, otherwise CPU)
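
Dense shaping was left off for the reported run. As an illustration only, a shaping wrapper could look roughly like the following, replacing the sparse reward with the negative distance between the achieved and desired goals. The class name and exact shaping term are assumptions, not the wrapper used in training.

```python
import numpy as np
import gymnasium as gym

class DenseRewardWrapper(gym.Wrapper):
    """Hypothetical shaping wrapper: swap the sparse Fetch reward for the
    negative Euclidean distance between the achieved and desired goals."""
    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        dense_reward = -float(np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"]))
        return obs, dense_reward, terminated, truncated, info
```
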
### Hyperparameters
| Parameter | Value |
|--------------------------|----------------------|
| Algorithm | SAC |
| Policy | MultiInputPolicy |
| Replay Buffer | HER |
| n_sampled_goal | 4 |
| goal_selection_strategy | future |
| Batch Size | 512 |
| Buffer Size | 1,000,000 |
| Learning Rate | 1e-3 |
| Gamma | 0.95 |
| Tau | 0.05 |
| Entropy Coefficient | auto |
| Train Frequency | 1 step |
| Gradient Steps | 1 |
| Tensorboard Log | logs_pnp_sac_her/tb |
| Seed | 42 |
| Device                   | auto (CUDA if available) |
| Dense Shaping | False (default) |
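
For reference, the configuration above corresponds roughly to the following training sketch. The callback arguments, save paths, and overall script structure are assumptions reconstructed from the table; the original training script may differ in detail.

```python
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

env = gym.make("FetchPickAndPlace-v4")
eval_env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
)

callbacks = [
    # Evaluate every 2,000 steps over 15 episodes; checkpoint every 50,000 steps.
    EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15),
    CheckpointCallback(save_freq=50_000, save_path="logs_pnp_sac_her",
                       name_prefix="ckpt_sac_her", save_replay_buffer=True),
]
model.learn(total_timesteps=500_000, callback=callbacks)
model.save("sac_her_pnp")
model.save_replay_buffer("replay_buffer.pkl")
```
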
## Files
- `sac_her_pnp.zip`: Final trained SAC model
- `ckpt_sac_her_250000_steps.zip`: Latest checkpoint
- `replay_buffer.pkl`: Replay buffer for continued training
- `replay.mp4`: Replay video of agent performance (manual generation recommended)
- `README.md`: This model card
## Usage
To load and use the model for inference:
```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    env.render()
env.close()
```
## Evaluation
To evaluate the agent over multiple episodes:
```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics
env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)
num_episodes = 10
for ep in range(num_episodes):
    obs, info = env.reset()
    done = False
    truncated = False
    episode_reward = 0
    while not (done or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, info = env.step(action)
        env.render()
        episode_reward += reward
    print(f"Episode {ep+1} reward: {episode_reward}")
env.close()
```
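
Stable Baselines3 also ships a built-in evaluation helper that reports the mean and standard deviation of the episode return, in the same form as the metric above. A minimal sketch (the episode count here is an assumption):

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("FetchPickAndPlace-v4")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

# Average return over 10 deterministic episodes (mean +/- std).
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=10, deterministic=True)
print(f"mean_reward={mean_reward:.2f} +/- {std_reward:.2f}")
env.close()
```
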
## Replay Video
If `replay.mp4` is not present, you can manually generate it:
```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
import moviepy.editor as mpy
env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)
frames = []
obs, info = env.reset()
done = False
truncated = False
step = 0
max_steps = 1000
while not (done or truncated) and step < max_steps:
    frame = env.render()
    frames.append(frame)
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    step += 1
env.close()
clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile("replay.mp4", codec="libx264")
```
## Continued Training
To continue training from a checkpoint:
```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics
env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
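# Optionally restore the saved replay buffer so off-policy updates resume from
# stored experience (the path below is an assumption; point it at replay_buffer.pkl).
model.load_replay_buffer("replay_buffer.pkl")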
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```
## Citation
If you use this model, please cite:
```
@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
title={SAC + HER Agent for FetchPickAndPlace-v4},
author={IntelliGrow},
year={2025},
howpublished={Hugging Face Hub},
url={https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4}
}
```
## License
MIT License
---
**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4). |