---
library_name: stable-baselines3
tags:
- FetchPickAndPlace-v4
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
results:
- task:
type: reinforcement-learning
name: reinforcement-learning
dataset:
name: FetchPickAndPlace-v4
type: FetchPickAndPlace-v4
metrics:
- type: mean_reward
value: -9.70 +/- 4.17
name: mean_reward
verified: false
---
# SAC + HER Agent for FetchPickAndPlace-v4
## Model Overview
This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick and place objects using the environment's sparse goal-conditioned reward (optional dense shaping via a wrapper) and is intended for robotic manipulation research.
- **Algorithm:** Soft Actor-Critic (SAC)
- **Replay Buffer:** Hindsight Experience Replay (HER)
- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
- **Framework:** Stable Baselines3
## Training Details
- **Total Timesteps:** 500,000
- **Evaluation Frequency:** Every 2,000 steps (15 episodes per eval)
- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
- **Seed:** 42
- **Dense Shaping:** `False` by default; it can be enabled with a reward wrapper (see the sketch after this list)
- **Device:** `auto` (CUDA if available, otherwise CPU)
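The dense-shaping option is not part of the released model, and the exact wrapper used is not included here. As a rough sketch, a wrapper along the following lines would replace the sparse Fetch reward with a negative goal-distance term; the `DenseRewardWrapper` name and the shaping formula are assumptions for illustration only.

```python
import numpy as np
import gymnasium as gym


class DenseRewardWrapper(gym.Wrapper):
    """Hypothetical dense-shaping wrapper: replaces the sparse reward with
    the negative Euclidean distance between achieved and desired goal."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        # Goal-conditioned Fetch observations are dicts with these keys.
        dist = np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"])
        return obs, -float(dist), terminated, truncated, info
```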
### Hyperparameters
| Parameter | Value |
|--------------------------|----------------------|
| Algorithm | SAC |
| Policy | MultiInputPolicy |
| Replay Buffer | HER |
| n_sampled_goal | 4 |
| goal_selection_strategy | future |
| Batch Size | 512 |
| Buffer Size | 1,000,000 |
| Learning Rate | 1e-3 |
| Gamma | 0.95 |
| Tau | 0.05 |
| Entropy Coefficient | auto |
| Train Frequency | 1 step |
| Gradient Steps | 1 |
| Tensorboard Log | logs_pnp_sac_her/tb |
| Seed | 42 |
| Device                   | auto (CUDA if available) |
| Dense Shaping | False (default) |
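
For reference, the configuration above maps onto Stable Baselines3 roughly as follows. This is a minimal sketch, not the exact training script; the callback setup and save paths are assumptions based on the frequencies listed under Training Details.

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

env = gym.make("FetchPickAndPlace-v4")
eval_env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
)

callbacks = [
    EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15),
    CheckpointCallback(
        save_freq=50_000,
        save_path="logs_pnp_sac_her",
        name_prefix="ckpt_sac_her",
        save_replay_buffer=True,
    ),
]
model.learn(total_timesteps=500_000, callback=callbacks)
model.save("sac_her_pnp")
```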
## Files
- `sac_her_pnp.zip`: Final trained SAC model
- `ckpt_sac_her_250000_steps.zip`: Latest checkpoint
- `replay_buffer.pkl`: Replay buffer for continued training
- `replay.mp4`: Replay video of agent performance (generate manually if absent; see the Replay Video section)
- `README.md`: This model card
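
The files can also be fetched programmatically with `huggingface_hub`; the repo id below is assumed to match the repository URL in the Citation section.

```python
from huggingface_hub import hf_hub_download

# Repo id assumed from this card's repository URL.
model_path = hf_hub_download(
    repo_id="IntelliGrow/FetchPickAndPlace-v4",
    filename="sac_her_pnp.zip",
)
# model_path points at the locally cached file, ready for SAC.load(model_path).
```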
## Usage
To load and use the model for inference:
```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    env.render()
env.close()
```
## Evaluation
To evaluate the agent over multiple episodes:
```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

num_episodes = 10
for ep in range(num_episodes):
    obs, info = env.reset()
    terminated = truncated = False
    episode_reward = 0.0
    while not (terminated or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
    print(f"Episode {ep + 1} reward: {episode_reward}")
env.close()
```
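
Because the default reward is sparse, the episode return is a coarse metric. The goal-conditioned Fetch environments also report `is_success` in `info`, which can be aggregated into a success rate; a minimal sketch:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC

env = gym.make("FetchPickAndPlace-v4")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

successes = []
for _ in range(10):
    obs, info = env.reset()
    terminated = truncated = False
    while not (terminated or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, terminated, truncated, info = env.step(action)
    # The Fetch environments report success in the final step's info dict.
    successes.append(bool(info.get("is_success", False)))
env.close()

print(f"Success rate: {sum(successes) / len(successes):.2f}")
```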
## Replay Video
If `replay.mp4` is not present, you can manually generate it:
```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
import moviepy.editor as mpy  # moviepy 1.x; `moviepy.editor` was removed in 2.0

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

frames = []
obs, info = env.reset()
terminated = truncated = False
step = 0
max_steps = 1000
while not (terminated or truncated) and step < max_steps:
    frames.append(env.render())
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, terminated, truncated, info = env.step(action)
    step += 1
env.close()

clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile("replay.mp4", codec="libx264")
```
## Continued Training
To continue training from a checkpoint:
```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
# Restore the saved replay buffer so HER can keep relabeling past transitions.
model.load_replay_buffer("replay_buffer.pkl")
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```
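
After the additional training, the updated model and buffer can be written back with SB3's standard save helpers (the file names here are only suggestions):

```python
model.save("sac_her_pnp_continued")            # writes sac_her_pnp_continued.zip
model.save_replay_buffer("replay_buffer.pkl")  # overwrites the stored buffer
```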
## Citation
If you use this model, please cite:
```
@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
title={SAC + HER Agent for FetchPickAndPlace-v4},
author={IntelliGrow},
year={2025},
howpublished={Hugging Face Hub},
url={https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4}
}
```
## License
MIT License
---
**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4).