---
library_name: stable-baselines3
tags:
- FetchPickAndPlace-v4
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: FetchPickAndPlace-v4
      type: FetchPickAndPlace-v4
    metrics:
    - type: mean_reward
      value: -9.70 +/- 4.17
      name: mean_reward
      verified: false
---

# SAC + HER Agent for FetchPickAndPlace-v4

## Model Overview

This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick up an object and place it at a target position; it was trained on the environment's sparse reward (dense shaping can be enabled via a wrapper) and is intended for robotic manipulation research.

- **Algorithm:** Soft Actor-Critic (SAC)
- **Replay Buffer:** Hindsight Experience Replay (HER)
- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
- **Framework:** Stable Baselines3

## Training Details

- **Total Timesteps:** 500,000
- **Evaluation Frequency:** Every 2,000 steps (15 episodes per eval)
- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
- **Seed:** 42
- **Dense Shaping:** `False` (can be enabled with a reward-shaping wrapper)
- **Device:** `auto` (CUDA if available, otherwise CPU)
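
The evaluation and checkpoint cadence above corresponds to Stable Baselines3's standard callbacks. A minimal sketch of such a setup (the callback classes are SB3's; the save path and `eval_env` here are illustrative assumptions, not the exact training script):

```python
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback
import gymnasium as gym
import gymnasium_robotics

eval_env = gym.make("FetchPickAndPlace-v4")
callbacks = [
    # Evaluate for 15 episodes every 2,000 steps
    EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15),
    # Save model + replay buffer every 50,000 steps
    CheckpointCallback(save_freq=50_000, save_path="logs_pnp_sac_her",
                       name_prefix="ckpt_sac_her", save_replay_buffer=True),
]
# Passed to model.learn(..., callback=callbacks) during training
```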

### Hyperparameters

| Parameter                | Value                |
|--------------------------|----------------------|
| Algorithm                | SAC                  |
| Policy                   | MultiInputPolicy     |
| Replay Buffer            | HER                  |
| n_sampled_goal           | 4                    |
| goal_selection_strategy  | future               |
| Batch Size               | 512                  |
| Buffer Size              | 1,000,000            |
| Learning Rate            | 1e-3                 |
| Gamma                    | 0.95                 |
| Tau                      | 0.05                 |
| Entropy Coefficient      | auto                 |
| Train Frequency          | 1 step               |
| Gradient Steps           | 1                    |
| Tensorboard Log          | logs_pnp_sac_her/tb  |
| Seed                     | 42                   |
| Device                   | auto (CUDA if available) |
| Dense Shaping            | False (default)      |
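
In Stable Baselines3 terms, these hyperparameters correspond to a model construction along the lines of the sketch below (illustrative; the exact training script is not included in this repository):

```python
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments
from stable_baselines3 import SAC, HerReplayBuffer

env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
)
model.learn(total_timesteps=500_000)
model.save("sac_her_pnp")
```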

## Files

- `sac_her_pnp.zip`: Final trained SAC model
- `ckpt_sac_her_250000_steps.zip`: Training checkpoint at 250,000 steps
- `replay_buffer.pkl`: Saved replay buffer for continued training
- `replay.mp4`: Replay video of the agent (generate it manually if absent; see **Replay Video** below)
- `README.md`: This model card

## Usage

To load and use the model for inference:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments with gymnasium

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

obs, info = env.reset()
done = truncated = False
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    env.render()  # returns an RGB array under render_mode="rgb_array"

env.close()
```

## Evaluation

To evaluate the agent over multiple episodes:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

num_episodes = 10
for ep in range(num_episodes):
    obs, info = env.reset()
    done = False
    truncated = False
    episode_reward = 0
    while not (done or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, info = env.step(action)
        env.render()
        episode_reward += reward
    print(f"Episode {ep+1} reward: {episode_reward}")
env.close()
```
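
Alternatively, SB3's `evaluate_policy` helper computes the mean and standard deviation of the episodic reward directly, which is the form of the `mean_reward` metric reported in this card's metadata:

```python
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=15,
                                          deterministic=True)
print(f"mean_reward: {mean_reward:.2f} +/- {std_reward:.2f}")
env.close()
```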

## Replay Video

If `replay.mp4` is not present, you can generate it manually (note: the `moviepy.editor` module used below ships with moviepy 1.x; in moviepy 2.x the same classes live in the top-level `moviepy` package):

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
import moviepy.editor as mpy

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

frames = []
obs, info = env.reset()
done = False
truncated = False
step = 0
max_steps = 1000

while not (done or truncated) and step < max_steps:
    frame = env.render()
    frames.append(frame)
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    step += 1

env.close()
clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile("replay.mp4", codec="libx264")
```

## Continued Training

To continue training from a checkpoint:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
# Restore the saved replay buffer so HER resumes from real experience
model.load_replay_buffer("replay_buffer.pkl")
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```

## Citation

If you use this model, please cite:

```
@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
  title={SAC + HER Agent for FetchPickAndPlace-v4},
  author={IntelliGrow},
  year={2025},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4}
}
```

## License

MIT License

---

**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4).