IntelliGrow committed
Commit 04bb1bf · verified · 1 Parent(s): 4c5ccd9

Update README.md

Files changed (1):
  1. README.md +113 -16
README.md CHANGED
@@ -21,27 +21,59 @@ model-index:
   verified: false
 ---
 
-# SAC Agent for FetchPickAndPlace-v4
+# SAC + HER Agent for FetchPickAndPlace-v4
 
-## Model Description
+## Model Overview
 
-This repository contains a Soft Actor-Critic (SAC) agent trained on the `FetchPickAndPlace-v4` environment using Hindsight Experience Replay (HER). The agent learns to pick and place objects in a simulated robotic environment.
+This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick and place objects using sparse or dense rewards, and is suitable for robotic manipulation research.
 
 - **Algorithm:** Soft Actor-Critic (SAC)
 - **Replay Buffer:** Hindsight Experience Replay (HER)
-- **Environment:** FetchPickAndPlace-v4 (from gymnasium-robotics)
+- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
 - **Framework:** Stable Baselines3
 
 ## Training Details
 
 - **Total Timesteps:** 500,000
-- **Dense Shaping:** Disabled
-- **Evaluation:** Success rate and mean reward measured every 2,000 steps
+- **Evaluation Frequency:** Every 2,000 steps (15 episodes per eval)
+- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
 - **Seed:** 42
+- **Dense Shaping:** `False` (can be enabled with a wrapper)
+- **Device:** CUDA if available, otherwise auto
+
+### Hyperparameters
+
+| Parameter                | Value                |
+|--------------------------|----------------------|
+| Algorithm                | SAC                  |
+| Policy                   | MultiInputPolicy     |
+| Replay Buffer            | HER                  |
+| n_sampled_goal           | 4                    |
+| goal_selection_strategy  | future               |
+| Batch Size               | 512                  |
+| Buffer Size              | 1,000,000            |
+| Learning Rate            | 1e-3                 |
+| Gamma                    | 0.95                 |
+| Tau                      | 0.05                 |
+| Entropy Coefficient      | auto                 |
+| Train Frequency          | 1 step               |
+| Gradient Steps           | 1                    |
+| Tensorboard Log          | logs_pnp_sac_her/tb  |
+| Seed                     | 42                   |
+| Device                   | CUDA/Auto            |
+| Dense Shaping            | False (default)      |
+
+## Files
+
+- `sac_her_pnp.zip`: Final trained SAC model
+- `ckpt_sac_her_250000_steps.zip`: Latest checkpoint
+- `replay_buffer.pkl`: Replay buffer for continued training
+- `replay.mp4`: Replay video of agent performance (manual generation recommended)
+- `README.md`: This model card
 
 ## Usage
 
-To load and use the model:
+To load and use the model for inference:
 
 ```python
 from stable_baselines3 import SAC
@@ -49,7 +81,7 @@ import gymnasium as gym
 import gymnasium_robotics
 
 env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
-model = SAC.load("path/to/sac-FetchPickAndPlace-v4.zip", env=env)
+model = SAC.load("path/to/sac_her_pnp.zip", env=env)
 
 obs, info = env.reset()
 done = False
@@ -59,23 +91,86 @@ while not done:
     env.render()
 ```
 
-## Evaluation & Replay
-
-A replay video (`replay.mp4`) is included to visualize the agent's performance over two episodes.
-
-## Files
-
-- `sac-FetchPickAndPlace-v4.zip`: Trained SAC model
-- `replay.mp4`: Agent replay video
-- `README.md`: Model card
+## Evaluation
+
+To evaluate the agent over multiple episodes:
+
+```python
+from stable_baselines3 import SAC
+import gymnasium as gym
+import gymnasium_robotics
+
+env = gym.make("FetchPickAndPlace-v4", render_mode="human")
+model = SAC.load("path/to/sac_her_pnp.zip", env=env)
+
+num_episodes = 10
+for ep in range(num_episodes):
+    obs, info = env.reset()
+    done = False
+    truncated = False
+    episode_reward = 0
+    while not (done or truncated):
+        action, _ = model.predict(obs, deterministic=True)
+        obs, reward, done, truncated, info = env.step(action)
+        env.render()
+        episode_reward += reward
+    print(f"Episode {ep+1} reward: {episode_reward}")
+env.close()
+```
+
+## Replay Video
+
+If `replay.mp4` is not present, you can manually generate it:
+
+```python
+import gymnasium as gym
+import gymnasium_robotics
+from stable_baselines3 import SAC
+import moviepy.editor as mpy
+
+env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
+model = SAC.load("path/to/sac_her_pnp.zip", env=env)
+
+frames = []
+obs, info = env.reset()
+done = False
+truncated = False
+step = 0
+max_steps = 1000
+
+while not (done or truncated) and step < max_steps:
+    frame = env.render()
+    frames.append(frame)
+    action, _ = model.predict(obs, deterministic=True)
+    obs, reward, done, truncated, info = env.step(action)
+    step += 1
+
+env.close()
+clip = mpy.ImageSequenceClip(frames, fps=30)
+clip.write_videofile("replay.mp4", codec="libx264")
+```
+
+## Continued Training
+
+To continue training from a checkpoint:
+
+```python
+from stable_baselines3 import SAC
+import gymnasium as gym
+import gymnasium_robotics
+
+env = gym.make("FetchPickAndPlace-v4", render_mode=None)
+model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
+model.learn(total_timesteps=500_000, reset_num_timesteps=False)
+```
 
 ## Citation
 
 If you use this model, please cite:
 
 ```
-@misc{IntelliGrow_FetchPickAndPlace_SAC,
-  title={SAC Agent for FetchPickAndPlace-v4},
+@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
+  title={SAC + HER Agent for FetchPickAndPlace-v4},
   author={IntelliGrow},
   year={2025},
   howpublished={Hugging Face Hub},
@@ -88,3 +183,5 @@ If you use this model, please cite:
 MIT License
 
 ---
+
+**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4).
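The updated card lists **Dense Shaping** as `False` but notes it can be enabled with a wrapper. No such wrapper ships with this commit; the following is a minimal sketch of one possible implementation, assuming the standard goal-conditioned dict observations of the Fetch environments (the class name `DenseShapingWrapper` is illustrative, not from the repo):

```python
import numpy as np
import gymnasium as gym
import gymnasium_robotics

class DenseShapingWrapper(gym.Wrapper):
    """Hypothetical wrapper: replaces the sparse Fetch reward with the
    negative Euclidean distance between achieved and desired goal."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        dense = -float(np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"]))
        return obs, dense, terminated, truncated, info

env = DenseShapingWrapper(gym.make("FetchPickAndPlace-v4"))
```

Note that Stable Baselines3's HER buffer recomputes rewards for relabeled goals via the environment's `compute_reward`, so wrapper-level shaping only affects the originally experienced transitions.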
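The training script itself is not part of this commit. As a rough sketch, the hyperparameter table and the evaluation/checkpoint frequencies above map onto Stable Baselines3 as follows; file and directory names are inferred from the Files section, and the actual script may differ:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

env = gym.make("FetchPickAndPlace-v4")
eval_env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
)

# Evaluate every 2,000 steps (15 episodes) and checkpoint every 50,000 steps,
# saving the replay buffer alongside the model, per the training details above.
callbacks = [
    EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15),
    CheckpointCallback(
        save_freq=50_000,
        save_path="logs_pnp_sac_her",
        name_prefix="ckpt_sac_her",
        save_replay_buffer=True,
    ),
]

model.learn(total_timesteps=500_000, callback=callbacks)
model.save("sac_her_pnp")
model.save_replay_buffer("replay_buffer.pkl")
```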
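`MultiInputPolicy` appears in the table because the Fetch environments return goal-conditioned dict observations rather than flat arrays, and HER relies on the `achieved_goal`/`desired_goal` entries for relabeling. A quick way to inspect the observation layout:

```python
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4")
obs, info = env.reset(seed=42)
# Expected keys: 'observation', 'achieved_goal', 'desired_goal'.
for key, value in obs.items():
    print(key, value.shape)
env.close()
```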
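The Continued Training snippet in the diff resumes from a checkpoint with a fresh, empty replay buffer. Since the commit also ships `replay_buffer.pkl`, the buffer can be restored so off-policy updates resume from previously collected data; a sketch assuming the paths from the Files section:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
# Restore the saved buffer so training does not restart from empty.
model.load_replay_buffer("replay_buffer.pkl")
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```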