|
---
library_name: stable-baselines3
tags:
- FetchPickAndPlace-v4
- deep-reinforcement-learning
- reinforcement-learning
- stable-baselines3
model-index:
- name: SAC
  results:
  - task:
      type: reinforcement-learning
      name: reinforcement-learning
    dataset:
      name: FetchPickAndPlace-v4
      type: FetchPickAndPlace-v4
    metrics:
    - type: mean_reward
      value: -9.70 +/- 4.17
      name: mean_reward
      verified: false
---
|
|
|
# SAC + HER Agent for FetchPickAndPlace-v4

## Model Overview

This repository contains a Soft Actor-Critic (SAC) agent trained with Hindsight Experience Replay (HER) on the `FetchPickAndPlace-v4` environment from `gymnasium-robotics`. The agent learns to pick and place objects with the environment's sparse reward (dense shaping can be enabled with a wrapper) and is intended for robotic manipulation research.
|
|
|
- **Algorithm:** Soft Actor-Critic (SAC)
- **Replay Buffer:** Hindsight Experience Replay (HER)
- **Environment:** FetchPickAndPlace-v4 (`gymnasium-robotics`)
- **Framework:** Stable Baselines3
|
|
|
## Training Details

- **Total Timesteps:** 500,000
- **Evaluation Frequency:** Every 2,000 steps (15 episodes per evaluation)
- **Checkpoint Frequency:** Every 50,000 steps (model + replay buffer)
- **Seed:** 42
- **Dense Shaping:** `False` by default; it can be enabled with a reward wrapper (see the sketch after this list)
- **Device:** `auto` (CUDA if available, otherwise CPU)
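
The dense-shaping wrapper is not included in this repository. A minimal sketch of what such a wrapper could look like, assuming the common choice of rewarding the negative distance between the achieved and desired goals (the class name `DenseShapingWrapper` is hypothetical):

```python
import numpy as np
import gymnasium as gym
import gymnasium_robotics  # registers the Fetch environments


class DenseShapingWrapper(gym.Wrapper):
    """Hypothetical wrapper: replace the sparse Fetch reward with the negative
    Euclidean distance between the achieved and desired goals."""

    def step(self, action):
        obs, reward, terminated, truncated, info = self.env.step(action)
        dense_reward = -float(np.linalg.norm(obs["achieved_goal"] - obs["desired_goal"]))
        return obs, dense_reward, terminated, truncated, info


# Wrap the environment before handing it to SAC:
env = DenseShapingWrapper(gym.make("FetchPickAndPlace-v4"))
```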
|
|
|
### Hyperparameters

| Parameter               | Value                    |
|-------------------------|--------------------------|
| Algorithm               | SAC                      |
| Policy                  | MultiInputPolicy         |
| Replay Buffer           | HER                      |
| n_sampled_goal          | 4                        |
| goal_selection_strategy | future                   |
| Batch Size              | 512                      |
| Buffer Size             | 1,000,000                |
| Learning Rate           | 1e-3                     |
| Gamma                   | 0.95                     |
| Tau                     | 0.05                     |
| Entropy Coefficient     | auto                     |
| Train Frequency         | 1 step                   |
| Gradient Steps          | 1                        |
| Tensorboard Log         | logs_pnp_sac_her/tb      |
| Seed                    | 42                       |
| Device                  | auto (CUDA if available) |
| Dense Shaping           | False (default)          |
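
For reference, here is a sketch of how these hyperparameters map onto a Stable Baselines3 training script, using `HerReplayBuffer` together with `EvalCallback` and `CheckpointCallback` for the evaluation and checkpoint frequencies listed above. The actual training script is not part of this repository, so treat the paths and callback wiring as assumptions:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC, HerReplayBuffer
from stable_baselines3.common.callbacks import CheckpointCallback, EvalCallback

env = gym.make("FetchPickAndPlace-v4")
eval_env = gym.make("FetchPickAndPlace-v4")

model = SAC(
    "MultiInputPolicy",
    env,
    replay_buffer_class=HerReplayBuffer,
    replay_buffer_kwargs=dict(n_sampled_goal=4, goal_selection_strategy="future"),
    batch_size=512,
    buffer_size=1_000_000,
    learning_rate=1e-3,
    gamma=0.95,
    tau=0.05,
    ent_coef="auto",
    train_freq=1,
    gradient_steps=1,
    tensorboard_log="logs_pnp_sac_her/tb",
    seed=42,
    device="auto",
)

callbacks = [
    # Evaluate every 2,000 steps for 15 episodes.
    EvalCallback(eval_env, eval_freq=2_000, n_eval_episodes=15, deterministic=True),
    # Save model + replay buffer every 50,000 steps.
    CheckpointCallback(
        save_freq=50_000,
        save_path="logs_pnp_sac_her",
        name_prefix="ckpt_sac_her",
        save_replay_buffer=True,
    ),
]

model.learn(total_timesteps=500_000, callback=callbacks)
model.save("sac_her_pnp")
model.save_replay_buffer("replay_buffer.pkl")
```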
|
|
|
## Files

- `sac_her_pnp.zip`: Final trained SAC model
- `ckpt_sac_her_250000_steps.zip`: Latest checkpoint
- `replay_buffer.pkl`: Replay buffer for continued training
- `replay.mp4`: Replay video of agent performance (manual generation recommended)
- `README.md`: This model card
|
|
|
## Usage

To load and use the model for inference:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

obs, info = env.reset()
done = False
truncated = False
# Fetch episodes typically end by truncation at the time limit, so check both flags.
while not (done or truncated):
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    env.render()
env.close()
```
|
|
|
## Evaluation

To evaluate the agent over multiple episodes:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode="human")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

num_episodes = 10
for ep in range(num_episodes):
    obs, info = env.reset()
    done = False
    truncated = False
    episode_reward = 0
    while not (done or truncated):
        action, _ = model.predict(obs, deterministic=True)
        obs, reward, done, truncated, info = env.step(action)
        env.render()
        episode_reward += reward
    print(f"Episode {ep+1} reward: {episode_reward}")
env.close()
```
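
Alternatively, Stable Baselines3's `evaluate_policy` helper returns the mean and standard deviation of the episode reward directly. The reported `mean_reward` above was presumably computed this way, but the exact evaluation script is not included, so this is only a sketch:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
from stable_baselines3.common.evaluation import evaluate_policy

env = gym.make("FetchPickAndPlace-v4")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

# 15 episodes matches the per-evaluation episode count used during training.
mean_reward, std_reward = evaluate_policy(model, env, n_eval_episodes=15, deterministic=True)
print(f"mean_reward = {mean_reward:.2f} +/- {std_reward:.2f}")
env.close()
```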
|
|
|
## Replay Video

If `replay.mp4` is not present, you can generate it manually:

```python
import gymnasium as gym
import gymnasium_robotics
from stable_baselines3 import SAC
import moviepy.editor as mpy  # note: the moviepy.editor module requires moviepy < 2.0

env = gym.make("FetchPickAndPlace-v4", render_mode="rgb_array")
model = SAC.load("path/to/sac_her_pnp.zip", env=env)

frames = []
obs, info = env.reset()
done = False
truncated = False
step = 0
max_steps = 1000

# Roll out one episode and collect rendered frames.
while not (done or truncated) and step < max_steps:
    frame = env.render()
    frames.append(frame)
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, truncated, info = env.step(action)
    step += 1

env.close()
clip = mpy.ImageSequenceClip(frames, fps=30)
clip.write_videofile("replay.mp4", codec="libx264")
```
|
|
|
## Continued Training

To continue training from a checkpoint:

```python
from stable_baselines3 import SAC
import gymnasium as gym
import gymnasium_robotics

env = gym.make("FetchPickAndPlace-v4", render_mode=None)
model = SAC.load("logs_pnp_sac_her/ckpt_sac_her_250000_steps.zip", env=env)
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```
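
Since the checkpoints also store the replay buffer (see Files), it can be restored before resuming so the off-policy updates keep the previously collected experience. A minimal variant of the snippet above; the buffer path is an assumption based on the file list:

```python
# Restore the saved replay buffer, then resume training.
model.load_replay_buffer("replay_buffer.pkl")
model.learn(total_timesteps=500_000, reset_num_timesteps=False)
```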
|
|
|
## Citation

If you use this model, please cite:

```
@misc{IntelliGrow_FetchPickAndPlace_SAC_HER,
  title={SAC + HER Agent for FetchPickAndPlace-v4},
  author={IntelliGrow},
  year={2025},
  howpublished={Hugging Face Hub},
  url={https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4}
}
```
|
|
|
## License

MIT License

---

**Contact:** For questions or issues, open an issue on the [Hugging Face repository](https://huggingface.co/IntelliGrow/FetchPickAndPlace-v4).