sighmon committed · Commit ea37acb · verified · 1 Parent(s): 30b8bf0

Update README.md

Files changed (1): README.md (+73 -5)
README.md CHANGED
@@ -26,12 +26,80 @@ This is a trained model of a **A2C** agent playing **PandaReachDense-v3**
 using the [stable-baselines3 library](https://github.com/DLR-RM/stable-baselines3).
 
 ## Usage (with Stable-baselines3)
-TODO: Add your code
-
 
 ```python
-from stable_baselines3 import ...
-from huggingface_sb3 import load_from_hub
+import os
+
+import gymnasium as gym
+import panda_gym
+
+from huggingface_sb3 import load_from_hub, package_to_hub
+
+from stable_baselines3 import A2C
+from stable_baselines3.common.evaluation import evaluate_policy
+from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
+from stable_baselines3.common.env_util import make_vec_env
+
+from huggingface_hub import notebook_login
+
+
+env_id = "PandaReachDense-v3"
+
+# Create the env
+env = gym.make(env_id)
+
+# Get the state space and action space
+s_size = env.observation_space.shape
+a_size = env.action_space
+
+env = make_vec_env(env_id, n_envs=4)
+
+# Adding this wrapper to normalize the observation and the reward
+env = VecNormalize(env, norm_obs=True, norm_reward=True, clip_obs=10.)
+
+model = A2C(
+    policy = "MultiInputPolicy",
+    env = env,
+    verbose=1,
+)
+
+# Train
+model.learn(1_000_000)
+
+# Save the model and VecNormalize statistics when saving the agent
+model.save("a2c-PandaReachDense-v3")
+env.save("vec_normalize.pkl")
+
+from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize
+
+# Load the saved statistics
+eval_env = DummyVecEnv([lambda: gym.make("PandaReachDense-v3")])
+eval_env = VecNormalize.load("vec_normalize.pkl", eval_env)
+
+# We need to override the render_mode
+eval_env.render_mode = "rgb_array"
+
+# do not update them at test time
+eval_env.training = False
+# reward normalization is not needed at test time
+eval_env.norm_reward = False
+
+# Load the agent
+model = A2C.load("a2c-PandaReachDense-v3")
+
+mean_reward, std_reward = evaluate_policy(model, eval_env)
+
+print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
+
+from huggingface_sb3 import package_to_hub
 
+package_to_hub(
+    model=model,
+    model_name=f"a2c-{env_id}",
+    model_architecture="A2C",
+    env_id=env_id,
+    eval_env=eval_env,
+    repo_id=f"sighmon/a2c-{env_id}",
+    commit_message="With working video",
+)
 ```
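
Note that the new code imports `notebook_login` but never calls it: `package_to_hub` can only push once the session is authenticated with the Hugging Face Hub. A minimal sketch of that missing step, assuming the code runs in a notebook:

```python
# Minimal sketch (not part of the commit): authenticate before package_to_hub.
from huggingface_hub import notebook_login

notebook_login()  # prompts for a Hub access token in the notebook
```

From a plain script or terminal, `huggingface-cli login` or `huggingface_hub.login()` serves the same purpose.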
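
Likewise, `load_from_hub` is imported but unused. A minimal sketch of loading the published agent back from the Hub; the filenames are assumptions inferred from `model_name=f"a2c-{env_id}"` and `env.save("vec_normalize.pkl")` above, not verified repository contents:

```python
import gymnasium as gym
import panda_gym  # registers the Panda tasks with gymnasium

from huggingface_sb3 import load_from_hub
from stable_baselines3 import A2C
from stable_baselines3.common.evaluation import evaluate_policy
from stable_baselines3.common.vec_env import DummyVecEnv, VecNormalize

repo_id = "sighmon/a2c-PandaReachDense-v3"

# Download the checkpoint (filename assumed from model_name above)
checkpoint = load_from_hub(repo_id=repo_id, filename="a2c-PandaReachDense-v3.zip")
model = A2C.load(checkpoint)

# Rebuild the evaluation env and restore the normalization statistics
# (assumes vec_normalize.pkl was pushed alongside the model)
stats_path = load_from_hub(repo_id=repo_id, filename="vec_normalize.pkl")
eval_env = DummyVecEnv([lambda: gym.make("PandaReachDense-v3")])
eval_env = VecNormalize.load(stats_path, eval_env)
eval_env.training = False     # do not update statistics at test time
eval_env.norm_reward = False  # evaluate on raw rewards

mean_reward, std_reward = evaluate_policy(model, eval_env)
print(f"Mean reward = {mean_reward:.2f} +/- {std_reward:.2f}")
```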