Hugging Face
Haitham Bou Ammar
hba123
13 followers · 6 following
hbouammar
haitham-bou-ammar-a723a932
AI & ML interests
LLMs, VLMs, Robotics, Reinforcement Learning, Bayesian Optimisation
Recent Activity
authored a paper 7 days ago
Almost Surely Safe Alignment of Large Language Models at Inference-Time
reacted to their post 8 days ago
We developed a method that ensures almost-sure safety (i.e., safety with probability approaching 1), and we proved this result. We then present a practical implementation, which we call InferenceGuard. InferenceGuard has impressive practical results: 91.04% on Alpaca-7B and 100% safety on Beaver 7B-v3. Now, it is easy to get high safety results like those if we want a dumb model, e.g., one that just doesn't answer, or answers with EOS, and so on. However, our goal is not only to have safe results but also to make sure that the rewards are high: we want a good trade-off between safety and rewards! That's exactly what we show, and InferenceGuard achieves it. Check it out: https://huggingface.co/papers/2502.01208
reacted to the same post with 🔥 8 days ago
Organizations
None yet
hba123's activity
liked a model about 1 month ago
huawei-noah/MOASpec-Llama-3-8B-Instruct
Updated Jan 7 • 12 • 5