Haitham Bou Ammar

hba123

AI & ML interests

LLMs, VLMs, Robotics, Reinforcement Learning, Bayesian Optimisation

Recent Activity

authored a paper 7 days ago

Almost Surely Safe Alignment of Large Language Models at Inference-Time

reacted to their post with 😎 8 days ago

We developed a method that ensures almost-sure safety (i.e., safety with probability approaching 1), and we proved this result. We then present a practical implementation, which we call InferenceGuard. InferenceGuard has impressive practical results: a 91.04% safety rate on Alpaca-7B and 100% on Beaver 7B-v3. Now, it is easy to get high safety results like those if we are happy with a dumb model, e.g., one that just refuses to answer or answers with EOS. However, our goal is not only to have safe results but also to keep the rewards high: we want a good trade-off between safety and rewards! That's exactly what we show, and InferenceGuard achieves it. Check it out: https://huggingface.co/papers/2502.01208

reacted to their post with 🔥 8 days ago

Organizations

None yet

hba123's activity

liked a model about 1 month ago

huawei-noah/MOASpec-Llama-3-8B-Instruct

Updated Jan 7 • 12 • 5