We developed a method that ensures almost-sure safety, i.e., safety with probability one. We prove this result, and we then present a practical implementation, which we call InferenceGuard. InferenceGuard delivers impressive results in practice: 91.04% safety on Alpaca-7B and 100% safety on Beaver-7B-v3.
Now, it is easy to get high safety numbers like those if you settle for a useless model, e.g., one that refuses to answer or just emits EOS. Our goal is not only safe outputs but also high rewards: we want a good trade-off between safety and reward. That is exactly what we show InferenceGuard achieves!
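To make that trade-off concrete, here is a minimal sketch of the simplest inference-time baseline in this family: sample candidates, discard the ones a safety critic flags, and return the highest-reward survivor. The function names and critics are placeholders I made up for illustration; InferenceGuard's actual search procedure is more sophisticated, this is just the shape of the idea.

```python
def safe_best_of_n(generate, reward, cost, prompt, n=8, threshold=0.0):
    """Hypothetical safety-filtered best-of-n decoding (illustrative only).

    `generate`, `reward`, and `cost` are placeholder callables: a sampler
    and two critics scoring a full response; `cost <= threshold` means the
    response is considered safe.
    """
    candidates = [generate(prompt) for _ in range(n)]
    # Keep only the candidates the safety critic deems safe.
    safe = [c for c in candidates if cost(prompt, c) <= threshold]
    if not safe:
        # No safe candidate: fall back to a refusal rather than risk harm.
        return "I can't help with that."
    # Among the safe candidates, maximise reward. This second step is what
    # keeps the model from degenerating into the "always refuse" solution.
    return max(safe, key=lambda c: reward(prompt, c))
```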
I have some New Year presents for you, #MachineLearning and #AI community! We just open-sourced our code for new state-of-the-art results that beat EAGLE-2 and Medusa on #LLM inference.
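For readers who haven't met them: EAGLE-2 and Medusa are speculative-decoding methods. Below is a minimal sketch of the vanilla draft-and-verify loop that this whole family builds on; the `target`/`draft` interfaces are assumptions for illustration, not the API of our released code.

```python
import torch

def speculative_step(target, draft, ids, k=4):
    """One draft-and-verify step of vanilla speculative decoding.

    `target` and `draft` are assumed callables mapping a 1-D tensor of
    token ids to a (vocab,) tensor of next-token probabilities; they are
    placeholders, not a real library API.
    """
    # 1) The cheap draft model proposes k tokens autoregressively.
    proposal, cur = [], ids
    for _ in range(k):
        t = torch.multinomial(draft(cur), 1)
        proposal.append(t)
        cur = torch.cat([cur, t])

    # 2) Verify against the target model. Shown token-by-token for clarity;
    #    real implementations score all k positions in ONE target forward
    #    pass, which is where the speedup comes from.
    out = ids
    for t in proposal:
        q, p = target(out), draft(out)   # target / draft next-token probs
        if torch.rand(()).item() < min(1.0, (q[t] / p[t]).item()):
            out = torch.cat([out, t])    # accept the draft token as-is
        else:
            # Reject: resample from the residual distribution
            # max(0, q - p), renormalised, then stop this block.
            resid = torch.clamp(q - p, min=0.0)
            out = torch.cat([out, torch.multinomial(resid / resid.sum(), 1)])
            break
    # (The bonus token sampled from the target when all k drafts are
    #  accepted is omitted here for brevity.)
    return out
```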
Blindly applying algorithms without understanding the math behind them is not a good idea, in my view. So, I am on a quest to fix this!
I wrote my first Hugging Face article on how to derive closed-form solutions for KL-regularised reinforcement learning problems, the machinery that underpins DPO.
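If you want the punchline before reading the full article: the objective and its closed-form maximiser are standard. Here β is the KL weight and Z(x) the normalising constant; DPO then inverts this to express r(x, y) in terms of π* and π_ref and plugs it into a preference loss.

```latex
% KL-regularised objective over responses y to a prompt x:
\max_{\pi}\;\; \mathbb{E}_{y \sim \pi(\cdot \mid x)}\big[ r(x, y) \big]
  \;-\; \beta \,\mathrm{KL}\big( \pi(\cdot \mid x) \,\big\|\, \pi_{\mathrm{ref}}(\cdot \mid x) \big)

% Completing the KL, the objective equals
%   -beta * KL(pi || pi*) + beta * log Z(x),
% so the unique maximiser is the Gibbs distribution

\pi^{*}(y \mid x)
  \;=\; \frac{1}{Z(x)}\, \pi_{\mathrm{ref}}(y \mid x)\,
        \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big),
\qquad
Z(x) \;=\; \sum_{y} \pi_{\mathrm{ref}}(y \mid x)\,
        \exp\!\Big( \tfrac{1}{\beta}\, r(x, y) \Big).
```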