sureheremarv (Sure Here, Marv)

mattmdjaga

posted an update 9 days ago

🚨 Gray Swan AI's Biggest AI Jailbreaking Arena Yet! $130K+ 🚨

🔹 Agent Red-Teaming Challenge – test direct & indirect attacks on anonymous frontier models!
🔹 $130K+ in prizes & giveaways – co-sponsored by OpenAI & supported by UK AI Security Institute 🇬🇧
🔹 March 8 – April 6 – fresh exploits = fresh rewards!

How It Works:
✅ Anonymous models from top providers 🤐
✅ Direct & indirect prompt injection paths 🔄
✅ Weekly challenges for new behaviors 🗓️
✅ Speed & quantity-based rewards ⏩💰

Why Join?
⚖️ Neutral judging – UK AISI & automated judges ensure fairness
🎯 No pre-trained defenses – a true red-teaming battlefield
💻 5 Apple laptops up for grabs – increase chances by inviting friends!

🔗 Arena: app.grayswan.ai/arena/challenge/agent-red-teaming
🔗 Discord: discord.gg/grayswanai

🔥 No illusions, no mercy. Push AI agents to the limit & claim your share of $130K+! 🚀

eliotj

authored a paper 4 months ago

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risk of Language Models

Paper • 2408.08926 • Published Aug 15, 2024 • 6

mattmdjaga

authored 2 papers 4 months ago

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

Paper • 2410.09024 • Published Oct 11, 2024 • 1

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents

Paper • 2410.10871 • Published Oct 8, 2024 • 1

mattmdjaga

posted an update 5 months ago

🚨 New Agent Benchmark 🚨
AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents

ai-safety-institute/AgentHarm

Collaboration between UK AI Safety Institute and Gray Swan AI to create a dataset for measuring harmfulness of LLM agents.

The benchmark contains harmful and benign task sets spanning 11 categories, with varied difficulty levels and detailed evaluation that measures not only success rate but also tool-level accuracy.

We provide refusal and accuracy metrics across a wide range of models in both no-attack and prompt-attack scenarios.

AgentHarm: A Benchmark for Measuring Harmfulness of LLM Agents (2410.09024)
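
As a rough illustration of the two kinds of metrics mentioned in this post, the Python sketch below computes a refusal rate and a mean task score over a few made-up evaluation records. The `EvalRecord` fields and both function names are hypothetical and are not taken from the AgentHarm code or dataset; the benchmark's own grading is more detailed.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EvalRecord:
    """One hypothetical evaluation result (not the AgentHarm schema)."""
    task_id: str
    refused: bool   # True if the agent declined the task
    score: float    # task score in [0, 1]; assumed 0.0 when refused


def refusal_rate(records: Sequence[EvalRecord]) -> float:
    """Fraction of tasks on which the agent refused."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(r.refused for r in records) / len(records)


def mean_score(records: Sequence[EvalRecord]) -> float:
    """Average task score across all records."""
    if not records:
        raise ValueError("records must not be empty")
    return sum(r.score for r in records) / len(records)


# Made-up records for illustration only.
records = [
    EvalRecord("t1", refused=True, score=0.0),
    EvalRecord("t2", refused=False, score=0.5),
    EvalRecord("t3", refused=False, score=1.0),
    EvalRecord("t4", refused=False, score=0.5),
]

print(f"refusal rate: {refusal_rate(records):.2f}")
print(f"mean score:   {mean_score(records):.2f}")
```

Running the same two functions once on no-attack results and once on prompt-attack results would give the kind of side-by-side comparison the post describes.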

mattmdjaga

posted an update 7 months ago

$40K in Bounties: Ultimate Jailbreaking Championship 2024

🚨 Ultimate Jailbreaking Championship 2024 🚨
Hackers vs. AI in the arena. Let the battle begin!
🏆 $40,000 in Bounties
🗓️ Sept 7, 2024 @ 10AM PDT
🔗 Register Now: https://app.grayswan.ai/arena
====

Can you push an aligned language model to generate a bomb recipe or a fake news article? Join fellow hackers in a jailbreaking arena where you can test the boundaries of advanced LLMs.

====

The Objective
Your goal is to jailbreak as many LLMs as possible, as quickly as possible in the arena!

====

The Stakes
Break a model and claim your share of the $40,000 in bounties! With various jailbreak bounties and top hacker rewards, there are plenty of opportunities to win. Winners will also receive priority consideration for employment and internship opportunities at Gray Swan AI.

====

Ready to rise to the challenge? Join us and show the world what you can do!

See you in the arena!

mattmdjaga

posted an update 10 months ago

NEW HF 🤗 COURSE to help people dive into Computer Vision, built by the HF community. Over the last 6 months, the Hugging Face Discord community has been hard at work developing a new computer vision course. Receive a certificate of completion and share it on your socials 🤗.

https://huggingface.co/learn/computer-vision-course/unit0/welcome/welcome

Team members 8

sureheremarv's activity