
Amaira Catherine

Amaira223

AI & ML interests

None yet

Recent Activity

reacted to mmhamdy's post with 👍 about 11 hours ago

Organizations

None yet

Amaira223's activity

reacted to burtenshaw's post with 🔥 about 11 hours ago
The Hugging Face agents course is finally out!

👉 https://huggingface.co/agents-course

This first unit of the course sets you up with all the fundamentals to become a pro in agents.

- What's an AI Agent?
- What are LLMs?
- Messages and Special Tokens
- Understanding AI Agents through the Thought-Action-Observation Cycle (see the sketch after this list)
- Thought: Internal Reasoning and the ReAct Approach
- Actions: Enabling the Agent to Engage with Its Environment
- Observe: Integrating Feedback to Reflect and Adapt
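
To make the Thought-Action-Observation cycle concrete, here is a minimal Python sketch of the loop. The `call_llm` and `run_tool` callables and the dict the model is assumed to return are hypothetical scaffolding, not the course's actual code:

```python
# A minimal, hypothetical sketch of the Thought-Action-Observation cycle.
# `call_llm` and `run_tool` are stand-in callables: the LLM is assumed to
# return a dict like {"thought": ..., "tool": ..., "tool_input": ...,
# "final_answer": ...}. None of this is the course's real API.

def react_loop(call_llm, run_tool, question, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Thought: the model reasons about what to do next.
        step = call_llm(transcript)
        transcript += f"Thought: {step['thought']}\n"
        if step.get("final_answer"):
            return step["final_answer"]  # the agent decides it is done
        # Action: the agent invokes the tool the model chose.
        observation = run_tool(step["tool"], step["tool_input"])
        # Observation: the result is fed back for the next thought.
        transcript += (f"Action: {step['tool']}[{step['tool_input']}]\n"
                       f"Observation: {observation}\n")
    return "No final answer within the step budget."
```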
reacted to mmhamdy's post with 👍 about 11 hours ago
⛓ Evaluating Long Context #2: SCROLLS and ZeroSCROLLS

In this series of posts tracing the history of long context evaluation, we started with Long Range Arena (LRA). Introduced in 2020, Long Range Arena (LRA) is one of the earliest benchmarks designed to tackle the challenge of long context evaluation. It wasn't introduced to evaluate LLMs, however, but rather the transformer architecture in general.

📜 The SCROLLS benchmark, introduced in 2022, addresses this gap in NLP/LLM research. SCROLLS challenges models with tasks that require reasoning over extended sequences (according to 2022 standards). So, what does it offer?

1๏ธโƒฃ Long Text Focus: SCROLLS (unlike LRA) focus mainly on text and contain inputs with thousands of words, testing models' ability to synthesize information across lengthy documents.
2๏ธโƒฃ Diverse Tasks: Includes summarization, question answering, and natural language inference across domains like literature, science, and business.
3๏ธโƒฃ Unified Format: All datasets are available in a text-to-text format, facilitating easy evaluation and comparison of models.

Building on SCROLLS, ZeroSCROLLS takes long text evaluation to the next level by focusing on zero-shot learning (a minimal sketch follows the list below). Other features include:

1๏ธโƒฃ New Tasks: Introduces tasks like sentiment aggregation and sorting book chapter summaries.
2๏ธโƒฃ Leaderboard: A live leaderboard encourages continuous improvement and competition among researchers.

💡 What are some other landmark benchmarks in the history of long context evaluation? Feel free to share your thoughts and suggestions in the comments.

- SCROLLS Paper: SCROLLS: Standardized CompaRison Over Long Language Sequences (https://huggingface.co/papers/2201.03533)
- ZeroSCROLLS Paper: ZeroSCROLLS: A Zero-Shot Benchmark for Long Text Understanding (https://huggingface.co/papers/2305.14196)