dumball

archit11

·

https://archit-spec.github.io

AI & ML interests

Small language models. Looking for work; please reach out: [email protected]

Recent Activity

liked a dataset 1 day ago

open-thoughts/OpenThoughts-114k

liked a dataset 6 days ago

simplescaling/s1K

upvoted an article 7 days ago

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

archit11's activity

upvoted an article 7 days ago

Article

The case for specialized pre-training: ultra-fast foundation models for dedicated tasks

Aug 4, 2024 • 29

upvoted 3 collections 8 days ago

Scotch & SOTA 🥃 Pt. 7: Human Feedback Datasets 🫣

The elusive “human” feedback • 1 item • Updated Sep 13, 2023 • 1

Scotch & SOTA 🥃 Pt. 6: Dialogue Tuning Datasets 💬

Conversations, turn-based dialog, and things that can be turned into that. • 4 items • Updated Sep 13, 2023 • 1

Scotch & SOTA 🥃 Pt. 5: Instruction Tuning Datasets 👩‍🏫

Question & answer, task completion, general SFT and otherwise finetuney data. • 7 items • Updated Sep 13, 2023 • 1

upvoted an article 12 days ago

Article

How to deploy and fine-tune DeepSeek models on AWS

13 days ago • 40

upvoted an article 14 days ago

Article

Can we create pedagogically valuable multi-turn synthetic datasets from Cosmopedia?

May 7, 2024 • 8

upvoted a collection 15 days ago

Deepseek Papers

Deepseek papers collection • 15 items • Updated 8 days ago • 59

upvoted an article 26 days ago

Article

Train 400x faster Static Embedding Models with Sentence Transformers

28 days ago • 142

upvoted 4 collections 2 months ago

Reasoning

151 items • Updated Apr 6, 2024 • 29

🤖 Agents

21 items • Updated Dec 31, 2024 • 120

Tulu 3 Datasets

All datasets released with Tulu 3 -- state of the art open post-training recipes. • 33 items • Updated 1 day ago • 68

PixMo

A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 9 items • Updated 1 day ago • 59

upvoted a paper 3 months ago

Gemma 2: Improving Open Language Models at a Practical Size

Paper • 2408.00118 • Published Jul 31, 2024 • 76

upvoted a collection 3 months ago

VLM Datasets

29 items • Updated about 1 month ago • 1

upvoted 3 articles 3 months ago

Article

Low Code Large Language Model Alignment

Nov 19, 2024 • 13

Article

The Beginners Guide to Cleaning a Dataset

Nov 18, 2024 • 24

Article

PyTorchModelHubMixin: Bridging the Gap for Custom AI Models on Hugging Face

By

and 1 other •

Nov 11, 2024

• 16

upvoted a collection 3 months ago

Qwen2.5-Coder

Code-specific model series based on Qwen2.5 • 40 items • Updated Nov 28, 2024 • 279

upvoted an article 3 months ago

Article

Recipe: Preparing Multilingual Speech Datasets for TTS Training

By

and 1 other •

Nov 4, 2024

• 18

upvoted a collection 3 months ago

MobileLLM

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 9 items • Updated Nov 27, 2024 • 103