
Stoney Kang

sikang99

AI & ML interests

Remote Control based on Vision

Recent Activity

liked a model about 1 month ago
microsoft/phi-4
reacted to merve's post with 👍 3 months ago
Last week we were blessed with open-source models! A recap 💝

Organizations

TeamGRIT, Co. Ltd.

sikang99's activity

upvoted an article 17 days ago
We now support VLMs in smolagents!

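A minimal sketch of trying that new VLM support, assuming smolagents exposes it through an images argument to CodeAgent.run; the model choice and screenshot path here are hypothetical:

```python
from PIL import Image
from smolagents import CodeAgent, HfApiModel

# An agent backed by a vision-capable model served via the HF Inference API
model = HfApiModel("Qwen/Qwen2-VL-72B-Instruct")  # illustrative model choice
agent = CodeAgent(tools=[], model=model)

# Pass PIL images alongside the task so the agent can reason over them
image = Image.open("screenshot.png")  # hypothetical local screenshot
result = agent.run("Describe what is shown in this image.", images=[image])
print(result)
```
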
reacted to merve's post with 👍 3 months ago
Last week we were blessed with open-source models! A recap 💝
merve/nov-29-releases-674ccc255a57baf97b1e2d31

πŸ–ΌοΈ Multimodal
> At Hugging Face we released SmolVLM, a performant and efficient smol vision language model πŸ’—
> Show Lab released ShowUI-2B: new vision-language-action model to build GUI/web automation agents πŸ€–
> Rhymes AI has released the base model of Aria: Aria-Base-64K and Aria-Base-8K with their respective context length
> ViDoRe team released ColSmolVLM: A new ColPali-like retrieval model based on SmolVLM
> Dataset: Llava-CoT-o1-Instruct: new dataset labelled using Llava-CoT multimodal reasoning modelπŸ“–
> Dataset: LLaVA-CoT-100k dataset used to train Llava-CoT released by creators of Llava-CoT πŸ“•
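
A minimal sketch of running SmolVLM locally with transformers, following the pattern from its model card; the image path and prompt are illustrative assumptions:

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceTB/SmolVLM-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# Chat-style message: one image placeholder followed by the text prompt
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)

image = Image.open("example.jpg")  # hypothetical local image
inputs = processor(text=prompt, images=[image], return_tensors="pt").to("cuda")

generated = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated, skip_special_tokens=True)[0])
```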

💬 LLMs
> The Qwen team released QwQ-32B-Preview, a state-of-the-art open-source reasoning model that broke the internet 🔥 (a quick-start sketch follows below)
> Alibaba has released Marco-o1, a new open-source reasoning model 💥
> NVIDIA released Hymba 1.5B Base and Instruct, new state-of-the-art SLMs with a hybrid architecture (Mamba + transformer)
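
A quick-start sketch for QwQ-32B-Preview using the standard transformers chat-template flow; the example question is an illustrative assumption:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/QwQ-32B-Preview"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Build a chat prompt and generate a (potentially long) reasoning trace
messages = [{"role": "user", "content": "How many r's are in 'strawberry'?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt
print(tokenizer.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```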

⏯️ Image/Video Generation
> Qwen2VL-Flux: a new image generation model that combines the Qwen2VL image encoder and T5 with Flux for generation
> Lightricks released LTX-Video, a new DiT-based video generation model that can generate 24 FPS videos at 768x512 resolution ⏯️ (see the sketch after this list)
> Dataset: Image Preferences is a new image generation preference dataset made through Argilla's DIBT community effort 🏷️
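
A minimal text-to-video sketch for LTX-Video, assuming a recent diffusers release with the LTXPipeline integration and enough GPU memory; the prompt is illustrative:

```python
import torch
from diffusers import LTXPipeline
from diffusers.utils import export_to_video

pipe = LTXPipeline.from_pretrained("Lightricks/LTX-Video", torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = "A sailboat gliding across a calm lake at sunset"  # illustrative prompt
video = pipe(
    prompt=prompt,
    width=768,              # matches the 768x512 resolution from the post
    height=512,
    num_frames=97,          # roughly 4 seconds at 24 FPS
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```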

Audio
> OuteAI released OuteTTS-0.2-500M, a new multilingual text-to-speech model based on Qwen-2.5-0.5B and trained on 5B audio prompt tokens
reacted to merve's post with ❤️ 4 months ago
This is not a drill 💥
HuggingChat is now multimodal with meta-llama/Llama-3.2-11B-Vision-Instruct! 🤗
This also comes with multimodal assistants; I have migrated my Marcus Aurelius advice assistant to Llama-Vision, and Marcus can see now! 😄

Chat with Marcus: https://hf.co/chat/assistant/65bfed22022ba290531112f8
Start chatting with Llama 3.2 11B Vision Instruct https://huggingface.co/chat/models/meta-llama/Llama-3.2-11B-Vision-Instruct
upvoted an article 6 months ago
Train Custom Models on Hugging Face Spaces with AutoTrain SpaceRunner

By abhishek
reacted to merve's post with 👍 6 months ago
🥹 @lbourdois has made an app to browse all of my vision paper summaries, for everyone's convenience: merve/vision_papers
reacted to Avelina's post with 😔 9 months ago
Found out my ECCV paper is getting rejected because of a LaTeX compile error :(