Jason Corkill

jasoncorkill

AI & ML interests

Human data annotation

Organizations

Blog-explorers, Hugging Face Discord Community, mlo-data-collab, Rapidata

jasoncorkill's activity

reacted to their post with 👀 1 day ago
posted an update 1 day ago
Benchmarking Google's Veo2: How Does It Compare?

The results did not meet expectations. Veo2 struggled with style consistency and temporal coherence, falling behind competitors like Runway, Pika, Tencent, and even Alibaba. While the model shows promise, its alignment and quality are not yet there.

Google recently launched Veo2, its latest text-to-video model, through select partners like fal.ai. As part of our ongoing evaluation of state-of-the-art generative video models, we rigorously benchmarked Veo2 against industry leaders.

We generated a large set of Veo2 videos, spending hundreds of dollars in the process, and systematically evaluated them using our Python-based API for human and automated labeling.

Check out the ranking here: https://www.rapidata.ai/leaderboard/video-models

Rapidata/text-2-video-human-preferences-veo2
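
For anyone who wants to poke at the raw annotations, here is a minimal sketch using the Hugging Face `datasets` library (not our Python API itself); the split name is an assumption, so check the dataset card for the actual schema.

```python
# Minimal sketch: load the released Veo2 preference annotations with the
# `datasets` library. The "train" split is an assumption -- see the dataset
# card for the actual splits and column names.
from datasets import load_dataset

ds = load_dataset("Rapidata/text-2-video-human-preferences-veo2", split="train")

print(ds)      # number of rows and column names
print(ds[0])   # one annotation record
```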
posted an update 14 days ago
Has OpenGVLab Lumina Outperformed OpenAI’s Model?

We’ve just released the results from a large-scale human evaluation (400k annotations) of OpenGVLab’s newest text-to-image model, Lumina. Surprisingly, Lumina outperforms OpenAI’s DALL-E 3 in terms of alignment, although it ranks #6 in our overall human preference benchmark.

To support further development in text-to-image models, we’re making our entire human-annotated dataset publicly available. If you’re working on model improvements and need high-quality data, feel free to explore.

We welcome your feedback and look forward to any insights you might share!

Rapidata/OpenGVLab_Lumina_t2i_human_preference
reacted to their post with 🔥 17 days ago
posted an update 17 days ago
The Sora Video Generation Aligned Words dataset contains a collection of word segments for text-to-video or other multimodal research. It is intended to help researchers and engineers explore fine-grained prompts, including those where certain words are not aligned with the video.

We hope this dataset will support your work in prompt understanding and advance progress in multimodal projects.

If you have specific questions, feel free to reach out.
Rapidata/sora-video-generation-aligned-words
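
If you want a starting point, here is a minimal sketch for pulling out the word segments flagged as not aligned with the video; the "prompt", "words", and "aligned" columns are assumptions, so check the dataset card for the real schema.

```python
# Minimal sketch: list word segments flagged as misaligned with the video.
# The "prompt", "words", and "aligned" columns are hypothetical placeholders
# for whatever the released dataset actually uses.
from datasets import load_dataset

ds = load_dataset("Rapidata/sora-video-generation-aligned-words", split="train")

for row in ds.select(range(5)):
    misaligned = [word for word, ok in zip(row["words"], row["aligned"]) if not ok]
    print(row["prompt"], "->", misaligned)
```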
reacted to their post with 👀 22 days ago
posted an update 22 days ago
Integrating human feedback is vital for evolving AI models. Boost quality, scalability, and cost-effectiveness with our crowdsourcing tool!

Or run A/B tests and gather thousands of responses in minutes: upload two images, ask a question, and watch the insights roll in!

Check it out here and let us know your feedback: https://app.rapidata.ai/compare
reacted to their post with 👀 24 days ago
posted an update 24 days ago
This dataset was collected in roughly 4 hours using the Rapidata Python API, showcasing how quickly large-scale annotations can be performed with the right tooling!

All that at less than the cost of a single hour of a typical ML engineer in Zurich!

The new dataset contains ~22,000 human annotations evaluating AI-generated videos along several dimensions: Prompt-Video Alignment, Word-for-Word Prompt Alignment, Style, Speed of Time Flow, and Quality of Physics.

Rapidata/text-2-video-Rich-Human-Feedback
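
As a starting point for analysis, here is a minimal sketch that averages the annotations per model and dimension; the "model", "dimension", and "score" fields are assumptions, so check the dataset card for the actual column names.

```python
# Minimal sketch: average the ~22k annotations per (model, dimension).
# The "model", "dimension", and "score" fields are assumptions -- the real
# column names are documented on the dataset card.
from collections import defaultdict
from datasets import load_dataset

ds = load_dataset("Rapidata/text-2-video-Rich-Human-Feedback", split="train")

totals, counts = defaultdict(float), defaultdict(int)
for row in ds:
    key = (row["model"], row["dimension"])
    totals[key] += row["score"]
    counts[key] += 1

for key in sorted(totals):
    print(key, round(totals[key] / counts[key], 3))
```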
posted an update 30 days ago
Runway Gen-3 Alpha: The Style and Coherence Champion

Runway's latest video generation model, Gen-3 Alpha, is something special. It ranks #3 overall on our text-to-video human preference benchmark, but in terms of style and coherence, it outperforms even OpenAI Sora.

However, it struggles with alignment, making it less predictable for controlled outputs.

We've released a new dataset with human evaluations of Runway Gen-3 Alpha: Rapidata's text-2-video human preferences dataset. If you're working on video generation and want to see how your model compares to the biggest players, we can benchmark it for you.

🚀 DM us if you’re interested!

Dataset: Rapidata/text-2-video-human-preferences-runway-alpha
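
If you want to reproduce a rough ranking from the pairwise preferences yourself, here is a minimal sketch that computes per-model win rates; the "model_a", "model_b", and "winner" fields are hypothetical stand-ins for the dataset's actual columns.

```python
# Minimal sketch: turn pairwise human preferences into per-model win rates.
# "model_a", "model_b", and "winner" are hypothetical stand-ins for the
# dataset's actual column names -- check the dataset card before running.
from collections import Counter
from datasets import load_dataset

ds = load_dataset("Rapidata/text-2-video-human-preferences-runway-alpha", split="train")

wins, games = Counter(), Counter()
for row in ds:
    games[row["model_a"]] += 1
    games[row["model_b"]] += 1
    wins[row["winner"]] += 1

for model in sorted(games, key=lambda m: wins[m] / games[m], reverse=True):
    print(f"{model}: {wins[model] / games[model]:.1%} win rate")
```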
posted an update about 1 month ago
We benchmarked @xai-org's Aurora model; as far as we know, this is the first public evaluation of the model at scale.

We collected 401k human annotations over the past ~2 days for this. We have uploaded all of the annotation data here on Hugging Face with a fully permissive license:
Rapidata/xAI_Aurora_t2i_human_preferences
reacted to ariG23498's post with 🚀 about 2 months ago
posted an update 2 months ago
We uploaded a huge human-annotated preference dataset for image generation. Instead of just having people choose which model they prefer, we annotated an alignment score on a word-by-word basis for the prompt and had annotators rate the images on coherence, overall alignment, and style preference. Images that scored badly were also given back to annotators to highlight problem areas. Check it out! Rapidata/text-2-image-Rich-Human-Feedback

We also wrote a blog post for those who want a bit more detail:
https://huggingface.co/blog/RapidataAI/beyond-image-preferences
posted an update 2 months ago