Models: 956

Active filters: reinforcement-learning, transformers

baek26/wiki_asp-written_work_4057_bart-base

Reinforcement Learning • Updated Apr 3, 2024

baek26/wiki_asp-software_7902_bart-base

Reinforcement Learning • Updated Apr 4, 2024

baek26/wiki_asp-written_work_667_bart-base

Reinforcement Learning • Updated Apr 4, 2024

baek26/wiki_asp-animal_3469_bart-base

Reinforcement Learning • Updated Apr 4, 2024

baek26/wiki_asp-soccer_player_9782_bart-base

Reinforcement Learning • Updated Apr 4, 2024

PranavBP525/phi-2-storygen-v1

Reinforcement Learning • Updated Apr 13, 2024

PranavBP525/phi-2-storygen-v2

Reinforcement Learning • Updated Apr 19, 2024

baek26/dialogsum_4088_bart-dialogsum

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/billsum_4768_bart-dialogsum

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/dialogsum_9789_bart-dialogsum

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/billsum_6121_bart-billsum

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/bart-dialogsum-oracle

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/billsum_1703_bart-billsum

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/bart-billsum-oracle

Reinforcement Learning • Updated Apr 17, 2024 • 2

baek26/cnn_dailymail_6849_bart-dialogsum

Reinforcement Learning • Updated Apr 18, 2024 • 2

baek26/cnn_dailymail_886_bart-dialogsum

Reinforcement Learning • Updated Apr 18, 2024 • 2

baek26/cnn_dailymail_7952_bart-dialogsum

Reinforcement Learning • Updated Apr 18, 2024 • 2

baek26/cnn_dailymail_4520_bart-cnndm

Reinforcement Learning • Updated Apr 19, 2024 • 2

baek26/cnn_dailymail_3418_bart-cnndm

Reinforcement Learning • Updated Apr 19, 2024 • 1

damienbenveniste/mistral-ppo

Reinforcement Learning • Updated Aug 23, 2024 • 261

pkbiswas/Phi-1_5-Detoxified-PPO-LoRa

Reinforcement Learning • Updated Apr 20, 2024

ruffy369/iris-breakout

Reinforcement Learning • Updated Aug 3, 2024 • 4

PranavBP525/phi-2-storygen-rlGPTf

Reinforcement Learning • Updated Apr 21, 2024

baek26/all_5483_all_8657_bart-base_rl

Reinforcement Learning • Updated Apr 21, 2024 • 2

baek26/all_9991_all_8657_bart-base_rl

Reinforcement Learning • Updated Apr 21, 2024 • 2

baek26/all_9006_all_8657_bart-base_rl

Reinforcement Learning • Updated Apr 21, 2024 • 2

baek26/all_6417_bart-base_rl

Reinforcement Learning • Updated Apr 22, 2024 • 2

lzacchini/ppo-LunarLander-v2

Reinforcement Learning • Updated May 10, 2024

PranavBP525/phi-2-storygen-rlhf

Reinforcement Learning • Updated Apr 24, 2024

baek26/all_5286_all_6417_bart-base_rl

Reinforcement Learning • Updated Apr 29, 2024 • 2