Space: orionweller/human-mlm-clm-predictor

orionweller committed 26 days ago

Commit d1414a2 (verified) · 1 parent: ee1d16c

Update README.md

Files changed (1)

README.md CHANGED

@@ -11,4 +11,44 @@ license: mit
 short_description: See if you can predict the masked tokens / next token!
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+## MLM and NTP Testing App
+This Hugging Face Gradio Space tests users on two fundamental NLP tasks:
+- Masked Language Modeling (MLM) - Guess the masked words in a text
+- Next Token Prediction (NTP) - Predict how a text continues
+#### Features
+- Switch between MLM and NTP tasks with a simple radio button
+- Adjust the masking/cutting ratio to control difficulty
+- Sample texts from the cc_news dataset (100 samples)
+- Track and display user accuracy for both tasks
+- Get detailed feedback on answers
+#### How to Use
+##### For MLM Task
+1. Select "mlm" in the Task Type radio button
+2. Adjust the mask ratio as desired (higher = more difficult)
+3. Click "New Sample" to get a text with [MASK] tokens
+4. Enter your guesses for the masked words, separated by spaces or commas
+5. Click "Check Answer" to see your accuracy
+##### For NTP Task
+1. Select "ntp" in the Task Type radio button
+2. Adjust the cut ratio as desired (higher = more text is hidden)
+3. Click "New Sample" to get a partial text
+4. Type your prediction of how the text continues
+5. Click "Check Answer" to see your accuracy and the actual continuation
+#### Statistics
+- The app keeps track of your accuracy for both tasks
+- Click "Reset Stats" to start fresh
+#### Technical Details
+- Uses the cc_news dataset from the Hugging Face Hub (vblagoje/cc_news)
+- Uses streaming to sample 100 documents without downloading the full dataset
+- Uses the BERT tokenizer for consistent tokenization
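
The README describes the masking, cutting, and scoring steps only in prose. As a reading aid, here is a minimal Python sketch of how those steps could work. It is not the Space's actual code: the function names (`mask_tokens`, `cut_text`, `parse_guesses`, `accuracy`) are hypothetical, and it splits on whitespace rather than using the BERT tokenizer so that it runs with the standard library alone.

```python
import random
import re


def mask_tokens(text, mask_ratio, rng):
    """Replace roughly mask_ratio of the whitespace tokens with [MASK].

    Assumption: whitespace tokens stand in for the BERT tokens the app uses.
    Returns the masked text and the hidden tokens in reading order.
    """
    tokens = text.split()
    if not tokens:
        return "", []
    # Mask at least one token and never more than the text contains.
    n_mask = min(len(tokens), max(1, round(len(tokens) * mask_ratio)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    answers = [tokens[i] for i in positions]
    for i in positions:
        tokens[i] = "[MASK]"
    return " ".join(tokens), answers


def cut_text(text, cut_ratio):
    """Hide the last cut_ratio of the tokens (higher = more text is hidden).

    Returns the visible prefix and the hidden continuation as a token list.
    """
    tokens = text.split()
    if len(tokens) < 2:
        return text.strip(), []
    # Keep at least one visible token and hide at least one.
    n_hidden = min(len(tokens) - 1, max(1, round(len(tokens) * cut_ratio)))
    split = len(tokens) - n_hidden
    return " ".join(tokens[:split]), tokens[split:]


def parse_guesses(raw):
    """Split user input on spaces or commas, as the README describes."""
    return [g for g in re.split(r"[,\s]+", raw.strip()) if g]


def accuracy(guesses, answers):
    """Fraction of answer positions matched by the guess at the same position.

    Assumption: matching is case-insensitive; missing guesses count as wrong.
    """
    if not answers:
        return 0.0
    correct = sum(g.lower() == a.lower() for g, a in zip(guesses, answers))
    return correct / len(answers)
```

For example, `mask_tokens("the quick brown fox jumps over the lazy dog", 0.3, random.Random(0))` masks three of the nine words, and `accuracy(parse_guesses("fox, dog"), answers)` then scores a comma-separated guess against the hidden words. Passing an explicit `random.Random` instance keeps a sample reproducible, which is a design choice of this sketch rather than something the README states.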