usvsnsp/code-vs-nl

Text Classification

Generated from Trainer


usvsnsp committed on Jan 27, 2023

Commit 5504192 · Parent: a60b413

Update README.md

Files changed (1)

README.md +12 -5

@@ -4,9 +4,15 @@ tags:
 - generated_from_trainer
 metrics:
 - accuracy
+- f1
 model-index:
 - name: code-vs-nl
   results: []
+datasets:
+- bookcorpus
+- codeparrot/github-code
+language:
+- en
 ---
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
@@ -14,7 +20,8 @@ should probably proofread and complete it, then remove this comment. -->
 # code-vs-nl
-This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased) on an unknown dataset.
+This model is a fine-tuned version of [distilbert-base-uncased](https://huggingface.co/distilbert-base-uncased)
+on [bookcorpus](https://huggingface.co/datasets/bookcorpus) for text and [codeparrot/github-code](https://huggingface.co/datasets/codeparrot/github-code) for code datasets.
 It achieves the following results on the evaluation set:
 - Loss: 0.5180
 - Accuracy: 0.9951
@@ -22,15 +29,15 @@ It achieves the following results on the evaluation set:
 ## Model description
-More information needed
+As it's a finetuned model, it's architecture is same as distilbert-base-uncased for Sequence Classification
 ## Intended uses & limitations
-More information needed
+Can be used to classify documents into text and code
 ## Training and evaluation data
-More information needed
+It is a mix of above two datasets, equally random sampled
 ## Training procedure
@@ -58,4 +65,4 @@ The following hyperparameters were used during training:
 - Transformers 4.25.1
 - Pytorch 1.13.1+cu116
 - Datasets 2.8.0
-- Tokenizers 0.13.2
+- Tokenizers 0.13.2
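The updated card says only that the training and evaluation data is "a mix of above two datasets, equally random sampled". Below is a minimal sketch of what such a balanced mix could look like. It assumes that each source (bookcorpus for text, codeparrot/github-code for code) has already been reduced to a plain list of strings. The function name `balanced_mix`, the label convention (0 = natural-language text, 1 = code), the seed, and the per-source sample size are all hypothetical. None of them come from the card or from the original training script.

```python
import random
from typing import List, Tuple


def balanced_mix(
    text_docs: List[str],
    code_docs: List[str],
    n_per_class: int,
    seed: int = 0,
) -> List[Tuple[str, int]]:
    """Draw the same number of random documents from each source and label them.

    Hypothetical label convention (not from the model card):
    0 = natural-language text, 1 = code.
    """
    if n_per_class > min(len(text_docs), len(code_docs)):
        raise ValueError("n_per_class exceeds the size of a source")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    # Sample without replacement from each source independently.
    text_part = [(doc, 0) for doc in rng.sample(text_docs, n_per_class)]
    code_part = [(doc, 1) for doc in rng.sample(code_docs, n_per_class)]
    mixed = text_part + code_part
    rng.shuffle(mixed)  # interleave the two classes
    return mixed


if __name__ == "__main__":
    text = ["the cat sat on the mat", "it was a dark and stormy night", "she closed the book"]
    code = ["def f(x): return x + 1", "for i in range(3): print(i)", "int main() { return 0; }"]
    for doc, label in balanced_mix(text, code, n_per_class=2):
        print(label, doc)
```

Drawing the same number of documents from each source keeps the two classes balanced. On a split built this way, a majority-class baseline scores only about 50% accuracy, which is the baseline the reported accuracy should be compared against.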