loubnabnl HF staff committed on
Commit
d9fdb7e
·
1 Parent(s): 5d8739b

Update README.md

Files changed (1)
  1. README.md +17 -4
README.md CHANGED
@@ -26,7 +26,11 @@ pinned: false
26
 
27
  <li>
28
  <p>
29
- <b>Spaces:</b> code generation with: <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder (6B)</a> and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen (6B)</a>
 
 
 
 
30
  </p>
31
  </li>
32
  <br>
@@ -34,6 +38,11 @@ pinned: false
34
  <li><b>Models:</b> CodeParrot (1.5B) and CodeParrot-small (110M); each repo has different ongoing experiments in its branches.</li>
35
  <br>
36
 
 
 
 
 
 
37
  <li><b>Datasets:</b><ul>
38
  <li>1- <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
39
 
@@ -42,9 +51,13 @@ pinned: false
42
  </li>
43
  <li>4- CodeParrot dataset after both near deduplication and the additional filtering, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-v2-near-dedup" class="underline">codeparrot-train-v2-near-dedup</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-v2-near-dedup" class="underline">codeparrot-valid-v2-near-dedup</a>.</li>
44
  <li>5- <a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of 32 programming languages from GitHub files.</li>
45
- <li>6- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter" class="underline">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter Notebooks from BigQuery GitHub.</li>
46
- <li>7- <a href="https://huggingface.co/datasets/codeparrot/apps" class="underline">APPS</a>, a benchmark for code generation with 10000 problems.</li>
47
- <li>8- <a href="https://huggingface.co/datasets/codeparrot/codecomplex" class="underline">CodeComplex</a>, an annotated dataset of 4,200 Java codes and their time complexity.</li>
 
 
 
 
48
  </ul>
49
  </li>
50
  </ul>
 
26
 
27
  <li>
28
  <p>
29
+ <b>Spaces:</b>
30
+
31
+ <li>Code generation with: <a href="https://huggingface.co/codeparrot/codeparrot" class="underline">CodeParrot (1.5B)</a>, <a href="https://huggingface.co/facebook/incoder-6B" class="underline">InCoder (6B)</a> and <a href="https://github.com/salesforce/CodeGen" class="underline">CodeGen (6B)</a></li>
32
+ <li>Spaces for downstream code tasks: algorithmic complexity prediction (BigO), code explanation, and code generation from English text.</li>
33
+
34
  </p>
35
  </li>
36
  <br>
 
38
  <li><b>Models:</b> CodeParrot (1.5B) and CodeParrot-small (110M); each repo has different ongoing experiments in its branches.</li>
39
  <br>
40
 
41
+ <br>
42
+
43
+ <li><b>Metrics:</b> <a href="https://huggingface.co/spaces/codeparrot/apps_metric" class="underline">APPS metric</a> for evaluating code models on the APPS benchmark.</li>
44
+ <br>
45
+
46
  <li><b>Datasets:</b><ul>
47
  <li>1- <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean" class="underline">codeparrot-clean</a>, the dataset on which we trained and evaluated CodeParrot; the splits are available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-train" class="underline">codeparrot-clean-train</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-clean-valid" class="underline">codeparrot-clean-valid</a>.</li>
48
 
 
51
  </li>
52
  <li>4- CodeParrot dataset after both near deduplication and the additional filtering, available under <a href="https://huggingface.co/datasets/codeparrot/codeparrot-train-v2-near-dedup" class="underline">codeparrot-train-v2-near-dedup</a> and <a href="https://huggingface.co/datasets/codeparrot/codeparrot-valid-v2-near-dedup" class="underline">codeparrot-valid-v2-near-dedup</a>.</li>
53
  <li>5- <a href="https://huggingface.co/datasets/codeparrot/github-code" class="underline">GitHub-Code</a>, a 1TB dataset of 32 programming languages from GitHub files.</li>
54
+ <li>6- <a href="https://huggingface.co/datasets/codeparrot/github-code-clean" class="underline">GitHub-Code-Clean</a>, a cleaner version of the GitHub-Code dataset.</li>
55
+ <li>7- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter" class="underline">GitHub-Jupyter</a>, a 16.3GB dataset of Jupyter Notebooks from BigQuery GitHub.</li>
56
+ <li>8- <a href="https://huggingface.co/datasets/codeparrot/apps" class="underline">APPS</a>, a benchmark for code generation with 10000 problems.</li>
57
+ <li>9- <a href="https://huggingface.co/datasets/codeparrot/codecomplex" class="underline">CodeComplex</a>, an annotated dataset of 4,200 Java codes and their time complexity.</li>
58
+ <li>10- <a href="https://huggingface.co/datasets/codeparrot/xlcost-text-to-code" class="underline">XLCOST-text-to-code</a>, a subset of the XLCoST benchmark for text-to-code generation at the snippet level and program level in 7 programming languages: Python, C, C#, C++, Java, JavaScript and PHP.</li>
59
+ <li>11- <a href="https://huggingface.co/datasets/codeparrot/github-jupyter-text-code-pairs" class="underline">github-jupyter-text-code-pairs</a>, a dataset of text and code pairs extracted from Jupyter notebooks.</li>
60
+
61
  </ul>
62
  </li>
63
  </ul>