jorgemarcc committed on
Commit 666556c · verified · 1 Parent(s): 3dfa2e3

Delete README.md

Files changed (1)
  1. README.md +0 -56
README.md DELETED
@@ -1,56 +0,0 @@
- # Code Similarity Visualization with GraphCodeBERT
-
- This interactive application visualizes token-level embeddings generated by [GraphCodeBERT](https://huggingface.co/microsoft/graphcodebert-base) for classical sorting algorithms. It supports pairwise comparison of algorithms based on their representation in the model’s embedding space, using PCA for dimensionality reduction.
-
- ## ✒️ Reference
-
- Martinez-Gil, J. (2025).
- **Augmenting the Interpretability of GraphCodeBERT for Code Similarity Tasks**.
- *International Journal of Software Engineering and Knowledge Engineering*, 35(05), 657–678.
-
- ## 🚀 Features
-
- - Selection of two classical sorting algorithms.
- - Automatic tokenization and embedding via GraphCodeBERT.
- - PCA-based projection into 2D space for visualization.
- - Clean, static matplotlib plots showing token overlap and divergence.
-
- ## 🧠 Technical Overview
-
- - **Model**: [`microsoft/graphcodebert-base`](https://huggingface.co/microsoft/graphcodebert-base)
- - **Tokenizer**: RobertaTokenizer
- - **Embeddings**: Last hidden layer of GraphCodeBERT
- - **Reduction Technique**: Principal Component Analysis (PCA)
- - **Interface**: Gradio
- - **Languages**: Python 3.10+
-
- ## 🔬 Research Context
-
- This tool supports research on code similarity, clone detection, and representation learning for source code. It offers insight into how GraphCodeBERT encodes common algorithmic patterns, providing a visual companion to embedding-based analysis.
-
- ## 🛠 Dependencies
-
- All required libraries are listed in `requirements.txt`:
-
- ```
-
- transformers
- torch
- scikit-learn
- numpy
- matplotlib
- gradio
- Pillow
-
- ```
-
- ## 🖥️ Intended Use
-
- - Academic teaching and demonstration of code embeddings
- - Qualitative evaluation of pretrained models for source code
- - Supplementary visualization for software engineering publications
-
- ## 📬 Contact
-
- **Jorge Martinez-Gil**
- Senior Research Scientist in Computer Science
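
Note on the removed content: the deleted README describes a pipeline (tokenize with RobertaTokenizer, take the last hidden layer of `microsoft/graphcodebert-base`, project to 2D with PCA) without showing any code. The following is a minimal sketch of how such a pipeline might look, not the application's actual implementation. The function names `embed_tokens`, `pca_2d`, and `project_pair` are hypothetical, fitting one PCA jointly on both snippets is an assumption, and PCA is computed here through NumPy's SVD rather than scikit-learn to keep the sketch short.

```python
import numpy as np


def embed_tokens(code, model_name="microsoft/graphcodebert-base"):
    """Return (tokens, last-hidden-layer embeddings) for one code snippet.

    Hypothetical helper: downloads the model on first use, so torch and
    transformers are imported lazily.
    """
    import torch
    from transformers import RobertaModel, RobertaTokenizer

    tokenizer = RobertaTokenizer.from_pretrained(model_name)
    model = RobertaModel.from_pretrained(model_name)
    model.eval()

    inputs = tokenizer(code, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        outputs = model(**inputs)

    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
    # Shape: (n_tokens, hidden_size), one row per token.
    return tokens, outputs.last_hidden_state[0].numpy()


def pca_2d(embeddings):
    """Project rows onto their top two principal components via SVD."""
    x = np.asarray(embeddings, dtype=float)
    centered = x - x.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


def project_pair(emb_a, emb_b):
    """Fit one PCA on both snippets (an assumption) and split the result."""
    joint = np.vstack([emb_a, emb_b])
    points = pca_2d(joint)
    return points[: len(emb_a)], points[len(emb_a):]
```

Under these assumptions, calling `embed_tokens` on the source of two sorting algorithms and passing both embedding matrices to `project_pair` would yield two sets of 2D points in a shared coordinate system, which could then be drawn as a matplotlib scatter plot to compare token overlap and divergence.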