Partha117 committed
Commit 1517f5d · verified · 1 Parent(s): 16a8f65

Update README.md

Files changed (1)
  1. README.md +71 -8
README.md CHANGED
@@ -1,14 +1,77 @@
 
 
  ---
  license: mit
  datasets:
- - bug-localization/BeetleBox
  language:
- - en
  base_model:
- - codesage/codesage-base
  tags:
- - bug
- - localization
- - embedding
- - crosslanguage
- ---
+ Here's the finalized `README.md` in Hugging Face-compatible format, including the metadata block and your full specifications:
+
  ---
+
+ ```yaml
  license: mit
  datasets:
+ - bug-localization/BeetleBox
  language:
+ - en
  base_model:
+ - codesage/codesage-base
  tags:
+ - bug
+ - localization
+ - embedding
+ - crosslanguage
+ ```
+
+ # 🔥 BLAZE: Cross-Language and Cross-Project Bug Localization
+
+ **BLAZE** is a transformer-based bug localization model that works across languages and software projects. It enhances source-bug alignment using **dynamic chunking** and **hard example learning**, enabling precise bug localization in unseen codebases and programming languages.
+
+ [![Paper](https://img.shields.io/badge/Paper-TSE%202025-blue)](https://doi.org/10.1109/TSE.2025.3579574)
+ [![Dataset](https://img.shields.io/badge/Dataset-Zenodo-9cf)](https://zenodo.org/records/15122980)
+
+ ---
+
+ ## ✨ Highlights
+
+ * 📌 **Cross-project & cross-language** bug localization with no re-training
+ * 📏 **Dynamic Chunking** handles long files within LLM context windows
+ * 🧠 **Hard Example Learning** improves generalization and ranking accuracy
+ * 🌍 Supports Java, Python, C++, JavaScript, and Go
+ * 📊 Outperforms both cross-project and embedding-based baselines
+
+ ---
+
+ ## 📂 Dataset: BeetleBox
+
+ **BeetleBox** is the largest curated dataset for bug localization:
+
+ * 23,782 real-world bugs
+ * 29 repositories
+ * 5 programming languages
+ * Cleaned and de-duplicated to remove overlaps with training data
+
+ 📥 [Available on Zenodo](https://zenodo.org/records/15122980)
+ 📚 Also listed on Hugging Face Datasets: `bug-localization/BeetleBox`
+
+ ---
+
+ ## 🚀 Demo & Usage
+
+ All code, usage instructions, model files, and scripts are available via:
+
+ 👉 **[BLAZE Repository & Demo (Zenodo)](https://zenodo.org/records/15122980)**
+
+ ---
+
+ ## 📝 Citation
+
+ Please cite the following paper if you use BLAZE or BeetleBox in your work:
+
+ ```bibtex
+ @article{Chakraborty2025,
+ title = {BLAZE: Cross-Language and Cross-Project Bug Localization via Dynamic Chunking and Hard Example Learning},
+ ISSN = {2326-3881},
+ url = {http://dx.doi.org/10.1109/TSE.2025.3579574},
+ DOI = {10.1109/TSE.2025.3579574},
+ journal = {IEEE Transactions on Software Engineering},
+ publisher = {Institute of Electrical and Electronics Engineers (IEEE)},
+ author = {Chakraborty, Partha and Alfadel, Mahmoud and Nagappan, Meiyappan},
+ year = {2025},
+ pages = {1--14}
+ }
+ ```
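
The Highlights in the README above say that dynamic chunking keeps long files within the model's context window, but the README does not describe the procedure. As a rough illustration only, the sketch below splits a source file into line-aligned chunks under a token budget. It is not BLAZE's actual algorithm: the function name `chunk_by_budget`, the `max_tokens` parameter, and the whitespace-based token count are all assumptions made for this example.

```python
from typing import List


def chunk_by_budget(source: str, max_tokens: int) -> List[str]:
    """Split source text into line-aligned chunks of at most max_tokens tokens.

    Tokens are approximated by whitespace splitting (an assumption; a real
    setup would count tokens with the embedding model's own tokenizer).
    A single line longer than the budget is split into several pieces.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")

    chunks: List[str] = []
    current: List[str] = []  # lines in the chunk being built
    used = 0                 # approximate tokens already in `current`

    for line in source.splitlines():
        words = line.split()

        # Oversized line: flush the pending chunk, then split the line itself.
        if len(words) > max_tokens:
            if current:
                chunks.append("\n".join(current))
                current, used = [], 0
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
            continue

        # Start a new chunk when this line would exceed the budget.
        if current and used + len(words) > max_tokens:
            chunks.append("\n".join(current))
            current, used = [], 0

        current.append(line)
        used += len(words)

    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    sample = "a b c\nd e\nf g h i\nj"
    for chunk in chunk_by_budget(sample, max_tokens=5):
        print(repr(chunk))
```

Breaking at line boundaries keeps statements intact inside each chunk, which is one plausible reason to prefer it over fixed-width slicing. For the method BLAZE actually uses, refer to the paper and the Zenodo archive linked in the README.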