nomadicsynth committed
Commit ac1681f
1 Parent(s): f272480

Add detailed background and motivation sections to README

Files changed (1): README.md (+19 -1)
README.md CHANGED
@@ -38,6 +38,18 @@ This demo uses the first prototype of the model, trained on a dataset of **10,00
 
 ---
 
+## Background and Motivation
+
+Scientific progress often depends on connecting ideas across papers, fields, and years of literature. But with the volume of research growing exponentially, it's increasingly difficult for any one person — or even a team — to stay on top of it all. As a result, valuable connections between papers often go unnoticed simply because the right expert never read both.
+
+In 2024, Luo et al. published a landmark study in *Nature Human Behaviour* showing that **large language models (LLMs) can outperform human experts** in predicting the results of neuroscience experiments by integrating knowledge across the scientific literature. Their model, **BrainGPT**, demonstrated that tuning a general-purpose LLM (such as Mistral-7B) on domain-specific data yields a model that can synthesize insights surpassing human forecasting ability. Notably, the authors found that models as small as 7B parameters performed well — an insight that shaped the direction of this project.
+
+Inspired by this work — and by a YouTube breakdown from physicist and science communicator Sabine Hossenfelder — this project began as an attempt to explore similar methods of knowledge integration at the level of paper-pair relationships. The goal: to train a model that could recognize and reason about **conceptual, methodological, or application-level connections** between research papers, even when those links might be overlooked due to fragmentation in the literature.
+
+Originally conceived as a perplexity-ranking experiment using LLMs directly (mirroring Luo et al.'s evaluation method), the project gradually evolved into what it is now — **Inkling**, a reasoning-aware embedding model fine-tuned on LLM-rated abstract pairings, built to help researchers uncover links that would be obvious — *if only someone had the time to read everything*.
+
+---
+
 ## Why Inkling?
 
 > Because the right connection is often obvious — once someone points it out.
@@ -46,6 +58,12 @@ Researchers today are overwhelmed by volume. Inkling helps restore those missed-
 
 ---
 
+## Citation
+
+> Luo, X., Rechardt, A., Sun, G. et al. Large language models surpass human experts in predicting neuroscience results. *Nat Hum Behav* **9**, 305–315 (2025). [https://doi.org/10.1038/s41562-024-02046-9](https://doi.org/10.1038/s41562-024-02046-9)
+
+---
+
 ## Status
 
-Inkling is in **alpha** and under active development. The current model is hosted via Gradio, with a Hugging Face Space available for live interaction and feedback. Contributions, feedback, and collaboration are welcome.
+Inkling is in **alpha** and under active development. The current model is hosted via Gradio, with a Hugging Face Space available for live interaction and feedback. Contributions, feedback, and collaboration are welcome.
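
For context on the perplexity-ranking experiment the new Background section mentions: in Luo et al.'s evaluation, a language model scores two versions of an abstract (one with the real result, one altered), and the lower-perplexity version counts as the model's prediction. Below is a minimal sketch of that idea using the Hugging Face `transformers` API; the model name and abstracts are placeholders chosen for illustration, not anything from the Inkling codebase (Luo et al. tuned a Mistral-7B base, for which `gpt2` stands in here).

```python
# Minimal sketch of perplexity ranking (Luo et al.'s evaluation method).
# Illustrative only: "gpt2" stands in for the Mistral-7B base Luo et al.
# used, and the abstracts are invented placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the causal LM (lower = less surprising)."""
    enc = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy as `loss`; exp(loss) is perplexity.
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

# Two versions of an abstract: one with the real result, one altered.
real = "Stimulating the region improved recall in all participants."
altered = "Stimulating the region impaired recall in all participants."

# The model "predicts" the outcome it finds less surprising.
choice = "real" if perplexity(real) < perplexity(altered) else "altered"
print(choice)
```

As the Background section notes, Inkling itself moved on from this ranking setup to an embedding model, so the analogous operation in the current project is a similarity comparison between abstract embeddings rather than a perplexity comparison.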