---
title: Inkling
emoji: 🌐
colorFrom: indigo
colorTo: yellow
python_version: "3.10"
sdk: gradio
sdk_version: 5.25.2
app_file: app.py
pinned: true
license: agpl-3.0
short_description: Use AI to find obvious research links in unexpected places.
datasets:
- nomadicsynth/arxiv-dataset-abstract-embeddings
models:
- nomadicsynth/research-compass-arxiv-abstracts-embedding-model
---

# Inkling: AI-assisted research discovery

![Inkling](https://huggingface.co/spaces/nomadicsynth/inkling/resolve/main/inkling-logo.png)

[**Inkling**](https://nomadicsynth-research-compass.hf.space) is an AI-assisted tool that helps you discover meaningful connections between research papers: the kind of links a domain expert might spot, if they had time to read everything.

Rather than relying on superficial similarity or shared keywords, Inkling is trained to recognize **reasoning-based relationships** between papers. It evaluates conceptual, methodological, and application-level connections, even across disciplines, and surfaces links that may be overlooked due to the sheer scale of the research landscape.

This demo uses the first prototype of the model, trained on a dataset of **10,000+ rated abstract pairs**, built from a larger pool of arXiv triplets. The system will continue to improve with feedback and will be released alongside the dataset for public research.
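
For intuition only, here is a minimal sketch of how an embedding model can be fine-tuned on scored abstract pairs using the `sentence-transformers` library. The base model, loss, and hyperparameters below are illustrative assumptions, not Inkling's actual training recipe.

```python
# Hedged sketch: fine-tuning an embedding model on rated abstract pairs.
# The base model, loss, and hyperparameters are placeholders, not Inkling's
# published training setup.
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Hypothetical rated pairs: (abstract_a, abstract_b, relevance score in [0, 1]).
rated_pairs = [
    ("Abstract of paper A ...", "Abstract of paper B ...", 0.9),
    ("Abstract of paper C ...", "Abstract of paper D ...", 0.1),
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in base encoder
examples = [InputExample(texts=[a, b], label=s) for a, b, s in rated_pairs]
loader = DataLoader(examples, shuffle=True, batch_size=2)

# CosineSimilarityLoss nudges the cosine similarity of each pair's
# embeddings toward its rated relevance score.
loss = losses.CosineSimilarityLoss(model)
model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=10)
```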

---

## What it does

- Accepts a research abstract, idea, or question
- Searches for papers with **deep, contextual relevance** (sketched below)
- Highlights key conceptual links and application overlaps
- Offers reasoning-based analysis between selected papers
- Gathers user feedback to improve the model over time
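
As a concrete illustration of the search step above, here is a hedged sketch: embed a query with this Space's embedding model and rank precomputed abstract embeddings by cosine similarity. The model and dataset IDs come from the Space metadata, but the split and column names (`train`, `abstract`, `embedding`) and the assumption that the model loads with `sentence-transformers` are mine, not confirmed details of the app.

```python
# Hedged sketch of the retrieval step, not the app's actual code.
import numpy as np
from datasets import load_dataset
from sentence_transformers import SentenceTransformer

# Model and dataset IDs are taken from this Space's metadata.
model = SentenceTransformer(
    "nomadicsynth/research-compass-arxiv-abstracts-embedding-model"
)
ds = load_dataset("nomadicsynth/arxiv-dataset-abstract-embeddings", split="train")

query = "Contrastive learning for cross-domain retrieval of scientific abstracts"
q = model.encode(query, normalize_embeddings=True)

# Cosine similarity against the stored vectors (column names assumed).
emb = np.asarray(ds["embedding"], dtype=np.float32)
emb /= np.linalg.norm(emb, axis=1, keepdims=True)
scores = emb @ q

# Print the five most relevant abstracts.
for i in np.argsort(-scores)[:5]:
    print(f"{scores[i]:.3f}  {ds[int(i)]['abstract'][:80]}...")
```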

---

## Background and Motivation

Scientific progress often depends on connecting ideas across papers, fields, and years of literature. But with the volume of research growing exponentially, it's increasingly difficult for any one person, or even a team, to stay on top of it all. As a result, valuable connections between papers often go unnoticed simply because the right expert never read both.

In late 2024, Luo et al. published a landmark study in *Nature Human Behaviour* showing that **large language models (LLMs) can outperform human experts** in predicting the results of neuroscience experiments by integrating knowledge across the scientific literature. Their model, **BrainGPT**, demonstrated how tuning a general-purpose LLM (like Mistral-7B) on domain-specific data could synthesize insights that surpass human forecasting ability. Notably, the authors found that models as small as 7B parameters performed well, an insight that informed the foundation of this project.

Inspired by this work, and by a YouTube breakdown from physicist and science communicator **Sabine Hossenfelder** titled *["AIs Predict Research Results Without Doing Research"](https://www.youtube.com/watch?v=Qgrl3JSWWDE)*, this project began as an attempt to explore similar methods of knowledge integration at the level of paper-pair relationships. Her clear explanation and commentary sparked the idea to apply this paradigm not just to forecasting outcomes, but to identifying latent connections between published studies.

Originally conceived as a perplexity-ranking experiment using LLMs directly (mirroring Luo et al.'s evaluation method), the project gradually evolved into what it is now: **Inkling**, a reasoning-aware embedding model fine-tuned on LLM-rated abstract pairings, built to help researchers uncover links that would be obvious, *if only someone had the time to read everything*.
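
For context, that original perplexity-ranking idea can be sketched as follows: condition a language model on one abstract, measure how "unsurprised" it is by another, and rank candidates by that score. The stand-in model (`gpt2`) and the bare concatenation prompt are illustrative assumptions, not the project's recorded method.

```python
# Hedged sketch of perplexity ranking; model and prompt format are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")  # small stand-in LM
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(context: str, target: str) -> float:
    """Perplexity of `target` given `context`; lower means more expected."""
    ctx = tok(context, return_tensors="pt").input_ids
    tgt = tok(target, return_tensors="pt").input_ids
    ids = torch.cat([ctx, tgt], dim=1)
    labels = ids.clone()
    labels[:, : ctx.shape[1]] = -100  # score only the target tokens
    with torch.no_grad():
        loss = lm(input_ids=ids, labels=labels).loss  # mean NLL over target
    return float(torch.exp(loss))

# A candidate that is "expected" given abstract_a should rank first.
abstract_a = "We propose a contrastive method for cross-domain retrieval ..."
candidates = [
    "A study of retrieval methods that transfer across scientific fields ...",
    "A long-form essay on the history of sourdough baking ...",
]
ranked = sorted(candidates, key=lambda c: perplexity(abstract_a, c))
print(ranked[0])  # the more related abstract should come first
```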

---

## Why Inkling?

> Because the right connection is often obvious, once someone points it out.

Researchers today are overwhelmed by volume. Inkling helps surface those missed-but-meaningful links between ideas, methods, and fields: links that could inspire new directions, clarify existing work, or enable cross-pollination across domains.

---

## Citation

> Luo, X., Rechardt, A., Sun, G. et al. Large language models surpass human experts in predicting neuroscience results. *Nat Hum Behav* **9**, 305–315 (2025). [https://www.nature.com/articles/s41562-024-02046-9](https://www.nature.com/articles/s41562-024-02046-9)

---

## Status

Inkling is in **alpha** and under active development. The current model is served through a Gradio app on this Hugging Face Space, available for live interaction and feedback. Contributions and collaboration are welcome.