<p align="center"> | |
<h1 align="center"> <ins>RIPE</ins>:<br> Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction <br><br>ππΊ ICCV 2025 πΊπ</h1> | |
<p align="center"> | |
<a href="https://scholar.google.com/citations?user=ybMR38kAAAAJ">Johannes KΓΌnzel</a> | |
Β· | |
<a href="https://scholar.google.com/citations?user=5yTuyGIAAAAJ">Anna Hilsmann</a> | |
Β· | |
<a href="https://scholar.google.com/citations?user=BCElyCkAAAAJ">Peter Eisert</a> | |
</p> | |
<h2 align="center"><p> | |
<a href="https://arxiv.org/abs/2507.04839" align="center">Arxiv</a> | | |
<a href="https://fraunhoferhhi.github.io/RIPE/" align="center">Project Page</a> | | |
<a href="https://huggingface.co/spaces/JohannesK14/RIPE" align="center">π€Demoπ€</a> | |
</p></h2> | |
<div align="center"></div> | |
</p> | |
<br/> | |
<p align="center"> | |
<img src="assets/teaser_image.png" alt="example" width=80%> | |
<br> | |
<em>RIPE demonstrates that keypoint detection and description can be learned from image pairs only - no depth, no pose, no artificial augmentation required.</em> | |
</p> | |
## Setup

💡**Alternative**💡 Install nothing locally and try our Hugging Face demo: [🤗Demo🤗](https://huggingface.co/spaces/JohannesK14/RIPE)

1. Install mamba by following the instructions given here: [Mamba Installation](https://mamba.readthedocs.io/en/latest/installation/mamba-installation.html)
2. Create and activate a new environment:
```bash
mamba env create -f conda_env.yml
mamba activate ripe-env
```
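To verify the setup, you can run a quick import check from the repository root (a minimal sanity check; `vgg_hyper` is the model entry point used in the demo below):

```bash
python -c "from ripe import vgg_hyper; print('RIPE import OK')"
```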
## How to use

The snippet below runs the full pipeline: detection, description, mutual-nearest-neighbor matching, and RANSAC filtering. The same example is available as a ready-to-run script in [demo.py](demo.py).
```python
import cv2
import kornia.feature as KF
import kornia.geometry as KG
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision.io import decode_image

from ripe import vgg_hyper
from ripe.utils.utils import cv2_matches_from_kornia, resize_image, to_cv_kpts

dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = vgg_hyper().to(dev)
model.eval()

image1 = resize_image(decode_image("assets/all_souls_000013.jpg").float().to(dev) / 255.0)
image2 = resize_image(decode_image("assets/all_souls_000055.jpg").float().to(dev) / 255.0)

kpts_1, desc_1, score_1 = model.detectAndCompute(image1, threshold=0.5, top_k=2048)
kpts_2, desc_2, score_2 = model.detectAndCompute(image2, threshold=0.5, top_k=2048)

matcher = KF.DescriptorMatcher("mnn")  # the distance threshold is not used with mutual nearest neighbor (mnn) matching
match_dists, match_idxs = matcher(desc_1, desc_2)

matched_pts_1 = kpts_1[match_idxs[:, 0]]
matched_pts_2 = kpts_2[match_idxs[:, 1]]

# estimate a fundamental matrix with RANSAC to reject outlier matches
F_mat, mask = KG.ransac.RANSAC(model_type="fundamental", inl_th=1.0)(matched_pts_1, matched_pts_2)
matchesMask = mask.int().ravel().tolist()

result_ransac = cv2.drawMatches(
    (image1.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
    to_cv_kpts(kpts_1, score_1),
    (image2.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
    to_cv_kpts(kpts_2, score_2),
    cv2_matches_from_kornia(match_dists, match_idxs),
    None,
    matchColor=(0, 255, 0),
    matchesMask=matchesMask,
    # matchesMask=None,  # without RANSAC filtering
    singlePointColor=(0, 0, 255),
    flags=cv2.DrawMatchesFlags_DEFAULT,
)

plt.imshow(result_ransac)
plt.axis("off")
plt.tight_layout()
plt.show()
# plt.savefig("result_ransac.png")
```
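If you need just the geometrically verified correspondences (e.g., for downstream pose estimation), the RANSAC inlier mask can be applied directly to the matched keypoint tensors defined above:

```python
# keep only the correspondences that survived RANSAC verification
inlier_pts_1 = matched_pts_1[mask.ravel().bool()]
inlier_pts_2 = matched_pts_2[mask.ravel().bool()]
print(f"{inlier_pts_1.shape[0]} of {matched_pts_1.shape[0]} matches are inliers")
```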
## Reproduce the results

### MegaDepth 1500 & HPatches

1. Download and install [Glue Factory](https://github.com/cvg/glue-factory)
2. Add this repo as a submodule to Glue Factory:
```bash
cd glue-factory
git submodule add https://github.com/fraunhoferhhi/RIPE.git thirdparty/ripe
```
3. Create the new file `ripe.py` under `gluefactory/models/extractors/` with the following content:
<details>
<summary>ripe.py</summary>

```python
import sys
from pathlib import Path

import torch
import torchvision.transforms as transforms

from ..base_model import BaseModel

ripe_path = Path(__file__).parent / "../../../thirdparty/ripe"

print(f"RIPE Path: {ripe_path.resolve()}")

# check if the path exists
if not ripe_path.exists():
    raise RuntimeError(f"RIPE path not found: {ripe_path}")

sys.path.append(str(ripe_path))

from ripe import vgg_hyper


class RIPE(BaseModel):
    default_conf = {
        "name": "RIPE",
        "model_path": None,
        "chunk": 4,
        "dense_outputs": False,
        "threshold": 1.0,
        "top_k": 2048,
    }

    required_data_keys = ["image"]

    # Initialize the extractor
    def _init(self, conf):
        self.normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.model = vgg_hyper(model_path=conf.model_path)
        self.model.eval()

        self.set_initialized()

    def _forward(self, data):
        image = data["image"]

        keypoints, scores, descriptors = [], [], []

        # process the batch in chunks to limit memory usage
        chunk = self.conf.chunk
        for i in range(0, image.shape[0], chunk):
            if self.conf.dense_outputs:
                raise NotImplementedError("Dense outputs are not supported")
            else:
                im = image[i : min(image.shape[0], i + chunk)]
                im = self.normalizer(im)

                kpt, desc, score = self.model.detectAndCompute(
                    im,
                    threshold=self.conf.threshold,
                    top_k=self.conf.top_k,
                )
                keypoints += [kpt.squeeze(0)]
                scores += [score.squeeze(0)]
                descriptors += [desc.squeeze(0)]

                del kpt, desc, score

        keypoints = torch.stack(keypoints, 0)
        scores = torch.stack(scores, 0)
        descriptors = torch.stack(descriptors, 0)

        pred = {
            # "keypoints": keypoints.to(image) + 0.5,
            "keypoints": keypoints.to(image),
            "keypoint_scores": scores.to(image),
            "descriptors": descriptors.to(image),
        }

        return pred

    def loss(self, pred, data):
        raise NotImplementedError
```
</details>
4. Create `ripe+NN.yaml` in `gluefactory/configs` with the following content:
<details>
<summary>ripe+NN.yaml</summary>

```yaml
model:
  name: two_view_pipeline
  extractor:
    name: extractors.ripe
    threshold: 1.0
    top_k: 2048
  matcher:
    name: matchers.nearest_neighbor_matcher
benchmarks:
  megadepth1500:
    data:
      preprocessing:
        side: long
        resize: 1600
    eval:
      estimator: poselib
      ransac_th: 0.5
  hpatches:
    eval:
      estimator: poselib
      ransac_th: 0.5
    model:
      extractor:
        top_k: 1024 # overwrite config above
```
5. Run the MegaDepth 1500 evaluation script:
```bash
python -m gluefactory.eval.megadepth1500 --conf ripe+NN
```
This should produce results close to:
```bash
'rel_pose_error@10°': 0.6834,
'rel_pose_error@20°': 0.7803,
'rel_pose_error@5°': 0.5511,
```
6. Run the HPatches evaluation script:
```bash
python -m gluefactory.eval.hpatches --conf ripe+NN
```
This should produce results close to:
```bash
'H_error_ransac@1px': 0.3793,
'H_error_ransac@3px': 0.5893,
'H_error_ransac@5px': 0.692,
```
## Training

1. Create a .env file with the following content:
```bash
OUTPUT_DIR="/output"
DATA_DIR="/data"
```
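These variables tell the training code where to write results and where to find the datasets. As a minimal sketch of how such a file is typically consumed, assuming python-dotenv (the actual loading mechanism in ripe/train.py may differ):

```python
# Hypothetical illustration: reading .env variables with python-dotenv.
# The actual mechanism used by ripe/train.py may differ.
import os

from dotenv import load_dotenv

load_dotenv()  # loads OUTPUT_DIR and DATA_DIR from .env into os.environ
print("outputs go to:", os.environ["OUTPUT_DIR"])
print("datasets are read from:", os.environ["DATA_DIR"])
```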
2. Download the required datasets:
<details>
<summary>DISK Megadepth subset</summary>

To download the dataset used by [DISK](https://github.com/cvlab-epfl/disk), execute the following commands:
```bash
cd data
bash download_disk_data.sh
```
</details>
<details>
<summary>Tokyo 24/7</summary>

- ⚠️**Optional**⚠️: Only needed if you are interested in the model used in Section 4.6 of the paper!
- Download the Tokyo 24/7 query images from here: [Tokyo 24/7 Query Images V3](http://www.ok.ctrl.titech.ac.jp/~torii/project/247/download/247query_v3.zip) from the official [website](http://www.ok.ctrl.titech.ac.jp/~torii/project/247/).
- Extract them into data/Tokyo_Query_V3 so that the layout matches the tree below; a download sketch follows.
```bash
Tokyo_Query_V3/
├── 00001.csv
├── 00001.jpg
├── 00002.csv
├── 00002.jpg
├── ...
├── 01125.csv
├── 01125.jpg
├── Readme.txt
└── Readme.txt~
```
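A download-and-extract sketch (assumes `wget` and `unzip` are available; verify that the extracted files match the layout above):

```bash
cd data
wget http://www.ok.ctrl.titech.ac.jp/~torii/project/247/download/247query_v3.zip
unzip 247query_v3.zip -d Tokyo_Query_V3
```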
</details>
<details>
<summary>ACDC</summary>

- ⚠️**Optional**⚠️: Only needed if you are interested in the model used in Section 6.1 (supplementary) of the paper!
- Download the RGB images from here: [ACDC RGB Images](https://acdc.vision.ee.ethz.ch/rgb_anon_trainvaltest.zip)
- Extract them into data/ACDC so that the layout matches the tree below; a download sketch follows.
```bash
ACDC/
rgb_anon
├── fog
│   ├── test
│   │   ├── GOPR0475
│   │   └── GOPR0477
│   ├── test_ref
│   │   ├── GOPR0475
│   │   └── GOPR0477
│   └── train
│       ├── GOPR0475
│       └── GOPR0476
└── night
```
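The equivalent sketch for ACDC (assumes `wget` and `unzip`; if the direct link requires a browser session or registration, download the archive manually and extract it as shown):

```bash
cd data
wget https://acdc.vision.ee.ethz.ch/rgb_anon_trainvaltest.zip
unzip rgb_anon_trainvaltest.zip -d ACDC
```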
</details>
3. Run the training script:
```bash
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline
```
You can also easily switch settings from the command line, e.g. to additionally train on the Tokyo 24/7 dataset:
```bash
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo
```
## Acknowledgements

Our code is partly based on the following repositories:

- [DALF](https://github.com/verlab/DALF_CVPR_2023) - Apache License 2.0
- [DeDoDe](https://github.com/Parskatt/DeDoDe) - MIT License
- [DISK](https://github.com/cvlab-epfl/disk) - Apache License 2.0

Our evaluation was based on the following repositories:

- [Glue Factory](https://github.com/cvg/glue-factory)
- [hloc](https://github.com/cvg/Hierarchical-Localization)

We would like to thank the authors of these repositories for their great work and for making their code available.

Our project webpage is based on the [Academic Project Page Template](https://github.com/eliahuhorwitz/Academic-project-page-template) by Eliahu Horwitz.
## BibTeX Citation

```
@article{ripe2025,
  year = {2025},
  title = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
  author = {Künzel, Johannes and Hilsmann, Anna and Eisert, Peter},
  journal = {arXiv},
  eprint = {2507.04839},
}
```