RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
ICCV 2025
Johannes Künzel · Anna Hilsmann · Peter Eisert
arXiv | Project Page | 🤗 Demo 🤗
RIPE demonstrates that keypoint detection and description can be learned from image pairs only - no depth, no pose, no artificial augmentation required.
Setup
💡 Alternative 💡 Install nothing locally and try our Hugging Face demo: 🤗 Demo 🤗
Install mamba by following the instructions given here: Mamba Installation
Create a new environment with:
mamba create -f conda_env.yml
mamba activate ripe-env
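To quickly check that the new environment is usable before running the demo, here is a minimal sanity check; it only assumes that conda_env.yml installs PyTorch, which the demo below needs:
import torch

# Confirms that PyTorch imports and reports whether a CUDA device is visible.
print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")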
How to use
Run the following example, or just check demo.py:
import cv2
import kornia.feature as KF
import kornia.geometry as KG
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision.io import decode_image
from ripe import vgg_hyper
from ripe.utils.utils import cv2_matches_from_kornia, resize_image, to_cv_kpts
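# Select the device and load the pretrained RIPE model in evaluation mode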
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = vgg_hyper().to(dev)
model.eval()
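# Load both example images, scale them to [0, 1], and resize them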
image1 = resize_image(decode_image("assets/all_souls_000013.jpg").float().to(dev) / 255.0)
image2 = resize_image(decode_image("assets/all_souls_000055.jpg").float().to(dev) / 255.0)
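# Detect keypoints and compute descriptors (at most 2048 per image)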
kpts_1, desc_1, score_1 = model.detectAndCompute(image1, threshold=0.5, top_k=2048)
kpts_2, desc_2, score_2 = model.detectAndCompute(image2, threshold=0.5, top_k=2048)
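# Match the descriptors with mutual nearest neighbors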
matcher = KF.DescriptorMatcher("mnn") # threshold is not used with mnn
match_dists, match_idxs = matcher(desc_1, desc_2)
matched_pts_1 = kpts_1[match_idxs[:, 0]]
matched_pts_2 = kpts_2[match_idxs[:, 1]]
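# Robustly fit a fundamental matrix with RANSAC to filter outlier matches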
H, mask = KG.ransac.RANSAC(model_type="fundamental", inl_th=1.0)(matched_pts_1, matched_pts_2)
matchesMask = mask.int().ravel().tolist()
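# Draw keypoints and the RANSAC-filtered matches with OpenCV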
result_ransac = cv2.drawMatches(
(image1.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
to_cv_kpts(kpts_1, score_1),
(image2.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
to_cv_kpts(kpts_2, score_2),
cv2_matches_from_kornia(match_dists, match_idxs),
None,
matchColor=(0, 255, 0),
matchesMask=matchesMask,
# matchesMask=None, # without RANSAC filtering
singlePointColor=(0, 0, 255),
flags=cv2.DrawMatchesFlags_DEFAULT,
)
plt.imshow(result_ransac)
plt.axis("off")
plt.tight_layout()
plt.show()
# plt.savefig("result_ransac.png")
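If you want the filtered correspondences themselves (e.g. for a downstream pose solver) instead of only the visualization, the snippet above can be continued as follows; the variable names introduced here (inliers, inlier_pts_1, inlier_pts_2) are purely illustrative:
# Keep only the RANSAC inliers as NumPy arrays for further processing.
inliers = mask.ravel().bool()
inlier_pts_1 = matched_pts_1[inliers].cpu().numpy()
inlier_pts_2 = matched_pts_2[inliers].cpu().numpy()
print(f"{int(inliers.sum())} of {len(inliers)} matches kept after RANSAC")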
Reproduce the results
MegaDepth 1500 & HPatches
- Download and install Glue Factory
- Add this repo as a submodule to Glue Factory:
cd glue-factory
git submodule add https://github.com/fraunhoferhhi/RIPE.git thirdparty/ripe
- Create the new file ripe.py under gluefactory/models/extractors/ with the following content:
ripe.py
import sys
from pathlib import Path

import torch
import torchvision.transforms as transforms

from ..base_model import BaseModel

ripe_path = Path(__file__).parent / "../../../thirdparty/ripe"
print(f"RIPE Path: {ripe_path.resolve()}")

# check if the path exists
if not ripe_path.exists():
    raise RuntimeError(f"RIPE path not found: {ripe_path}")

sys.path.append(str(ripe_path))

from ripe import vgg_hyper


class RIPE(BaseModel):
    default_conf = {
        "name": "RIPE",
        "model_path": None,
        "chunk": 4,
        "dense_outputs": False,
        "threshold": 1.0,
        "top_k": 2048,
    }

    required_data_keys = ["image"]

    # Initialize the line matcher
    def _init(self, conf):
        self.normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.model = vgg_hyper(model_path=conf.model_path)
        self.model.eval()

        self.set_initialized()

    def _forward(self, data):
        image = data["image"]

        keypoints, scores, descriptors = [], [], []

        chunk = self.conf.chunk
        for i in range(0, image.shape[0], chunk):
            if self.conf.dense_outputs:
                raise NotImplementedError("Dense outputs are not supported")
            else:
                im = image[: min(image.shape[0], i + chunk)]
                im = self.normalizer(im)

                H, W = im.shape[-2:]

                kpt, desc, score = self.model.detectAndCompute(
                    im,
                    threshold=self.conf.threshold,
                    top_k=self.conf.top_k,
                )
            keypoints += [kpt.squeeze(0)]
            scores += [score.squeeze(0)]
            descriptors += [desc.squeeze(0)]

            del kpt
            del desc
            del score

        keypoints = torch.stack(keypoints, 0)
        scores = torch.stack(scores, 0)
        descriptors = torch.stack(descriptors, 0)

        pred = {
            # "keypoints": keypoints.to(image) + 0.5,
            "keypoints": keypoints.to(image),
            "keypoint_scores": scores.to(image),
            "descriptors": descriptors.to(image),
        }

        return pred

    def loss(self, pred, data):
        raise NotImplementedError
- Create ripe+NN.yaml in gluefactory/configs with the following content:
ripe+NN.yaml
model:
  name: two_view_pipeline
  extractor:
    name: extractors.ripe
    threshold: 1.0
    top_k: 2048
  matcher:
    name: matchers.nearest_neighbor_matcher
benchmarks:
  megadepth1500:
    data:
      preprocessing:
        side: long
        resize: 1600
    eval:
      estimator: poselib
      ransac_th: 0.5
  hpatches:
    eval:
      estimator: poselib
      ransac_th: 0.5
    model:
      extractor:
        top_k: 1024 # overwrite config above
- Run the MegaDepth 1500 evaluation script:
python -m gluefactory.eval.megadepth1500 --conf ripe+NN # for MegaDepth 1500
Should result in:
'rel_pose_error@10Β°': 0.6834,
'rel_pose_error@20Β°': 0.7803,
'rel_pose_error@5Β°': 0.5511,
- Run the HPatches evaluation script:
python -m gluefactory.eval.hpatches --conf ripe+NN # for HPatches
Should result in:
'H_error_ransac@1px': 0.3793,
'H_error_ransac@3px': 0.5893,
'H_error_ransac@5px': 0.692,
Training
- Create a .env file with the following content:
OUTPUT_DIR="/output"
DATA_DIR="/data"
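Before launching a long training run you can verify that both paths resolve; the sketch below assumes the variables are loaded with python-dotenv (an assumption; the training script may read the .env file itself):
import os
from pathlib import Path

from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # reads OUTPUT_DIR and DATA_DIR from the .env file
for var in ("OUTPUT_DIR", "DATA_DIR"):
    path = Path(os.environ[var])
    print(f"{var} -> {path} (exists: {path.exists()})")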
Download the required datasets:
DISK Megadepth subset
To download the dataset used by DISK execute the following commands:
cd data
bash download_disk_data.sh
Tokyo 24/7
- ⚠️ Optional ⚠️: Only if you are interested in the model used in Section 4.6 of the paper!
- Download the Tokyo 24/7 query images (Tokyo 24/7 Query Images V3) from the official website.
- Extract them into data/Tokyo_Query_V3
Tokyo_Query_V3/
├── 00001.csv
├── 00001.jpg
├── 00002.csv
├── 00002.jpg
├── ...
├── 01125.csv
├── 01125.jpg
├── Readme.txt
└── Readme.txt~
ACDC
- ⚠️ Optional ⚠️: Only if you are interested in the model used in Section 6.1 (supplementary) of the paper!
- Download the RGB images from here: ACDC RGB Images
- Extract them into data/ACDC
ACDC/
└── rgb_anon
    ├── fog
    │   ├── test
    │   │   ├── GOPR0475
    │   │   └── GOPR0477
    │   ├── test_ref
    │   │   ├── GOPR0475
    │   │   └── GOPR0477
    │   └── train
    │       ├── GOPR0475
    │       └── GOPR0476
    └── night
Run the training script:
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline
You can also easily switch settings from the command line, e.g. to additionally train on the Tokyo 24/7 dataset:
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo
Acknowledgements
Our code is partly based on the following repositories:
Our evaluation was based on the following repositories:
We would like to thank the authors of these repositories for their great work and for making their code available.
Our project webpage is based on the Academic Project Page Template by Eliahu Horwitz.
BibTeX Citation
@article{ripe2025,
year = {2025},
title = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
author = {Künzel, Johannes and Hilsmann, Anna and Eisert, Peter},
journal = {arXiv},
eprint = {2507.04839},
}