RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction
ICCV 2025
Johannes Künzel · Anna Hilsmann · Peter Eisert
arXiv | Project Page | 🤗 Demo 🤗
RIPE demonstrates that keypoint detection and description can be learned from image pairs only - no depth, no pose, no artificial augmentation required.
Setup
💡 Alternative 💡 Install nothing locally and try our Hugging Face demo: 🤗 Demo 🤗
Install mamba by following the instructions given here: Mamba Installation
Create a new environment with:
mamba create -f conda_env.yml
mamba activate ripe-env
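To quickly check that the new environment is usable before running the demo, here is a minimal sanity check; it only assumes that conda_env.yml installs PyTorch, which the demo below needs:
import torch

# Confirms that PyTorch imports and reports whether a CUDA device is visible.
print(f"PyTorch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")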
How to use
Run the following example, or just check demo.py:
import cv2
import kornia.feature as KF
import kornia.geometry as KG
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision.io import decode_image
from ripe import vgg_hyper
from ripe.utils.utils import cv2_matches_from_kornia, resize_image, to_cv_kpts
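# Select the device and load the pretrained RIPE model in evaluation mode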
dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = vgg_hyper().to(dev)
model.eval()
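# Load both example images, scale them to [0, 1], and resize them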
image1 = resize_image(decode_image("assets/all_souls_000013.jpg").float().to(dev) / 255.0)
image2 = resize_image(decode_image("assets/all_souls_000055.jpg").float().to(dev) / 255.0)
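# Detect keypoints and compute descriptors (at most 2048 per image)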
kpts_1, desc_1, score_1 = model.detectAndCompute(image1, threshold=0.5, top_k=2048)
kpts_2, desc_2, score_2 = model.detectAndCompute(image2, threshold=0.5, top_k=2048)
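# Match the descriptors with mutual nearest neighbors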
matcher = KF.DescriptorMatcher("mnn") # threshold is not used with mnn
match_dists, match_idxs = matcher(desc_1, desc_2)
matched_pts_1 = kpts_1[match_idxs[:, 0]]
matched_pts_2 = kpts_2[match_idxs[:, 1]]
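# Robustly fit a fundamental matrix with RANSAC to filter outlier matches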
H, mask = KG.ransac.RANSAC(model_type="fundamental", inl_th=1.0)(matched_pts_1, matched_pts_2)
matchesMask = mask.int().ravel().tolist()
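# Draw keypoints and the RANSAC-filtered matches with OpenCV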
result_ransac = cv2.drawMatches(
(image1.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
to_cv_kpts(kpts_1, score_1),
(image2.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
to_cv_kpts(kpts_2, score_2),
cv2_matches_from_kornia(match_dists, match_idxs),
None,
matchColor=(0, 255, 0),
matchesMask=matchesMask,
# matchesMask=None, # without RANSAC filtering
singlePointColor=(0, 0, 255),
flags=cv2.DrawMatchesFlags_DEFAULT,
)
plt.imshow(result_ransac)
plt.axis("off")
plt.tight_layout()
plt.show()
# plt.savefig("result_ransac.png")
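If you want the filtered correspondences themselves (e.g. for a downstream pose solver) instead of only the visualization, the snippet above can be continued as follows; the variable names introduced here (inliers, inlier_pts_1, inlier_pts_2) are purely illustrative:
# Keep only the RANSAC inliers as NumPy arrays for further processing.
inliers = mask.ravel().bool()
inlier_pts_1 = matched_pts_1[inliers].cpu().numpy()
inlier_pts_2 = matched_pts_2[inliers].cpu().numpy()
print(f"{int(inliers.sum())} of {len(inliers)} matches kept after RANSAC")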
Reproduce the results
MegaDepth 1500 & HPatches
- Download and install Glue Factory
- Add this repo as a submodule to Glue Factory:
cd glue-factory
git submodule add https://github.com/fraunhoferhhi/RIPE.git thirdparty/ripe
- Create the new file ripe.py under gluefactory/models/extractors/ with the following content:
ripe.py
import sys
from pathlib import Path

import torch
import torchvision.transforms as transforms

from ..base_model import BaseModel

ripe_path = Path(__file__).parent / "../../../thirdparty/ripe"
print(f"RIPE Path: {ripe_path.resolve()}")

# check if the path exists
if not ripe_path.exists():
    raise RuntimeError(f"RIPE path not found: {ripe_path}")

sys.path.append(str(ripe_path))

from ripe import vgg_hyper


class RIPE(BaseModel):
    default_conf = {
        "name": "RIPE",
        "model_path": None,
        "chunk": 4,
        "dense_outputs": False,
        "threshold": 1.0,
        "top_k": 2048,
    }

    required_data_keys = ["image"]

    # Initialize the line matcher
    def _init(self, conf):
        self.normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.model = vgg_hyper(model_path=conf.model_path)
        self.model.eval()

        self.set_initialized()

    def _forward(self, data):
        image = data["image"]

        keypoints, scores, descriptors = [], [], []

        chunk = self.conf.chunk
        for i in range(0, image.shape[0], chunk):
            if self.conf.dense_outputs:
                raise NotImplementedError("Dense outputs are not supported")
            else:
                im = image[: min(image.shape[0], i + chunk)]
                im = self.normalizer(im)

                H, W = im.shape[-2:]

                kpt, desc, score = self.model.detectAndCompute(
                    im,
                    threshold=self.conf.threshold,
                    top_k=self.conf.top_k,
                )
            keypoints += [kpt.squeeze(0)]
            scores += [score.squeeze(0)]
            descriptors += [desc.squeeze(0)]

            del kpt
            del desc
            del score

        keypoints = torch.stack(keypoints, 0)
        scores = torch.stack(scores, 0)
        descriptors = torch.stack(descriptors, 0)

        pred = {
            # "keypoints": keypoints.to(image) + 0.5,
            "keypoints": keypoints.to(image),
            "keypoint_scores": scores.to(image),
            "descriptors": descriptors.to(image),
        }

        return pred

    def loss(self, pred, data):
        raise NotImplementedError
- Create ripe+NN.yaml in gluefactory/configs with the following content:
ripe+NN.yaml
model:
  name: two_view_pipeline
  extractor:
    name: extractors.ripe
    threshold: 1.0
    top_k: 2048
  matcher:
    name: matchers.nearest_neighbor_matcher
benchmarks:
  megadepth1500:
    data:
      preprocessing:
        side: long
        resize: 1600
    eval:
      estimator: poselib
      ransac_th: 0.5
  hpatches:
    eval:
      estimator: poselib
      ransac_th: 0.5
    model:
      extractor:
        top_k: 1024 # overwrite config above
- Run the MegaDepth 1500 evaluation script:
python -m gluefactory.eval.megadepth1500 --conf ripe+NN # for MegaDepth 1500
Should result in:
'rel_pose_error@10Β°': 0.6834,
'rel_pose_error@20Β°': 0.7803,
'rel_pose_error@5Β°': 0.5511,
- Run the HPatches evaluation script:
python -m gluefactory.eval.hpatches --conf ripe+NN # for HPatches
Should result in:
'H_error_ransac@1px': 0.3793,
'H_error_ransac@3px': 0.5893,
'H_error_ransac@5px': 0.692,
Training
- Create a .env file with the following content:
OUTPUT_DIR="/output"
DATA_DIR="/data"
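Before launching a long training run you can verify that both paths resolve; the sketch below assumes the variables are loaded with python-dotenv (an assumption; the training script may read the .env file itself):
import os
from pathlib import Path

from dotenv import load_dotenv  # assumption: python-dotenv is available

load_dotenv()  # reads OUTPUT_DIR and DATA_DIR from the .env file
for var in ("OUTPUT_DIR", "DATA_DIR"):
    path = Path(os.environ[var])
    print(f"{var} -> {path} (exists: {path.exists()})")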
Download the required datasets:
DISK Megadepth subset
To download the dataset used by DISK execute the following commands:
cd data
bash download_disk_data.sh
Tokyo 24/7
- ⚠️ Optional ⚠️: Only if you are interested in the model used in Section 4.6 of the paper!
- Download the Tokyo 24/7 query images (Tokyo 24/7 Query Images V3) from the official website.
- Extract them into data/Tokyo_Query_V3
Tokyo_Query_V3/
├── 00001.csv
├── 00001.jpg
├── 00002.csv
├── 00002.jpg
├── ...
├── 01125.csv
├── 01125.jpg
├── Readme.txt
└── Readme.txt~
ACDC
- ⚠️ Optional ⚠️: Only if you are interested in the model used in Section 6.1 (supplementary) of the paper!
- Download the RGB images from here: ACDC RGB Images
- Extract them into data/ACDC
ACDC/
└── rgb_anon
    ├── fog
    │   ├── test
    │   │   ├── GOPR0475
    │   │   └── GOPR0477
    │   ├── test_ref
    │   │   ├── GOPR0475
    │   │   └── GOPR0477
    │   └── train
    │       ├── GOPR0475
    │       └── GOPR0476
    └── night
Run the training script:
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline
You can also easily switch settings from the command line, e.g. to additionally train on the Tokyo 24/7 dataset:
python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo
Acknowledgements
Our code is partly based on the following repositories:
Our evaluation was based on the following repositories:
We would like to thank the authors of these repositories for their great work and for making their code available.
Our project webpage is based on the Academic Project Page Template by Eliahu Horwitz.
BibTeX Citation
@article{ripe2025,
year = {2025},
title = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
author = {Künzel, Johannes and Hilsmann, Anna and Eisert, Peter},
journal = {arXiv},
eprint = {2507.04839},
}