RIPE:
Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction

🌊🌺 ICCV 2025 🌺🌊

Johannes Künzel · Anna Hilsmann · Peter Eisert

arXiv | Project Page | 🤗Demo🤗


(example figure)
RIPE demonstrates that keypoint detection and description can be learned from image pairs only - no depth, no pose, no artificial augmentation required.

Setup

💡Alternative💡 Install nothing locally and try our Hugging Face demo: 🤗Demo🤗

  1. Install mamba by following the instructions given here: Mamba Installation

  2. Create a new environment with:

mamba create -f conda_env.yml
mamba activate ripe-env
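
Optionally, verify the environment before moving on. A minimal check (assuming the environment provides PyTorch and kornia, both of which the demo below imports):

import torch
import kornia

# Should print the installed versions and whether a CUDA device is visible
print(torch.__version__, kornia.__version__, torch.cuda.is_available())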

How to use

The snippet below shows the basic usage; alternatively, just check demo.py.

import cv2
import kornia.feature as KF
import kornia.geometry as KG
import matplotlib.pyplot as plt
import numpy as np
import torch
from torchvision.io import decode_image

from ripe import vgg_hyper
from ripe.utils.utils import cv2_matches_from_kornia, resize_image, to_cv_kpts

dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Load the pretrained RIPE model
model = vgg_hyper().to(dev)
model.eval()

# Load the two example images, scale to [0, 1], and resize
image1 = resize_image(decode_image("assets/all_souls_000013.jpg").float().to(dev) / 255.0)
image2 = resize_image(decode_image("assets/all_souls_000055.jpg").float().to(dev) / 255.0)

# Detect keypoints and compute descriptors (up to top_k keypoints above the score threshold)
kpts_1, desc_1, score_1 = model.detectAndCompute(image1, threshold=0.5, top_k=2048)
kpts_2, desc_2, score_2 = model.detectAndCompute(image2, threshold=0.5, top_k=2048)

# Mutual nearest-neighbour matching of the descriptors
matcher = KF.DescriptorMatcher("mnn")  # threshold is not used with mnn
match_dists, match_idxs = matcher(desc_1, desc_2)

matched_pts_1 = kpts_1[match_idxs[:, 0]]
matched_pts_2 = kpts_2[match_idxs[:, 1]]

# Estimate the fundamental matrix with RANSAC and keep only the inlier matches
F_mat, mask = KG.ransac.RANSAC(model_type="fundamental", inl_th=1.0)(matched_pts_1, matched_pts_2)
matchesMask = mask.int().ravel().tolist()

result_ransac = cv2.drawMatches(
    (image1.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
    to_cv_kpts(kpts_1, score_1),
    (image2.cpu().permute(1, 2, 0).numpy() * 255.0).astype(np.uint8),
    to_cv_kpts(kpts_2, score_2),
    cv2_matches_from_kornia(match_dists, match_idxs),
    None,
    matchColor=(0, 255, 0),
    matchesMask=matchesMask,
    # matchesMask=None, # without RANSAC filtering
    singlePointColor=(0, 0, 255),
    flags=cv2.DrawMatchesFlags_DEFAULT,
)

plt.imshow(result_ransac)
plt.axis("off")
plt.tight_layout()

plt.show()
# plt.savefig("result_ransac.png")
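
For a quick sanity check before plotting, the following lines (a continuation of the snippet above; the exact tensor shapes are an assumption, not a documented guarantee of the API) print basic statistics about the matches:

# Continuation of the demo above; assumes (N, 2) keypoints and (N, D) descriptors
print(f"image 1: {kpts_1.shape[0]} keypoints, descriptor dim {desc_1.shape[1]}")
print(f"image 2: {kpts_2.shape[0]} keypoints, descriptor dim {desc_2.shape[1]}")
print(f"mutual NN matches: {match_idxs.shape[0]}, RANSAC inliers: {int(mask.sum())}")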

Reproduce the results

MegaDepth 1500 & HPatches

  1. Download and install Glue Factory
  2. Add this repo as a submodule to Glue Factory:
cd glue-factory
git submodule add https://github.com/fraunhoferhhi/RIPE.git thirdparty/ripe
  3. Create the new file ripe.py under gluefactory/models/extractors/ with the following content:

    ripe.py
    import sys
    from pathlib import Path
    
    import torch
    import torchvision.transforms as transforms
    
    from ..base_model import BaseModel
    
    ripe_path = Path(__file__).parent / "../../../thirdparty/ripe"
    
    print(f"RIPE Path: {ripe_path.resolve()}")
    # check if the path exists
    if not ripe_path.exists():
        raise RuntimeError(f"RIPE path not found: {ripe_path}")
    
    sys.path.append(str(ripe_path))
    
    from ripe import vgg_hyper
    
    
    class RIPE(BaseModel):
        default_conf = {
            "name": "RIPE",
            "model_path": None,
            "chunk": 4,
            "dense_outputs": False,
            "threshold": 1.0,
            "top_k": 2048,
        }
    
        required_data_keys = ["image"]
    
        # Initialize the line matcher
        def _init(self, conf):
            self.normalizer = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
            self.model = vgg_hyper(model_path=conf.model_path)
            self.model.eval()
    
            self.set_initialized()
    
        def _forward(self, data):
            image = data["image"]
    
            keypoints, scores, descriptors = [], [], []
    
            chunk = self.conf.chunk
    
            for i in range(0, image.shape[0], chunk):
                if self.conf.dense_outputs:
                    raise NotImplementedError("Dense outputs are not supported")
                else:
                    im = image[i : min(image.shape[0], i + chunk)]  # take only the current chunk of images
                    im = self.normalizer(im)
    
                    H, W = im.shape[-2:]
    
                    kpt, desc, score = self.model.detectAndCompute(
                        im,
                        threshold=self.conf.threshold,
                        top_k=self.conf.top_k,
                    )
                keypoints += [kpt.squeeze(0)]
                scores += [score.squeeze(0)]
                descriptors += [desc.squeeze(0)]
    
                del kpt
                del desc
                del score
    
            keypoints = torch.stack(keypoints, 0)
            scores = torch.stack(scores, 0)
            descriptors = torch.stack(descriptors, 0)
    
            pred = {
                # "keypoints": keypoints.to(image) + 0.5,
                "keypoints": keypoints.to(image),
                "keypoint_scores": scores.to(image),
                "descriptors": descriptors.to(image),
            }
    
            return pred
    
        def loss(self, pred, data):
            raise NotImplementedError
    
  4. Create ripe+NN.yaml in gluefactory/configs with the following content:

    ripe+NN.yaml
    model:
        name: two_view_pipeline
        extractor:
            name: extractors.ripe
            threshold: 1.0
            top_k: 2048
        matcher:
            name: matchers.nearest_neighbor_matcher
    benchmarks:
        megadepth1500:
          data:
            preprocessing:
              side: long
              resize: 1600
          eval:
            estimator: poselib
            ransac_th: 0.5
        hpatches:
          eval:
            estimator: poselib
            ransac_th: 0.5
          model:
            extractor:
              top_k: 1024  # overwrite config above
    
  5. Run the MegaDepth 1500 evaluation script:

python -m gluefactory.eval.megadepth1500 --conf ripe+NN # for MegaDepth 1500

Should result in:

'rel_pose_error@5°': 0.5511,
'rel_pose_error@10°': 0.6834,
'rel_pose_error@20°': 0.7803,
  6. Run the HPatches evaluation script:
python -m gluefactory.eval.hpatches --conf ripe+NN # for HPatches

Should result in:

'H_error_ransac@1px': 0.3793,
'H_error_ransac@3px': 0.5893,
'H_error_ransac@5px': 0.692,

Training

  1. Create a .env file with the following content:
OUTPUT_DIR="/output"
DATA_DIR="/data"
  2. Download the required datasets:

    DISK Megadepth subset

    To download the dataset used by DISK execute the following commands:

    cd data
    bash download_disk_data.sh
    
    Tokyo 24/7
    • ⚠️Optional⚠️: Only if you are interested in the model used in Section 4.6 of the paper!
    • Download the Tokyo 24/7 query images (Tokyo 24/7 Query Images V3) from the official website.
    • Extract them into data/Tokyo_Query_V3
    Tokyo_Query_V3/
    ├── 00001.csv
    ├── 00001.jpg
    ├── 00002.csv
    ├── 00002.jpg
    ├── ...
    ├── 01125.csv
    ├── 01125.jpg
    ├── Readme.txt
    └── Readme.txt~
    
    ACDC
    • ⚠️Optional⚠️: Only if you are interested in the model used in Section 6.1 (supplementary) of the paper!
    • Download the RGB images from here: ACDC RGB Images
    • Extract them into data/ACDC
    ACDC/
    rgb_anon
    ├── fog
    │   ├── test
    │   │   ├── GOPR0475
    │   │   ├── GOPR0477
    │   ├── test_ref
    │   │   ├── GOPR0475
    │   │   ├── GOPR0477
    │   ├── train
    │   │   ├── GOPR0475
    │   │   ├── GOPR0476
    ├── night
    
  3. Run the training script:

python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline

You can also easily switch settings from the command line, e.g. to additionally train on the Tokyo 24/7 dataset:

python ripe/train.py --config-name train project_name=train name=reproduce wandb_mode=offline data=megadepth+tokyo
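
After training, the resulting checkpoint can be loaded for inference through the model_path argument of vgg_hyper (the same argument used by the Glue Factory wrapper above). A minimal sketch; the checkpoint path is a placeholder for wherever your run wrote its weights inside OUTPUT_DIR:

import torch
from pathlib import Path

from ripe import vgg_hyper

dev = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder path: point this at the checkpoint produced by your training run
model = vgg_hyper(model_path=Path("/output/reproduce/model.pt")).to(dev)
model.eval()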

Acknowledgements

Our code is partly based on the following repositories:

Our evaluation was based on the following repositories:

We would like to thank the authors of these repositories for their great work and for making their code available.

Our project webpage is based on the Academic Project Page Template by Eliahu Horwitz.

BibTeX Citation


@article{ripe2025,
  title   = {{RIPE: Reinforcement Learning on Unlabeled Image Pairs for Robust Keypoint Extraction}},
  author  = {K{\"u}nzel, Johannes and Hilsmann, Anna and Eisert, Peter},
  journal = {arXiv},
  eprint  = {2507.04839},
  year    = {2025},
}