SISAP 2026 Indexing Challenge: Task description and participation details

📢 Task 3 Datasets Now Available

The dataset for Task 3 (NQ) is now available. We also provide a smaller FIQA dataset to help you get started with development.

Please check the Task 3 section for more details.

The SISAP Indexing Challenge 2026 invites researchers and practitioners to participate in exciting tasks to advance the state of the art in similarity search and indexing. The challenge provides a platform for presenting innovative solutions and pushing the boundaries of efficiency and effectiveness in large-scale similarity search indexes. This year, we are proposing three challenging tasks.

Datasets are available at https://huggingface.co/datasets/sisap-challenges/SISAP2026/tree/main; you can clone the full repository or download each file separately.

If you are considering participating in the challenge, please fill out a pre-registration at https://github.com/sisap-challenges/challenge2026/.

Task 1: K-nearest neighbor graph (a.k.a. metric self-join)

In this task, participants are asked to develop memory-efficient indexing solutions that will be used to compute an approximation of the k-nearest neighbor graph for k=15. Each solution will be run in a Linux container with limited memory and storage resources.
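To make the expected output concrete, here is a naive exact baseline (an illustration only, not a competitive solution): it computes the k=15 nearest-neighbor graph by full pairwise comparison under the Euclidean distance. The function name and the choice of metric are ours for illustration; the task's official metric and formats are defined by the challenge materials.

```python
# Naive exact k-NN graph baseline: for each vector, find its k nearest
# neighbors by comparing against every other vector.
import numpy as np

def knn_graph(X, k=15):
    """Return an (n, k) matrix of neighbor indices for each row of X."""
    # Squared Euclidean distances via ||a-b||^2 = ||a||^2 - 2 a.b + ||b||^2.
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] - 2.0 * (X @ X.T) + sq[None, :]
    np.fill_diagonal(d2, np.inf)          # a point is not its own neighbor
    # argpartition selects the k smallest per row; sort those k afterwards.
    idx = np.argpartition(d2, k, axis=1)[:, :k]
    order = np.take_along_axis(d2, idx, axis=1).argsort(axis=1)
    return np.take_along_axis(idx, order, axis=1)

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 16)).astype(np.float32)
G = knn_graph(X, k=15)   # G[i] lists the 15 nearest neighbors of point i
```

Submitted solutions are expected to approximate this graph far more efficiently than the quadratic scan above.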

Task 2: Maximum Inner Product Search on LLM attention workloads (Search under Distribution Shift)

In this task, participants are asked to develop memory-efficient indexing solutions to solve maximum inner product search queries in an LLM-inspired workload. Each solution will be run in a Linux container with limited memory and storage resources.
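For reference, maximum inner product search ranks database vectors by their (unnormalized) dot product with the query, which is not a metric; a brute-force sketch of the exact answer looks like this (function and variable names are ours for illustration):

```python
# Exact maximum inner product search: score every database vector against
# each query by dot product and keep the k largest scores.
import numpy as np

def mips_topk(db, queries, k=10):
    """Return (indices, scores) of the k largest inner products per query."""
    scores = queries @ db.T                           # (nq, n) inner products
    idx = np.argpartition(-scores, k, axis=1)[:, :k]  # k largest, unordered
    top = np.take_along_axis(scores, idx, axis=1)
    order = (-top).argsort(axis=1)                    # sort descending
    return (np.take_along_axis(idx, order, axis=1),
            np.take_along_axis(top, order, axis=1))

rng = np.random.default_rng(1)
db = rng.standard_normal((1000, 64)).astype(np.float32)
q = rng.standard_normal((5, 64)).astype(np.float32)
I, S = mips_topk(db, q, k=10)
```

The "distribution shift" aspect of this task means query vectors need not follow the distribution of the indexed vectors, which is exactly what makes pruned or quantized indexes harder to tune.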

Task 3: Indexing very sparse high-dimensional vectors

Learned sparse models bridge traditional inverted indexing and neural retrieval. However, their high dimensionality and learned term distributions challenge classical IR data structures.

This task investigates how to design scalable, memory-efficient indexing methods for such representations under realistic hardware constraints. In this task, participants are asked to develop memory-efficient indexing solutions to solve information retrieval-inspired tasks on very high-dimensional, sparse embeddings using the SPLADE-v3 sparse encoder model.
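As a reminder of the classical baseline this task builds on, the sketch below implements a minimal term-at-a-time inverted index over sparse vectors (all names are ours for illustration; real SPLADE-v3 vectors have tens of thousands of dimensions with few nonzeros each):

```python
# Minimal inverted index for sparse vectors: one postings list per nonzero
# dimension; queries score documents by dot product, touching only the
# query's nonzero dimensions.
from collections import defaultdict

class InvertedIndex:
    def __init__(self):
        self.postings = defaultdict(list)   # dim -> [(doc_id, weight), ...]

    def add(self, doc_id, vec):
        """vec: dict mapping dimension -> nonzero weight."""
        for dim, w in vec.items():
            self.postings[dim].append((doc_id, w))

    def search(self, query, k=10):
        """Return the k (doc_id, score) pairs with the largest dot product."""
        scores = defaultdict(float)
        for dim, qw in query.items():
            for doc_id, w in self.postings.get(dim, ()):
                scores[doc_id] += qw * w
        return sorted(scores.items(), key=lambda t: -t[1])[:k]

idx = InvertedIndex()
idx.add(0, {3: 1.5, 7: 0.2})
idx.add(1, {3: 0.5, 9: 2.0})
hits = idx.search({3: 1.0, 9: 1.0}, k=2)   # [(1, 2.5), (0, 1.5)]
```

Learned term distributions make such postings lists longer and less skewed than in classical IR, which is the core difficulty the task asks participants to address.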

Test Data, Queries, Number of Hyperparameters:

Additional datasets:

Result Submission Format

To ensure compatibility with the evaluation pipeline, results must be provided as HDF5 files following a specific structure and metadata format.

File Content: Each HDF5 file must contain two datasets:

Note: Matrices should follow row-major order (standard for C/Python/NumPy).

Metadata (Attributes): The HDF5 file must include the following attributes at the root level:

Directory Structure: Files should be organized in the following directory structure: results/<task_name>/<unique_filename>.h5

For example: results/task1/myalgo_M16_ef100.h5.
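A minimal sketch of producing such a file with h5py follows. The dataset names (`knns`, `dists`) and attribute names below are assumptions for illustration, since the official list is not reproduced here; use the names from the released examples once available.

```python
# Sketch: write a result file in the required HDF5 layout.
# Dataset and attribute names are illustrative placeholders.
import os
import numpy as np
import h5py

os.makedirs("results/task1", exist_ok=True)
knns = np.ones((10, 15), dtype=np.int64)       # neighbor ids, row-major
dists = np.zeros((10, 15), dtype=np.float32)   # corresponding distances

with h5py.File("results/task1/myalgo_M16_ef100.h5", "w") as f:
    f.create_dataset("knns", data=knns)        # assumed dataset name
    f.create_dataset("dists", data=dists)      # assumed dataset name
    # Root-level attributes (names illustrative):
    f.attrs["algo"] = "myalgo"
    f.attrs["params"] = "M=16,ef=100"
    f.attrs["buildtime"] = 12.3
    f.attrs["querytime"] = 0.7
```

NumPy arrays are row-major (C order) by default, so writing them with h5py satisfies the layout requirement above without extra work.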

Docker Container and Evaluation

We are currently working on a reproducible evaluation framework for the SISAP challenge. Solutions will be evaluated in a container setup: participants create a Docker container, which we run to evaluate their solution. To enforce the system requirements of the challenge, the container can be executed with the following limits:

docker run \
    -it \
    --cpus=8 \
    --memory=24g \
    --memory-swap=24g \
    --memory-swappiness=0 \
    --volume $(pwd)/data:/app/data:ro \
    --volume $(pwd)/results:/app/results:rw \
    sisap-baseline --task task3 --dataset fiqa-dev

Hardware specifications

Details of the evaluation machine will soon be available.

Registration and Participation

  1. To facilitate running the challenge, please register by opening a "Pre-registration request" issue in the GitHub repository https://github.com/sisap-challenges/challenge2026/. Fill out the required data; it will be used to stay in contact with you while the challenge remains open. We use this system to keep track of potential participants; to register later, contact the organizers first.

  2. During the development phase, participants will have access to gold standards for all tasks.

  3. Teams are required to provide public GitHub repositories with working GitHub Actions and clear instructions on how to run their solutions with the correct hyperparameters (up to 15 sets) for each task. You can use a small dataset, such as SISAP 2025's CCNEWS. Submissions must run in Docker containers, and results must be written in a standard format to unify the evaluation. Examples will be released soon; please visit the challenge website for updates.

  4. Participants' repositories will be cloned and tested at the time of the challenge. Results will be shared with the authors for verification and potential fixes before the final rankings are published. The short paper accompanying an entry is due before the final rankings are published and should thus focus on a self-evaluation of the proposed system.

  5. The private workloads used in the evaluation will be shared publicly after the evaluation has been carried out.

  6. One person can only be part of a single team.

Paper Submissions

All participants should submit one short paper that details their system. If participants solve multiple tasks, the system must be described in a single paper (which may reference a technical report). Accepted papers will be included in the conference proceedings and presented in a special session at SISAP 2026. Each accepted paper must be presented in person, as an oral presentation, in that session. Submissions that are not accompanied by an accepted short paper will be disqualified and removed from the final rankings.

We look forward to your participation and innovative solutions in the SISAP Indexing Challenge 2026! Let's push the frontiers of similarity search and indexing together.

Examples

Both examples are work in progress.

Final comments

Any transformation of the dataset to load, index, and solve nearest neighbor queries is allowed. Transformations include but are not limited to packing into different data types, dimensional reduction, locality-sensitive hashing, product quantization, and transformation into binary sketches. Reproducibility and open science are primary goals of the challenge, so we accept only public GitHub repositories with working GitHub Actions as submissions. Indexing algorithms may already be published or original contributions, but a dedicated effort towards solving the respective tasks must be visible in the submission.
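As one concrete instance of the transformations listed above, here is a SimHash-style sketch (our illustration, not a prescribed method): random hyperplane projections turn each vector into a short binary code whose Hamming distance approximates the angular distance between the originals.

```python
# Binary sketches via random hyperplane projections (SimHash-style):
# bit j of a vector's sketch records which side of random hyperplane j
# the vector falls on.
import numpy as np

def binary_sketch(X, n_bits=64, seed=0):
    """Map each row of X to an n_bits-long 0/1 sketch."""
    rng = np.random.default_rng(seed)
    planes = rng.standard_normal((X.shape[1], n_bits))
    return (X @ planes > 0).astype(np.uint8)

X = np.array([[1.0, 0.0],    # reference vector
              [0.9, 0.1],    # nearly parallel -> sketches mostly agree
              [-1.0, 0.0]])  # antipodal -> sketches mostly disagree
S = binary_sketch(X, n_bits=32)
ham = lambda a, b: int((a ^ b).sum())   # Hamming distance between sketches
```

Sketches like these trade accuracy for memory, which is precisely the kind of trade-off the memory-limited evaluation rewards.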

Important Dates (all 2026)

Organization Committee

Write an email to sisap-2026-indexing-challenge@googlegroups.com to contact any of the organizers.

CC BY-SA 4.0 sisap challenge committee. Last modified: March 10, 2026. Website built with Franklin.jl and the Julia programming language.