Tasks of the Implementation Challenge and Demo Track

Task A: Indexing and searching a LAION-5B deep features subset

The LAION2B deep features dataset contains 768-dimensional vectors using 16-bit floating point numbers, using dot product as a similarity measure. This task asks for solutions that work on a subset of this dataset. Any transformation to the dataset to load, index, and solve kk nearest neighbor queries is allowed. Transformations include, but are not limited to, packing into different data types, dimensional reduction, locality-sensitive hashing, product quantization, and binary sketches. Indexing algorithms can be an original part of the contribution, or alternatively, already published indexes can be used. The entire pipeline should be included and tested in GitHub's Actions (GHA) to reproduce the results; the hyperparameters must be clearly specified on the GHA for all the reported subsets.

We will reproduce your results post-challenge and produce rankings on quality and search time using a different query set. So, it is essential that your solution can be replicated.

Please visit https://sisap-challenges.github.io/repoexamples/ for working examples. In particular, please follow the guidelines of https://github.com/sisap-challenges/sisap23-laion-challenge-evaluation to simplify evaluating your solution. Note that the available examples can be used as starting points.

Task B: Binary sketches

Search methods working on main memory could take advantage of fast RAM memory accesses instead of slower secondary memory accesses. Nevertheless, large high-dimensional datasets may limit what can be maintained in the main memory simultaneously, even on high-end hardware systems. Binary sketches under the Hamming distance are compact representations of objects, they have a small memory footprint, and distances can be computed efficiently using instructions like SIMD-popcount, available in almost any modern processor.

This task asks for methods that use a dataset and a distance function to project it into binary sketches where the Hamming distance can measure its dissimilarity. The goal is to provide fast methods to achieve high-quality binary mappings under the mentioned terms.

N- Note 1: It is well known other approaches like product quantization, locality-sensitive hashing, and dimensional reduction methods also aim at reducing the memory footprint. However, they are beyond the scope of Task B.

  • Note 2: There is a complementary task below.

Task C: Indexing and searching on binary sketches

This task asks for solutions for fast indexing binary sketches under the Hamming distance. Participants can use their solutions on Task B to compute the projections or a set of binary sketches added with a variant of the brief permutation binary sketches.[1]

[1] Santoyo, F., Chávez, E., & Téllez, E. S. (2014). A compressed index for hamming distances. In Similarity Search and Applications: 7th International Conference, SISAP 2014, Los Cabos, Mexico, October 29-31, 2014. Proceedings 7 (pp. 113-126). Springer International Publishing.

General notes

We will impose hard wall times (e.g., 12hrs) for construction and searching for verification and reproduction (final ranking). We expect that people already specify hyperparameters in the solutions for each subset and task.

CC BY-SA 4.0 sisap challenge committee. Last modified: August 22, 2023. Website built with Franklin.jl and the Julia programming language.