About file formats

You will encounter several kinds of files. For instance, LAION parts come in two formats: .npz files for embeddings[1] and .parquet files for metadata[2]. We use HDF5 files[3] (.h5) extensively for datasets, projections, queries, and gold standards, and we also require result files to be HDF5 files. An HDF5 file organizes its contents as a tree, can hold several kinds of data in the same file, and is portable across platforms.

[1] NumPy files can be loaded with numpy.load in Python and with the NPZ.jl package in Julia.
[2] The pyarrow package provides support for parquet files in Python. Julia users can use the Parquet2.jl package.
[3] HDF5 is a high-performance data management and storage suite: https://www.hdfgroup.org/solutions/hdf5/. In Python, these files can be loaded and created with h5py. Julia users can use HDF5.jl or JLD2.jl.
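
As a rough illustration of how an HDF5 file can be created and read back with h5py, the following sketch writes two arrays into a group and then loads them again. The group name `run1`, the dataset names `knns` and `dists`, the attribute `algo`, and the function names are hypothetical choices made for this example; they are not a specification of the layout the challenge expects.

```python
import h5py
import numpy as np


def write_example_file(path, knns, dists, algo="example"):
    """Write two arrays into a group of an HDF5 file (hypothetical layout)."""
    with h5py.File(path, "w") as f:
        group = f.create_group("run1")           # groups give the tree-like structure
        group.create_dataset("knns", data=knns)  # integer identifiers
        group.create_dataset("dists", data=dists)  # floating-point distances
        group.attrs["algo"] = algo               # small metadata stored as an attribute


def read_example_file(path):
    """Read back the arrays and the attribute written by write_example_file."""
    with h5py.File(path, "r") as f:
        group = f["run1"]
        knns = group["knns"][:]    # [:] copies the dataset into a NumPy array
        dists = group["dists"][:]
        algo = group.attrs["algo"]
    return knns, dists, algo


def list_datasets(path):
    """Return the sorted paths of all datasets stored in the file."""
    names = []

    def visit(name, obj):
        # visititems calls this for every group and dataset in the tree
        if isinstance(obj, h5py.Dataset):
            names.append(name)

    with h5py.File(path, "r") as f:
        f.visititems(visit)
    return sorted(names)
```

Calling `list_datasets` on an unfamiliar .h5 file is a quick way to see what it contains before deciding which datasets to load.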

CC BY-SA 4.0 sisap challenge committee. Last modified: August 22, 2023. Website built with Franklin.jl and the Julia programming language.