Local Index Directory

This page describes the Local Index Directory component in Boost: what it is used for, how it works and how to start using it.

Local Index Directory is not yet released. This is a placeholder page for its documentation.

Background

The Local Index Directory (LID) manages and stores indices of deal data so that it can be retrieved by a content identifier (cid).

Currently this task is performed by the DAG store component. The DAG store keeps its indexes on disk on a single machine. LID replaces the DAG store and introduces a horizontally scalable backend database, YugabyteDB, for storing the data.

LID is designed to provide a more intuitive experience for the user, by surfacing problems and providing various repair tools.

To summarize, LID is the component that keeps fine-grained metadata about all the deals on Filecoin that a given Storage Provider stores. Without it, clients would only be able to retrieve full pieces, which are generally between 8GiB and 32GiB in size.

Storing data on Filecoin

When a client uploads deal data to Boost, LID records the sector that the deal data is stored in and scans the deal data to create an index of all its blocks, keyed by block cid. This way clients can later retrieve subsets of the original deal data without retrieving the full deal data.
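As a rough illustration of the indexing step, the sketch below scans a toy piece and records each block's offset and size, keyed by a hash of the block. The framing (a 4-byte big-endian length prefix per block) and the use of a SHA-256 hex digest in place of a real block cid are assumptions made for this example only; real deal data is not framed this way, and this is not LID's implementation.

```python
import hashlib
import struct


def make_piece(blocks):
    """Concatenate blocks, each preceded by a 4-byte big-endian length (toy framing)."""
    return b"".join(struct.pack(">I", len(b)) + b for b in blocks)


def build_index(piece):
    """Scan a toy piece and map block hash -> (offset, size) of the block payload."""
    index = {}
    pos = 0
    while pos < len(piece):
        (size,) = struct.unpack_from(">I", piece, pos)
        offset = pos + 4  # the payload starts right after the length prefix
        payload = piece[offset:offset + size]
        key = hashlib.sha256(payload).hexdigest()  # stand-in for a block cid
        index[key] = (offset, size)
        pos = offset + size
    return index


blocks = [b"hello", b"filecoin", b"boost"]
piece = make_piece(blocks)
index = build_index(piece)

# Retrieve a single block without reading the rest of the piece.
offset, size = index[hashlib.sha256(b"filecoin").hexdigest()]
print(piece[offset:offset + size])
```

Because the index stores an offset and a size per block, serving one block only requires reading that byte range from the piece.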

Retrieving data

When a client makes a request for data by cid, LID:
- checks which piece the cid is in, and where in the piece the data is
- checks which sector the piece is in, and where in the sector the piece is
- reads the data from the sector
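The three steps above can be sketched as a chain of lookups. Everything in this example is hypothetical: the dictionary names, the cid strings and the in-memory "sector" are stand-ins chosen for illustration, not LID's actual API or storage layout.

```python
# Hypothetical in-memory stand-ins for LID's lookups and for sector storage.
toy_piece = b"x" * 13 + b"filecoin"  # the block of interest starts 13 bytes into the piece

# block cid -> (piece cid, offset of the block in the piece, block size)
BLOCK_TO_PIECE = {"block-cid-1": ("piece-cid-A", 13, 8)}
# piece cid -> (sector ID, offset of the piece in the sector)
PIECE_TO_SECTOR = {"piece-cid-A": (42, 100)}
# sector ID -> unsealed sector bytes (100 bytes of other data, then the piece)
SECTORS = {42: bytes(100) + toy_piece}


def retrieve(cid):
    # 1. Which piece is the cid in, and where in the piece is the data?
    piece_cid, block_offset, block_size = BLOCK_TO_PIECE[cid]
    # 2. Which sector is the piece in, and where in the sector is the piece?
    sector_id, piece_offset = PIECE_TO_SECTOR[piece_cid]
    # 3. Read the data from the sector.
    start = piece_offset + block_offset
    return SECTORS[sector_id][start:start + block_size]


print(retrieve("block-cid-1"))
```

The block's position in the sector is the sum of the two offsets, which is why both lookups are needed before any data is read.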

Use cases

The retrieval use cases that the Local Index Directory supports are:

Graphsync retrieval

Request one root cid with a selector, receive many blocks

LID is able to:
- look up which piece contains the root cid
- look up which sector contains the piece
- for each block, get the offset into the piece for the block

Bitswap retrieval

Request one block at a time

LID is able to:
- look up which piece contains the block
- get the size of the block (Bitswap asks for the size before getting the block data)
- look up which sector contains the piece
- get the offset into the piece for the block

HTTP retrieval

Request a whole piece

LID is able to look up which sector contains the piece.

Request an individual block

LID is able to:
- look up which piece contains the block
- look up which sector contains the piece
- get the offset into the piece for the block

Request a file by root cid

LID is able to:
- look up which piece contains the block
- look up which sector contains the piece
- for each block, get the offset into the piece for the block

Architecture

Local Index Directory architecture and index types

When designing the Local Index Directory we considered the needs of various Storage Providers (SPs) and the operational overhead LID would have on their systems. We built a solution for:
- small SPs, holding up to 1PiB of data, and
- mid-size and large SPs, holding anywhere from 1PiB up to 100PiB of data

Index size varies with the underlying block size and data format: the smaller the blocks, the more index entries a piece needs. Typically block sizes are between 16KiB and 1MiB.
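To give a feel for how block size drives index size, the following back-of-the-envelope sketch counts the index entries needed for a 32GiB piece at both ends of the typical block size range. The 50 bytes per entry (a 34-byte SHA-256 multihash plus an 8-byte offset and an 8-byte size) is an assumption for illustration; it ignores database overhead and is not a measured figure.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB

PIECE_SIZE = 32 * GIB
ASSUMED_BYTES_PER_ENTRY = 50  # assumption: 34-byte multihash + 8-byte offset + 8-byte size


def index_estimate(block_size):
    """Return (number of entries, raw index bytes) for one piece of PIECE_SIZE."""
    entries = PIECE_SIZE // block_size
    return entries, entries * ASSUMED_BYTES_PER_ENTRY


for label, block_size in (("16KiB", 16 * KIB), ("1MiB", 1 * MIB)):
    entries, raw_bytes = index_estimate(block_size)
    print(f"{label} blocks: {entries:,} entries, ~{raw_bytes / MIB:.1f} MiB")
```

Under these assumptions the index for the same piece differs by a factor of 64 between the two block sizes, which is why disk requirements depend on the average block size of the stored deals.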

At the moment there are two implementations of LID:
- a simple LevelDB implementation, for small SPs who want to keep all information in a single-process database
- a scalable YugabyteDB implementation, for medium and large SPs with tens of thousands of deals

Index types

In order to support the described retrieval use cases, LID maintains the following indexes:

multihash → []piece cid

To look up which pieces contain a block

piece cid → sector information {sector ID, offset, size}

To look up which sector a piece is in

piece cid → map<multihash → block offset / size>

To look up where in the piece a block is and the block’s size
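The three indexes can be pictured as three maps. The sketch below models them with plain Python dictionaries; the class names, method names and sample values are hypothetical and only mirror the shapes listed above, not the schema used by the LevelDB or YugabyteDB implementations.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SectorInfo:
    sector_id: int
    offset: int  # where the piece starts within the sector
    size: int    # size of the piece


@dataclass(frozen=True)
class BlockLocation:
    offset: int  # where the block starts within the piece
    size: int    # size of the block


@dataclass
class ToyIndexes:
    multihash_to_pieces: dict = field(default_factory=dict)  # multihash -> [piece cid]
    piece_to_sector: dict = field(default_factory=dict)      # piece cid -> SectorInfo
    piece_to_blocks: dict = field(default_factory=dict)      # piece cid -> {multihash -> BlockLocation}

    def add_block(self, piece_cid, multihash, offset, size):
        self.multihash_to_pieces.setdefault(multihash, []).append(piece_cid)
        self.piece_to_blocks.setdefault(piece_cid, {})[multihash] = BlockLocation(offset, size)

    def block_size(self, multihash):
        """Size lookup, as needed by Bitswap before it fetches the block data."""
        piece_cid = self.multihash_to_pieces[multihash][0]
        return self.piece_to_blocks[piece_cid][multihash].size


idx = ToyIndexes()
idx.piece_to_sector["piece-A"] = SectorInfo(sector_id=7, offset=0, size=1024)
idx.piece_to_sector["piece-B"] = SectorInfo(sector_id=9, offset=2048, size=1024)
idx.add_block("piece-A", "mh-1", offset=0, size=256)
idx.add_block("piece-B", "mh-1", offset=512, size=256)  # the same block stored in two pieces

print(idx.multihash_to_pieces["mh-1"])
print(idx.block_size("mh-1"))
```

The first index maps to a list because the same block can appear in more than one piece; any of the listed pieces can then be used to serve it.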

Requirements

Local Index Directory requirements and dependencies

Dependencies

Local Index Directory depends on a backend database to store various indices. Currently we support two implementations - YugabyteDB or LevelDB - depending on the size of deal data and indices a storage provider holds.

LevelDB is an open source on-disk key-value store, and can be used when indices fit on a single host.

YugabyteDB is an open source modern distributed database designed to run in any public, private, hybrid or multi-cloud environment.

Storage providers who hold more than 1PiB data are encouraged to use YugabyteDB as it is horizontally scalable, provides better monitoring and management utilities and could support future growth.

Hardware requirements

For detailed instructions, playbooks and hardware recommendations, see the YugabyteDB website - https://docs.yugabyte.com

YugabyteDB is designed to run on bare-metal machines, virtual machines (VMs), and containers.

CPU and RAM

You should allocate adequate CPU and RAM. YugabyteDB has adequate defaults for running on a wide range of machines, and has been tested from 2 core to 64 core machines, and up to 200GB RAM.

Minimum requirement

2 cores
2GB RAM

Production requirement

16+ cores
32GB+ RAM
Add more CPU (compared to adding more RAM) to improve performance.

Verify support for SSE2 and SSE4.2

YugabyteDB requires the SSE2 instruction set support, which was introduced into Intel chips with the Pentium 4 in 2001 and AMD processors in 2003. Most systems produced in the last several years are equipped with SSE2.

In addition, YugabyteDB requires SSE4.2.

To verify that your system supports SSE2, run the following command:

cat /proc/cpuinfo | grep sse2

To verify that your system supports SSE4.2, run the following command:

cat /proc/cpuinfo | grep sse4.2
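Both checks can also be done in one pass. The helper below is a small sketch, not part of Boost or YugabyteDB: it parses the text of /proc/cpuinfo and reports which required flags are missing. It assumes an x86 Linux host, where the flags line lists `sse2` and `sse4_2` (the kernel spells SSE4.2 with an underscore). The sample text is made up so that the example runs anywhere.

```python
def missing_cpu_flags(cpuinfo_text, required=("sse2", "sse4_2")):
    """Return the required CPU flags that are absent from /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            present = set(line.split(":", 1)[1].split())
            return [flag for flag in required if flag not in present]
    # No flags line found (for example on a non-x86 host): report everything as missing.
    return list(required)


# Made-up sample in the format of /proc/cpuinfo.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1 sse4_2\n"
print(missing_cpu_flags(sample))
```

On a real host, pass the file contents instead of the sample, for example `missing_cpu_flags(open("/proc/cpuinfo").read())`; an empty list means both instruction sets are reported by the CPU.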

Disks

SSDs (solid state disks) are required.

We recommend a minimum of 1TiB or more allocated for YugabyteDB, depending on the amount of deal data you store and its average block size.

Assuming you've kept unsealed copies of all your data and have consistently indexed deal data, the size of your DAG store directory should be comparable to the disk space required for YugabyteDB.

Initialisation

This page explains how to initialise LID and start using it to provide retrievals to clients.

Considering that the Local Index Directory is a new feature, Storage Providers should initialise it after upgrading their Boost deployments.

There are two ways a Storage Provider can do that:

Migrate existing indices from the DAG store into LID: this solution assumes that the Storage Provider has been keeping an unsealed copy for every sector they prove on-chain, and has already indexed all their deal data into the DAG store. Typically the index size for a given sector ranges from 100KiB to 1GiB, depending on the deal data and its block sizes. The DAG store keeps these indices in the repository directory of Boost under the ./dagstore/index and ./dagstore/datastore directories. This data should be migrated to LID with the migrate-lid utility.
Recreate indices for deal data based on unsealed copies of sectors: this solution assumes that the Storage Provider has unsealed copies for every sector they prove on-chain. If this is not the case, the SP should first trigger an unseal (UNS) job on their system for every sector that contains user data, to produce an unsealed copy. SPs can then use the boostd recover lid utility to produce an index for all deal data within an unsealed sector and store it in LID, which enables retrievals for that data. Depending on the SP's deployment, where unsealed copies are hosted (NFS, Ceph, external disks, etc.) and the performance of the hosting system, producing an index for a 32GiB sector can take anywhere from a few seconds to a few minutes, as the utility needs to process the whole unsealed copy.

Migrate existing indices from the DAG store into LID

TODO

Recreate indices for deal data based on unsealed copies of sectors

TODO
