# How to upgrade from Boost v1 to Boost v2

{% hint style="warning" %}
Make sure you have read the [**Components**](https://boost.filecoin.io/deployment) page before proceeding. Boost v2 introduces a new service called **boostd-data**, which requires a database to be installed: either YugabyteDB or LevelDB.
{% endhint %}

## Introduction

Boost v2 introduces the Local Index Directory as a replacement for the DAG store. It scales horizontally and provides a more intuitive experience for users, by surfacing problems in the UI and providing repair functionality.

<figure><img src="https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2FAwagrPbzXPS9sskaJwwi%2FScreenshot%202023-06-14%20at%2013.13.54.png?alt=media&#x26;token=225d016b-340a-48c7-a5de-4e7b70aed0e2" alt=""><figcaption></figcaption></figure>

### Architecture

When Boost receives a storage deal, it creates an index of all the block locations in the deal data and stores that index in the Local Index Directory (LID).

When `boostd`, `booster-http` or `booster-bitswap` receives a request for a block, it:

* gets the block sector and offset from the LID index
* requests the data at that sector and offset from the miner

![LID Flow](https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2Fgit-blob-44a62fd2c7451b2cd8c34053132e798d0851d4a9%2FLID-flow.png?alt=media)

A large miner with many incoming retrieval requests needs many boostd / booster-http / booster-bitswap processes to serve those requests. These processes need to look up block locations in a centralized index.

We tested several databases and found that YugabyteDB is best suited to the indexing workload because:

* it performs well on off-the-shelf hardware
* it's easy to scale up by adding more machines
* it has great documentation
* once set up, it can be managed through a web UI

### Connecting multiple boost instances to a single LID

It is possible to connect multiple `boostd` instances to a single LID instance. In this scenario, each `boostd` instance still stores data to a single miner: e.g. `boostd` A stores data to miner A, and `boostd` B stores data to miner B. However, each `boostd` instance saves retrieval indexes in a single, shared LID instance.

![LID Flow](https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2Fgit-blob-ae37dd9a92cbe1afbfd5219eee59465b0c30f8f6%2FLID-multi-store.png?alt=media)

For retrieval, each boostd instance can query the shared LID instance (to find out which miner has the data) and retrieve data from any miner in the cluster.

![LID Flow](https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2Fgit-blob-835e904abe1ac1e3c96c19d29ce530becb69a59a%2FLID-multi-fetch.png?alt=media)

booster-bitswap and booster-http can also be configured to query the shared LID instance, and retrieve data from any miner in the cluster.

![LID Flow](https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2Fgit-blob-5c6ae86bc540cbc8b19d7dd9e9fb8087096d60e4%2FLID-multi.png?alt=media)

{% hint style="danger" %}
If you are deploying multiple `boostd` instances with a single LID instance, you will need to set up the networking so that each `boostd`, `booster-bitswap` and `booster-http` instance can query all miners and workers in the cluster. We recommend assigning all of your miner instances and `boostd` instances to the same subnet. Note also that the YugabyteDB instance will need enough space to store the retrieval indexes for all of the miners.
{% endhint %}

## Prerequisites

### Install YugabyteDB

The Local Index Directory stores retrieval indices in a YugabyteDB database. Retrieval indices store the size and location of each block in the deal data.

We recommend running YugabyteDB on a dedicated machine with SSD drives. Depending on how many blocks there are in the user data, the retrieval indices may require up to 2% of the size of the unsealed data. e.g. 1 TiB of unsealed user data may require a 20 GiB index.
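As a back-of-the-envelope check of the 2% figure above, you can estimate the index size from the amount of unsealed data you store (a sketch; the 1024 GiB figure is just an example to substitute with your own number):

```shell
# Estimate the LID index size as ~2% of the unsealed data size.
unsealed_gib=1024                          # 1 TiB of unsealed user data, in GiB
index_gib=$(( unsealed_gib * 2 / 100 ))    # worst-case index size per the 2% rule
echo "estimated index size: ${index_gib} GiB"
```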

YugabyteDB should require about the same amount of space as your DAG store requires today.

You can find more information about YugabyteDB in the `Components` section:

{% content-ref url="../deployment/yugabytedb" %}
[yugabytedb](https://boost.filecoin.io/deployment/yugabytedb)
{% endcontent-ref %}
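Before wiring it into Boost, you can bring up a throwaway single-node YugabyteDB instance to experiment with (a sketch for testing only; the binary location and advertise address are assumptions, and production clusters should follow the YugabyteDB deployment docs linked above):

```
./bin/yugabyted start --advertise_address=127.0.0.1
```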

## Instructions

Follow these instructions in order to migrate your existing DAG store into the new Local Index Directory and upgrade from Boost v1 to Boost v2:

**1. Clone the Boost repository to a temporary directory**

Note: don’t overwrite your existing Boost instance at this stage.

```
cd /tmp
git clone https://github.com/filecoin-project/boost.git boostv2
cd boostv2
```

**2. Check out the Boost v2 release**

```
git checkout v2.x.x
```

**3. Build from source**

```
make
```

**4. Migrate dagstore indices**

Depending on the amount of data your SP is storing, this step could take anywhere from a few minutes to a few hours. You can run it even while Boost v1 continues to run. The command can be stopped and restarted. It will continue from where it left off.

Run the migration with parameters to connect to YugabyteDB on its Cassandra and PostgreSQL interfaces:

```
./migrate-lid yugabyte \
  --hosts <yugabytedb-hosts> \
  --connect-string="postgresql://<username>:<password>@<yugabytedb>:5433" \
  dagstore
```

{% hint style="info" %}
The PGX driver from Yugabyte supports cluster-aware Postgres connections out of the box. If you are deploying a multi-node YugabyteDB cluster, update your connect string to use a cluster-aware connection.

With cluster mode: `postgresql://postgres:postgres@127.0.0.1:5433?load_balance=true`

With cluster mode + no SSL: `postgresql://postgres:postgres@127.0.0.1:5433?sslmode=disable&load_balance=true`
{% endhint %}

The command outputs a progress bar, and writes detailed migration information to a log file at `migrate-yugabyte.log`.

{% hint style="info" %}
If you are deploying a single LID instance with multiple `boostd` instances, you will need to repeat this step for each `boostd` instance in the cluster.
{% endhint %}

**5. Run the `boostd-data` service**

`boostd-data` is a data proxy service that mediates access to LID through an established interface. It makes it easier to secure the underlying database without exposing it directly. `boostd-data` listens on a websocket interface, which is the entrypoint that should be exposed to `boostd`, `booster-http` and `booster-bitswap`.

Start the `boostd-data` service with parameters to connect to YugabyteDB on its Cassandra and PostgreSQL interfaces:

```
./boostd-data run yugabyte \
  --hosts <yugabytedb-hosts> \
  --connect-string="postgresql://<username>:<password>@<yugabytedb>:5433" \
  --addr 0.0.0.0:8044
```

{% hint style="info" %}
The PGX driver from Yugabyte supports cluster-aware Postgres connections out of the box. If you are deploying a multi-node YugabyteDB cluster, update your connect string to use a cluster-aware connection.

With cluster mode: `postgresql://postgres:postgres@127.0.0.1:5433?load_balance=true`

With cluster mode + no SSL: `postgresql://postgres:postgres@127.0.0.1:5433?sslmode=disable&load_balance=true`
{% endhint %}

{% hint style="info" %}
`--hosts` takes the IP addresses of the YugabyteDB YB-TServers, separated by `,`. Example:

`--hosts 10.0.0.1,10.0.0.2,10.0.0.3`

`--addr` is the `<IP>:<PORT>` that the `boostd-data` service should listen on. The IP here can be a private one (recommended), but it must be reachable by all Boost-related processes. Make sure to update your firewall configuration accordingly.
{% endhint %}

{% hint style="warning" %}
If you are deploying a single LID instance with multiple `boostd` instances, you should run a single `boostd-data` process on one of the hosts where YugabyteDB is installed. All `boostd`, `booster-bitswap` and `booster-http` instances should be able to reach this single `boostd-data` process.
{% endhint %}
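If you manage Boost services with systemd, a unit file along these lines keeps `boostd-data` running across restarts (a sketch: the binary path, user, hosts and connect string are assumptions to adapt to your deployment):

```
[Unit]
Description=boostd-data (Boost Local Index Directory service)
After=network-online.target

[Service]
ExecStart=/usr/local/bin/boostd-data run yugabyte \
  --hosts 10.0.0.1,10.0.0.2,10.0.0.3 \
  --connect-string="postgresql://<username>:<password>@10.0.0.1:5433" \
  --addr 0.0.0.0:8044
Restart=on-failure
User=boost

[Install]
WantedBy=multi-user.target
```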

**6. Update `boostd` repository config**

Configure `boostd` repository config (located at `<boostd repo>/config.toml`) to point to the exposed `boostd-data` service endpoint. Note that the connection must be configured to go over a websocket.

For example:

```
[LocalIndexDirectory]
  ServiceApiInfo = "ws://<boostd-data>:8044"
```

**6.1 Add miners to `boostd` repository config**

If you are deploying a single LID instance with multiple `boostd` instances, you will also need to add the RPC endpoint of each miner in the cluster to the config. This allows `boostd` to serve data for each miner over Graphsync.

```
[DealMaking]
  GraphsyncStorageAccessApiInfo = [
    # Make sure to include the miner that this boostd instance
    # stores data to, as well as the other miners.
    # Use `lotus-miner auth api-info` to get the RPC API connect string.
    "<auth token>:/ip4/<ip>/tcp/2345/http",
    "<auth token>:/ip4/<ip>/tcp/2345/http"
  ]
```

Make sure to test that this boostd instance can reach each miner by running:

```
$ MINER_API_INFO=<auth token>:/ip4/<ip>/tcp/2345/http lotus-miner info
```
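With several miners configured, it can be handy to loop over the endpoints rather than testing each one by hand (a hypothetical helper; the endpoint addresses below are placeholders to replace with your miners' connect strings):

```shell
# Check that each configured miner RPC endpoint responds to `lotus-miner info`.
# Replace the placeholder endpoints with your own connect strings.
endpoints=(
  "<auth token>:/ip4/10.0.0.1/tcp/2345/http"
  "<auth token>:/ip4/10.0.0.2/tcp/2345/http"
)
checked=0
for ep in "${endpoints[@]}"; do
  echo "checking ${ep}"
  MINER_API_INFO="${ep}" lotus-miner info || echo "unreachable: ${ep}"
  checked=$((checked + 1))
done
echo "checked ${checked} endpoints"
```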

**7. Install Boost v2**

```
make install
```

Note that in v2 `booster-http` and `booster-bitswap` take slightly different parameters (see below).

**8. Stop `boostd`, `booster-http` and `booster-bitswap`**

You need to stop `boostd` before migrating `piece info` data.

**9. Migrate piece info data (information about which sector each deal is stored in)**

This should take no more than a few minutes.

```
./migrate-lid yugabyte \
  --hosts <yugabytedb-hosts> \
  --connect-string="postgresql://<username>:<password>@<yugabytedb>:5433" \
  pieceinfo
```

{% hint style="info" %}
If you are deploying a single LID instance with multiple `boostd` instances, you will need to repeat this step for each `boostd` instance in the cluster.
{% endhint %}

**10. Start the upgraded versions of `boostd`, `booster-http` and `booster-bitswap`**

Note that `booster-http` and `booster-bitswap` take slightly different parameters:

* `--api-boost` is removed
* There is a new parameter `--api-lid` that points to the `boostd-data` service (which hosts LID), e.g. `--api-lid="ws://<boostd-data>:8044"`
* If you are deploying a single LID instance with multiple `booster-bitswap` and `booster-http` instances, you should supply a `--api-storage` flag for each one
  * eg `--api-storage=MINER_API_INFO_1 --api-storage=MINER_API_INFO_2`
  * Make sure to test that this booster-http / booster-bitswap instance can reach each miner by running

    ```
    $ MINER_API_INFO=MINER_API_INFO_1 lotus-miner info
    ```
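Putting these changes together, an upgraded `booster-http` invocation might look like the following sketch (the LID address and `MINER_API_INFO_*` values are placeholders; keep the remaining flags from your existing v1 invocation, minus `--api-boost`, and `booster-bitswap` takes the same `--api-lid` and `--api-storage` flags):

```
booster-http run \
  --api-lid="ws://<boostd-data>:8044" \
  --api-storage=MINER_API_INFO_1 \
  --api-storage=MINER_API_INFO_2
```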

**11. Clean up the dagstore directory from `boostd` repo and the temporary boost github repo**

{% hint style="danger" %}
Be careful when running the commands below to ensure that you do not remove the wrong directory.

`rm -rf <boostd repo>/dagstore`

`rm -rf /tmp/boostv2`
{% endhint %}

## Verify the setup

**1. Test how long it takes to reindex a piece**

```
time boostd lid gen-index <piece CID>
```

**2. Perform a retrieval using Graphsync, Bitswap and HTTP**

{% code overflow="wrap" %}

```
# boost retrieve --provider=<miner id> -o output.dat <cid>

# booster-bitswap fetch /ip4/127.0.0.1/tcp/8888/p2p/{peerID} {rootCID} outfile.car

# curl -H "Accept:application/vnd.ipld.car;" http://{SP's http retrieval URL}/ipfs/bafySomePayloadCID -o bafySomePayloadCID.car
```

{% endcode %}

## Conclusion

At this stage you should have the latest version of Boost running with the Local Index Directory. Go to the **Local Index Directory** page and review the following sections:

#### Pieces

<figure><img src="https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2F3NgBMaEMAkyGYbLRLQpd%2FScreenshot%202023-06-19%20at%2013.43.34.png?alt=media&#x26;token=1985d12c-8b93-4164-9acf-e80b33148fa9" alt=""><figcaption></figcaption></figure>

The *Pieces* section shows counters for the total pieces of user data that your SP is storing, as well as whether you are keeping *unsealed* and *indexed* copies of them.

#### Flagged pieces

<figure><img src="https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2F3o0hRSr6XEXmnrkpZ5rm%2FScreenshot%202023-06-19%20at%2016.31.18.png?alt=media&#x26;token=6bd101e1-dd11-480a-8f22-08d185231e48" alt=""><figcaption></figcaption></figure>

*Flagged pieces* are pieces that either lack an unsealed copy or are missing an index. For sealed-only user data, make sure you unseal the individual sectors if you want that data to be retrievable.

*Sealed-only* copies of data are not retrievable; they are only proven on-chain within the corresponding *deadline* / *window*. Sealed-only data is typically considered archival, as it is not immediately retrievable. If a client requests it, the SP sealing pipeline must first unseal it, which typically takes 1-2 hours, and only then does the data become available.

*Flagged (unsealed)* pieces are pieces of user data that your SP is hosting but has not indexed.

{% hint style="info" %}
We recommend that you trigger re-indexing for these pieces, so that data becomes retrievable. Check the [tutorial on re-indexing flagged unsealed pieces](https://boost.filecoin.io/tutorials/how-to-re-index-unsealed-pieces-that-are-flagged-by-lid-in-boost-v2) for more information.
{% endhint %}

#### Deal Sector Copies

<figure><img src="https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2FwHBAINXXZm5VFWvtL3Ge%2FScreenshot%202023-06-19%20at%2016.34.41.png?alt=media&#x26;token=c03d3e55-1461-477d-87f4-cf5ac3a23f60" alt=""><figcaption></figcaption></figure>

The *Deal Sector Copies* section displays counters of your sectors' state: whether or not you keep unsealed copies for all sectors. Ideally, the SP should keep unsealed copies of all data that should be immediately retrievable.

#### Sector Proving State

<figure><img src="https://437316145-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FfZMqMjrWeFXMBES7ehiI%2Fuploads%2FjiB1nOKeKaaLyBt2usHK%2FScreenshot%202023-06-19%20at%2016.34.46.png?alt=media&#x26;token=495b803e-8ff5-436e-b6f1-b670aa4b95b9" alt=""><figcaption></figcaption></figure>

The *Sector Proving State* section displays counters of your active and inactive sectors. Active sectors are those that are actively proven on-chain; inactive sectors are those for which you might have failed to publish a *WindowPoSt*, or that have expired or been removed.
