FilBeam, IPFS and IPNI

Author: @Miroslav Bajtoš <miroslav@meridian.space>

Last Updated: @October 24, 2025

⚠️ Before reading this document, please familiarise yourself with the problem space by reading 📌 FilBeam for IPFS data on Filecoin.

The goal of this document is to identify what can be shipped in time for LabWeek in November 2025.

Status as of 2025-10-24

We are already indexing IPFS-related Data Set and Piece metadata for both calibration and mainnet deals. ✅

We have a prototype IPFS retrieval worker that accepts (walletAddress, ipfsRootCid, ipfsSubpath) and returns either a CAR file or rendered content for the requested resource:
https://github.com/filbeam/worker/pull/312

The data is fetched from an FOC SP that’s storing a Piece with metadata indicating the requested ipfsRootCid. The Piece must be part of a Data Set having metadata withCDN and withIPFSIndexing.

The worker is deployed to Cloudflare for the calibration testnet and available for testing & demos.

Caveats

Curio had a bug that caused SPs to return 404 for all IPFS subpath requests. The bug was fixed this week, but the fix has not yet been deployed to the SPs participating in FOC.

The IPFS retrieval worker implementation is not feature-complete and not production-ready. More work is required before we can land this new worker in our main branch and deploy it. Among other things, the worker is not charging for egress consumption.

Only retrievals using the root CID are supported. Kubo and Helia work differently: they request individual blocks inside the Merkle tree. As a result, the prototype does not support Kubo and Helia.

Open questions

Which retrieval clients do we want to support? (Kubo/Helia, web2 browser users, etc.)

How will these clients discover FilBeam retrievals? (IPNI at cid.contact, a custom configuration with FilBeam Delegated Routing endpoint, something else?)

Who are the potential customers?

Who actually wants/needs this?

Notes (Miroslav’s knowledge that may be outdated):

PinMe runs its own fleet of Kubo nodes to provide a retrieval layer that converts IPFS blocks into web2-style websites. They were not interested in replacing that layer with FilBeam when we discussed this option back in August/September.

Akave splits files into chunks using erasure coding, encrypts each chunk, and stores the chunks on different SPs. They run their own nodes that perform this transformation on the upload path and reassemble the data on the retrieval path. I asked Angelo about FilBeam at Berlin Colo in September; he wasn’t interested because he didn’t see what value FilBeam would bring.

Storacha - AFAIK, they want to use FOC FWSS as a backup solution. In our experience with using Storacha to store measurements produced by Spark, it performs rootCid-based retrievals, which are already covered by our prototype. More research is needed to understand better how FilBeam can be helpful to Storacha (if at all).

Block-by-block retrievals

This assumes we want to support Kubo and Helia nodes.

Before discussing the implementation details, we need to answer the following question: how do we expect these nodes to discover FilBeam retrievals?

Routing via IPNI at cid.contact

Many of the existing nodes are configured to use the delegated routing endpoint provided by cid.contact to discover retrieval providers for IPFS data. If we want to seamlessly upgrade these nodes to start using FilBeam to retrieve data stored in FOC FWSS, we need FilBeam to advertise our retrieval provider to IPNI for every IPFS block stored in every FOC FWSS Piece.

At the moment, there is no easy way for 3rd parties like FilBeam to obtain the list of all IPFS blocks stored in a given Piece.

As a result, we don’t have a way to allow FilBeam to advertise retrievals to IPNI for every block that Kubo/Helia may request.

Fortunately, some IPFS nodes (Helia?) don’t query the index for every block they encounter while traversing the Merkle tree. Instead, they use an optimistic strategy, assuming that if a node served a parent block, it will most likely serve the child blocks too.

This is a hypothesis. We will need to verify how well it works in practice.

In that world, we may be able to cover the most common retrieval scenarios by advertising FilBeam retrievals only for the root CIDs. (Root CID is part of the on-chain metadata for each Piece.)

Caveats

IPNI responses will contain entries for both FilBeam retrievals and direct retrievals from SPs. IPFS nodes don’t have a way to discriminate between these options. They are as likely to pick FilBeam as direct retrievals. As a result, only some retrievals will be accelerated by FilBeam and will incur egress charges.

It takes time for IPNI to ingest new announcements. There were also many outages this year. We will need to implement monitoring to detect when IPNI stops processing our new advertisements in a timely manner, and then work with the IPNI team to diagnose the problem. Based on what I have observed over the last 2-3 years, diagnosing why IPNI did not ingest our ads will be tedious.
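The monitoring check described above could be sketched as follows. This is a minimal sketch under assumptions: it presumes the indexer exposes the CID of the last advertisement it processed for our provider, and the field names (`LastAdvertisement`, `LastAdvertisementTime`), the function name and the 30-minute threshold used in the usage note are illustrative and need to be verified against the real IPNI response.

```typescript
// Assumed shape of the provider record reported by the indexer.
// Field names are assumptions; verify them against the real response.
interface IndexerProviderInfo {
  LastAdvertisement?: { '/': string }
  LastAdvertisementTime?: string
}

interface IngestionCheckInput {
  ourHeadAdCid: string // CID of the latest advertisement we published
  ourHeadPublishedAt: Date // when we published it
  indexerInfo: IndexerProviderInfo
  now: Date
  maxLagMs: number // hypothetical alerting threshold
}

type IngestionStatus = 'up-to-date' | 'pending' | 'stalled'

// Compare our chain head with what the indexer reports as last processed.
function checkIngestion(input: IngestionCheckInput): IngestionStatus {
  const seen = input.indexerInfo.LastAdvertisement?.['/']
  if (seen === input.ourHeadAdCid) return 'up-to-date'
  const lagMs = input.now.getTime() - input.ourHeadPublishedAt.getTime()
  // The indexer is behind: alert only once the lag exceeds the threshold.
  return lagMs > input.maxLagMs ? 'stalled' : 'pending'
}
```

A scheduled job could call `checkIngestion` with, say, `maxLagMs` set to 30 minutes and page us when the result is `'stalled'`.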

Routing via custom Delegated Routing endpoint config

It is possible to configure the IPFS node to use FilBeam as one of its content-discovery providers. In this setup, the node will send all queries to FilBeam. (← That’s Miroslav’s assumption based on his understanding; we need to verify whether it holds in practice.)

It may be possible to configure IPFS nodes to prioritise retrieval providers discovered via FilBeam delegated routing endpoint over other retrieval options - further research is required here.

In this scenario, FilBeam does not need to know all IPFS blocks in advance; we can learn about them as retrieval clients request them.
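To illustrate, here is a minimal sketch of how a FilBeam endpoint could answer provider lookups in the shape defined by the Delegated Routing V1 HTTP API (`GET /routing/v1/providers/{cid}`). The peer ID, the hostname, the function name and the policy of answering every lookup with FilBeam itself are assumptions made for illustration, not a description of an existing implementation.

```typescript
// Hypothetical values: FilBeam's real peer ID and address are not defined here.
const FILBEAM_PEER_ID = '12D3KooW-hypothetical-filbeam-peer-id'
const FILBEAM_ADDR = '/dns/ipfs.filbeam.example/tcp/443/https'

interface PeerRecord {
  Schema: 'peer'
  ID: string
  Addrs: string[]
  Protocols: string[]
}

interface RoutingResult {
  status: number
  body: { Providers: PeerRecord[] } | null
}

// Handle a provider lookup. `seenCids` records every CID that clients ask
// about, so we learn about blocks as they are requested.
function handleProvidersRequest(pathname: string, seenCids: Set<string>): RoutingResult {
  const match = /^\/routing\/v1\/providers\/([^/]+)$/.exec(pathname)
  if (!match) return { status: 404, body: null }
  seenCids.add(match[1])
  // Assumption: we optimistically claim to serve every CID and resolve the
  // actual FOC FWSS deal later, when the block request arrives.
  const record: PeerRecord = {
    Schema: 'peer',
    ID: FILBEAM_PEER_ID,
    Addrs: [FILBEAM_ADDR],
    Protocols: ['transport-ipfs-gateway-http']
  }
  return { status: 200, body: { Providers: [record] } }
}
```

Whether Kubo and Helia accept such a catch-all answer, and how they rank it against other routers, is part of the research mentioned above.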

Caveats

To make this work, we need IPFS nodes to change their configuration. That can become an obstacle preventing wider adoption of FilBeam retrievals.

Block discovery via IPNI

Let’s assume we solved the problem of routing & discovery by IPFS nodes, and IPFS nodes start sending single-block requests to FilBeam. How can we map a block CID to an FOC FWSS deal?

Ideally, the index of blocks will be attached to each Piece, so that FilBeam can obtain it as part of the indexing process for each new Piece. (A potential solution is described here: Implementation details.) Unfortunately, that’s not going to happen in time for the GA launch in November.

Here is a way we could use IPNI to discover FOC FWSS deals for a given IPFS block CID:

1. In the IPFS retrieval worker, we receive an IPFS block CID.

2. Query https://cid.contact/cid/{cid} to get the list of all retrieval providers.

3. For each retrieval provider, inspect the ContextID associated with the record. If the ContextID is in the format base64(0x01 || pieceCid) and the retrieval protocol is HTTP, then this is very likely an advertisement from Curio for an FOC FWSS deal. Use this process to collect the list of pieceCids.

4. Query FilBeam’s index to find all FOC FWSS deals (data set IDs) where the Data Set has withCDN and withIPFSIndexing metadata and the Piece has one of the collected pieceCids.

5. Pick any deal from that list and charge it for egress.

Now we have two ways to construct the SP URL to retrieve the IPFS block from:
1. We can use the address found in the IPNI response.
2. We can look up the FOC Provider record in our index and construct the retrieval URL from the provider address.
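The ContextID inspection step of the lookup above can be sketched as follows. This is a minimal sketch, assuming that the cid.contact response has the shape declared in the interfaces below (only the fields we use), that pieceCid is embedded as raw CID bytes, and that checking the provider multiaddrs for an `/http` or `/https` component is a good enough approximation of "the retrieval protocol is HTTP". All type and function names are hypothetical.

```typescript
// Assumed (partial) shape of the response from https://cid.contact/cid/{cid}.
interface ProviderResult {
  ContextID: string // base64-encoded bytes
  Provider: { ID: string; Addrs: string[] }
}
interface LookupResponse {
  MultihashResults?: { ProviderResults?: ProviderResult[] }[]
}

interface PieceCandidate {
  pieceCidHex: string // piece CID bytes, hex-encoded for easy comparison
  providerId: string
  addrs: string[]
}

function base64ToBytes(b64: string): Uint8Array {
  return Uint8Array.from(atob(b64), (ch) => ch.charCodeAt(0))
}

function toHex(bytes: Uint8Array): string {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, '0')).join('')
}

// Collect piece CIDs from records that look like Curio FOC FWSS ads.
function extractPieceCandidates(response: LookupResponse): PieceCandidate[] {
  const candidates: PieceCandidate[] = []
  for (const mh of response.MultihashResults ?? []) {
    for (const result of mh.ProviderResults ?? []) {
      // Approximation: treat the record as HTTP if any address says so.
      const isHttp = result.Provider.Addrs.some((a) => /\/https?(\/|$)/.test(a))
      if (!isHttp) continue
      let contextId: Uint8Array
      try {
        contextId = base64ToBytes(result.ContextID)
      } catch {
        continue // not valid base64, so not the format we look for
      }
      // Expected format: 0x01 || pieceCid
      if (contextId.length < 2 || contextId[0] !== 0x01) continue
      candidates.push({
        pieceCidHex: toHex(contextId.subarray(1)),
        providerId: result.Provider.ID,
        addrs: result.Provider.Addrs
      })
    }
  }
  return candidates
}
```

The returned `addrs` support the first way of constructing the SP URL; the `pieceCidHex` values feed the query against FilBeam’s index for the second.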

Caveats

There is a bug in the current Curio version used by FOC SPs where the ContextID contains the PieceCID in the CommPv1 format. All on-chain events and our index use the CommPv2 format, which includes the Piece size. As a result, Piece CIDs extracted from ContextIDs won’t match any PieceCIDs stored in our index.
See https://github.com/filecoin-project/curio/issues/735

It takes up to 10 minutes for IPNI to ingest new CIDs. In other words, when the user uploads a file to FOC, it may take minutes for the file to be retrievable via FilBeam.

There is no verifiable link between the IPFS block CID and the Piece CID. This creates an attack vector: an SP can advertise retrievals for FWSS content it stores, with ContextID values pointing to a different data set stored by a different SP, and FilBeam will then charge egress to the wrong FWSS deal. As a result, however, the advertising SP won’t receive retrieval requests and won’t earn the reward for cache-miss egress, so this attack does not make economic sense as long as cache-miss egress rewards exceed the real costs paid by the SP.

This makes IPNI a core component of FilBeam. Any outage of IPNI lookups will cause an outage of FilBeam retrievals. As we have experienced with Spark, such outages are not uncommon, and it can take hours to days until the IPNI team fixes them.

IPNI lookups will slow down FilBeam retrievals, especially for infrequently accessed data. We can mitigate this problem somewhat by caching IPNI lookups on our side, but that helps only for subsequent requests for the same CID, and we need to figure out a good cache invalidation strategy. On the flip side, cid.contact is running behind a Cloudflare proxy, so subsequent IPNI requests may already be fast enough.
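The caching idea above could be sketched as a small time-based cache. This is a minimal sketch under assumptions: expiry by TTL is only one naive invalidation strategy, the two TTL values are placeholders, and the class and function names are hypothetical. Empty results get a shorter TTL because newly uploaded files need time to be ingested by IPNI.

```typescript
// Placeholder TTLs (assumptions, not tuned values).
const HIT_TTL_MS = 10 * 60 * 1000 // keep successful lookups for 10 minutes
const MISS_TTL_MS = 30 * 1000 // retry empty lookups after 30 seconds

class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>()
  private now: () => number

  // The clock is injectable so that expiry can be tested deterministically.
  constructor(now: () => number = Date.now) {
    this.now = now
  }

  get(key: string): V | undefined {
    const entry = this.entries.get(key)
    if (!entry) return undefined
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key) // expired: drop the entry and report a miss
      return undefined
    }
    return entry.value
  }

  set(key: string, value: V, ttlMs: number): void {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs })
  }
}

// Wrap an IPNI lookup (block CID -> piece CIDs) with the cache.
async function lookupWithCache(
  cache: TtlCache<string[]>,
  cid: string,
  lookup: (cid: string) => Promise<string[]>
): Promise<string[]> {
  const cached = cache.get(cid)
  if (cached !== undefined) return cached
  const pieceCids = await lookup(cid)
  cache.set(cid, pieceCids, pieceCids.length > 0 ? HIT_TTL_MS : MISS_TTL_MS)
  return pieceCids
}
```

An in-memory map like this lives only as long as a single worker instance, so a shared cache would be needed to benefit from lookups made by other instances.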