Payer-authorized retrievals via FilCDN

Introduction

In the current design, FilCDN retrievals use a URL in the following format:

https://{WalletAddress}.filcdn.io/{PieceCID}

The first component, WalletAddress, determines who will be charged for the costs of serving this retrieval request.

The second component, PieceCID, identifies the content to retrieve using content-addressing.
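As an illustration of how the two components can be extracted on the server side, here is a minimal sketch. The function name and the example values are hypothetical; only the URL shape comes from the format above. Note that hostnames are case-insensitive, so the wallet address comes back lowercased.

```python
from urllib.parse import urlparse

FILCDN_DOMAIN = "filcdn.io"  # taken from the URL format above


def parse_retrieval_url(url: str) -> tuple[str, str]:
    """Split a FilCDN retrieval URL into (wallet_address, piece_cid)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""  # urlparse lowercases the hostname
    suffix = "." + FILCDN_DOMAIN
    if not host.endswith(suffix):
        raise ValueError(f"not a FilCDN host: {host!r}")
    wallet_address = host[: -len(suffix)]
    piece_cid = parsed.path.lstrip("/")
    if not wallet_address or not piece_cid or "/" in piece_cid:
        raise ValueError("expected https://{WalletAddress}.filcdn.io/{PieceCID}")
    return wallet_address, piece_cid


# Placeholder values, not a real wallet or piece:
wallet, piece = parse_retrieval_url("https://0xabc.filcdn.io/bagaexample")
```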

In the M1 & M2 design, FilCDN provides retrievals for pieces stored by the client WalletAddress in WarmStorage deals. In other words, the CDN retrieval service is bundled with the storage service. Third-party clients can’t pay for CDN-accelerated retrievals of content stored by somebody else.

Additionally, retrieval requests do not require authorisation: anyone who knows a valid pair (WalletAddress, PieceCID) can retrieve the content, and the client paying for the storage & CDN is charged for the egress. This has not been an issue so far because FilCDN currently provides unlimited egress.

For M3 and beyond, we want to implement two significant changes:

  1. Change the pricing model so that clients pay for egress, e.g., 10 USDFC per 1 TB.
  2. Allow anyone to set up paid CDN retrievals for content stored in any PDP deal.
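To make the stakes of egress-based pricing concrete, here is a small worked calculation using the example rate of 10 USDFC per 1 TB. The rate is only the example from the list above, and treating 1 TB as 10^12 bytes is my assumption; the function name is hypothetical.

```python
from decimal import Decimal

PRICE_PER_TB_USDFC = Decimal("10")  # example rate from the text, not a final price
BYTES_PER_TB = Decimal(10) ** 12    # assumption: decimal terabyte (10^12 bytes)


def egress_cost_usdfc(bytes_served: int) -> Decimal:
    """Cost charged to the payer's wallet for the given amount of egress."""
    return PRICE_PER_TB_USDFC * Decimal(bytes_served) / BYTES_PER_TB


# A 1 GB piece downloaded 250 times is 250 GB of egress.
cost = egress_cost_usdfc(250 * 10**9)
```

With these assumptions, 250 GB of egress costs 2.5 USDFC, regardless of who sent the requests.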

This new design opens multiple attack vectors:

  1. Because the metadata of all WarmStorage deals is publicly available in the blockchain state, it’s easy for an attacker to obtain a valid pair (WalletAddress, PieceCID) and send retrieval requests that cause egress costs to be charged to the victim’s wallet.
  2. If FilCDN allows retrievals of content in any PDP deal, the attacker can combine the wallet address of a victim with the Piece CID of content they are interested in, and download such content “for free”, with the egress costs charged to the victim’s wallet.

Example scenarios

First scenario to illustrate the problem:

  • A newspaper is using Filecoin Cloud Services and FilCDN to serve its website to readers.
  • In an article about popular music, the newspaper wants to embed a music video stored by an artist on Filecoin Cloud Services.
  • Because the artist is not paying for CDN, and the newspaper wants their media content to be immediately available even with embedded videos, they want to pay FilCDN to provide fast retrievals for the video they are embedding.
  • However, the newspaper does not want to pay for fast access to every video stored on Filecoin Cloud Services, only for the video that’s embedded in their article.
  • We need an authorisation mechanism that lets the wallet owner decide which PieceCIDs are allowed to be retrieved. In other words, the newspaper wants to fund retrievability for specific pieces and none other.
  • This assumes public access, i.e., anyone can download the content via FilCDN as long as the wallet paying for the FilCDN allows downloads of that content.

Second scenario:

  • The newspaper requires a subscription to read the articles.
    (Perhaps the fast access via FilCDN is available to subscribers only, and the general public can read the news using the slower and less reliable direct retrieval from Storage Providers.)
  • FilCDN must require authorization from the retrieval client before it charges the cost of retrieval egress to the wallet paying for the FilCDN deal.

I am arguing that we can keep ignoring the second scenario for now. The rest of this document focuses on addressing the first scenario.

Potential solutions

Virtual PDP DataSets

Implement virtual PDP DataSets, allowing the CDN client to define an allow-list of Piece CIDs that are authorised for retrievals.

Pros:

  • The allow-list is managed on the chain.
  • We can reuse the existing authorisation mechanism implemented in FilCDN.
  • It’s easy to revoke authorisation - just remove a piece from the virtual dataset.
  • The retrieval client does not need to provide any credentials. E.g., you can put the FilCDN URL into an <img> tag in your static website.

Cons:

  • The CDN payer has to sign a transaction whenever they want to add or remove Piece CIDs authorised for CDN retrievals.
  • It’s not clear how to implement such virtual PDP DataSets. Would we need to modify the singleton PDP smart contract?
  • The design could be confusing for developers not familiar with PDP & FilCDN internals.

On-chain allow-list

Implement a new allow-list mechanism, where the FilCDN client has to explicitly list all PieceCIDs that are authorised for retrievals. This allow-list can be maintained in the on-chain state of a new smart contract via a smart contract API, or in an off-chain FilCDN database via a REST API.

Pros:

  • The design is easy to understand.
  • It’s easy to revoke authorisation - just remove an entry from the on-chain allow-list.
  • The retrieval client does not need to provide any credentials. E.g., you can put the FilCDN URL into an <img> tag in your static website.

Cons:

  • Potentially worse UX for existing users, as they have to interact with two smart contracts now, and sign two transactions - one to upload a new piece to PDP, another to authorise CDN retrievals of that piece.
  • The CDN payer has to sign a transaction whenever they want to add or remove Piece CIDs authorised for CDN retrievals.
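A minimal in-memory sketch of the allow-list semantics, to make the proposal concrete. This is not a contract design: the class and method names are hypothetical, and a real implementation would have to verify that each add/remove call is signed by the payer.

```python
from collections import defaultdict


class AllowList:
    """In-memory stand-in for the allow-list state (hypothetical names)."""

    def __init__(self) -> None:
        # payer wallet address -> set of Piece CIDs authorised for CDN retrieval
        self._pieces: dict[str, set[str]] = defaultdict(set)

    def add(self, payer: str, piece_cid: str) -> None:
        # On chain, this would be a transaction signed by `payer`.
        self._pieces[payer].add(piece_cid)

    def remove(self, payer: str, piece_cid: str) -> None:
        # Revocation: removing the entry is enough, no credentials to expire.
        self._pieces[payer].discard(piece_cid)

    def is_authorized(self, payer: str, piece_cid: str) -> bool:
        return piece_cid in self._pieces.get(payer, set())


# The newspaper funds retrievals of one embedded video and nothing else.
allow_list = AllowList()
allow_list.add("0xnewspaper", "piece-video")
```

In the first scenario, a request for any other piece under the newspaper's wallet would be rejected before any egress is charged.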

Hybrid Alternative

Implement a combination of the on-chain allow-list described above with a fallback to PDP DataSets, i.e. a PieceCID is authorised for CDN retrieval if either of the following two conditions is met:

  1. The wallet has an active PDP storage deal for that Piece CID.
  2. The Piece CID is in the on-chain allow-list.

Pros:

  • All the pros of the on-chain allow-list above.
  • The UX for existing users remains unchanged. If you store data on PDP and pay for CDN, CDN retrievals of your Pieces remain authorised.

Cons:

  • The CDN payer has to sign a transaction whenever they want to add or remove Piece CIDs authorised for CDN retrievals.
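The hybrid rule can be sketched as a single predicate over two lookups. Both lookups are modelled as plain dictionaries here; how FilCDN would actually query the PDP state and the allow-list is left open, and all names are hypothetical.

```python
def is_retrieval_authorized(
    payer: str,
    piece_cid: str,
    active_pdp_pieces: dict[str, set[str]],
    allow_list: dict[str, set[str]],
) -> bool:
    """Authorise if the payer stores the piece OR has allow-listed it."""
    stores_piece = piece_cid in active_pdp_pieces.get(payer, set())
    allow_listed = piece_cid in allow_list.get(payer, set())
    return stores_piece or allow_listed


# Existing users: pieces in their own PDP deals keep working unchanged.
active_pdp_pieces = {"0xnewspaper": {"piece-homepage"}}
# New capability: the newspaper also funds a piece stored by the artist.
allow_list = {"0xnewspaper": {"piece-video"}}
```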

Access-token-based authorization

Require the retrieval client to obtain an access token signed by the wallet paying for CDN retrievals and supply the token in the retrieval request.

The access token can contain structured data, e.g. a list of Piece CIDs authorized for retrievals.

Related work:

  • UCAN, used by Storacha
  • JWT - JSON Web Token, used in Web2. Unfortunately, it’s incompatible with Ethereum signatures.

Pros:

  • The design is easy to understand.
  • No blockchain interaction is needed to modify authorization rules.
  • The offline design makes it easy to cache the authorization result.

Cons:

  • Retrievals involve more complexity on the client side, as the client needs to obtain the access token and include it in the retrieval request.
  • Authorization is difficult to revoke. UCAN provides a spec for revoking tokens, but it’s not trivial. Sign-In with Ethereum does not support revocation out of the box, although it’s possible to implement an on-chain registry of revoked tokens.
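To illustrate the shape of such a token and the checks FilCDN would run, here is a rough sketch. Important assumption: the signature is an HMAC used purely as a stand-in so the example runs with the standard library. It is NOT a wallet signature and is neither UCAN nor JWT. The claim names, the expiry field, and the revocation set are all hypothetical.

```python
import hashlib
import hmac
import json


def sign(payload: bytes, key: bytes) -> str:
    # Stand-in for a wallet signature (assumption); a real design would
    # verify a signature made by the payer's wallet key instead.
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def encode_claims(claims: dict) -> bytes:
    # Canonical encoding so signer and verifier hash the same bytes.
    return json.dumps(claims, sort_keys=True).encode()


def issue_token(key: bytes, payer: str, piece_cids: list[str], expires_at: int) -> dict:
    claims = {"payer": payer, "pieces": sorted(piece_cids), "exp": expires_at}
    return {"claims": claims, "sig": sign(encode_claims(claims), key)}


def check_token(
    token: dict, key: bytes, payer: str, piece_cid: str, now: int, revoked: set[str]
) -> bool:
    claims = token["claims"]
    expected = sign(encode_claims(claims), key)
    if not hmac.compare_digest(expected, token["sig"]):
        return False  # claims were tampered with or signed by someone else
    if token["sig"] in revoked:
        return False  # revocation needs a registry the verifier consults
    return (
        claims["payer"] == payer
        and piece_cid in claims["pieces"]
        and now < claims["exp"]
    )


key = b"demo-key"  # placeholder secret for the stand-in signature
token = issue_token(key, "0xnewspaper", ["piece-video"], expires_at=2_000)
ok = check_token(token, key, "0xnewspaper", "piece-video", now=1_000, revoked=set())
```

The sketch also shows where the revocation difficulty comes from: the signature check is fully offline, but the `revoked` lookup needs shared state that every verifier must be able to reach.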

Elephant in the room: How to route cache-miss requests

If we allow anyone to set up paid CDN retrievals for content stored in any PDP deal, then how do we decide which SP to retrieve the requested piece from?

  • If the CDN payer is not paying for storage:
    • Is it legitimate to expect an SP storing the piece to serve our retrieval request?
    • What if they don’t have enough capacity to do so?
    • What if the storage deal did not include the “withCDN” flag - is it okay for somebody else to ask FilCDN to retrieve from the SP on their behalf, i.e. on behalf of somebody who did not pay for the storage deal?

      I suppose this should be okay as long as FilCDN is compensating the SP for this retrieval request. However, to be able to do so, there must be a payment rail set up between the wallet paying FilCDN and the SP storing the content.

      • Setting up such payment rails complicates the UX. The user paying for CDN has to set up multiple payment rails, one for each SP they are willing to pay for serving cache-miss requests. Each of those rails locks up some amount of funds. Such a design is suboptimal IMO. As a user, I don’t want to lock 1 FIL for SP1 and 1 FIL for SP2; I want to lock 2 FIL for all cache-miss retrievals.
      • We would get a better UX if there was a single payment rail from the CDN customer to multiple SPs.
  • If the CDN payer is also paying for storage:
    • The “natural” approach is to request the piece from the SP paid by the CDN payer.
    • However, what if that SP has degraded service at the time of the request, and there are better-performing SPs available to serve the requested piece? Can FilCDN send the cache-miss request to them? How will such an SP be compensated?
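The concern about per-SP payment rails can be shown with a small worked comparison. This is a toy model under my own assumptions: each rail can pay its SP at most the amount locked in it, and the function names are hypothetical.

```python
def payable_with_per_sp_rails(
    demand_by_sp: dict[str, float], lockup_by_sp: dict[str, float]
) -> float:
    """Each SP can be paid at most what is locked in its own rail."""
    return sum(
        min(demand, lockup_by_sp.get(sp, 0.0))
        for sp, demand in demand_by_sp.items()
    )


def payable_with_pooled_rail(demand_by_sp: dict[str, float], pooled_lockup: float) -> float:
    """A single rail shared by all SPs is limited only by the total lockup."""
    return min(sum(demand_by_sp.values()), pooled_lockup)


# All cache-miss traffic happens to land on SP1 (2 FIL worth of retrievals).
demand = {"SP1": 2.0, "SP2": 0.0}
per_sp = payable_with_per_sp_rails(demand, {"SP1": 1.0, "SP2": 1.0})
pooled = payable_with_pooled_rail(demand, 2.0)
```

With the same 2 FIL locked in total, the per-SP rails can only cover 1 FIL of this demand while the pooled rail covers all of it.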