FilBeam URL format

Let’s discuss the URL format for FilBeam retrievals.

FilBeam URL format for Piece retrievals

https://{0xWalletAddress}.filbeam.io/{PieceCID}

Example:
https://0x87ce[...]0138.calibration.filbeam.io/bafkzcibe[...]45ci

SP origin URL:
https://{sp}/piece/bafkzcibe[...]45ci

FilBeam URL format for IPFS retrievals

See 📌FilBeam for IPFS data on Filecoin

To support the retrieval of IPFS content stored in FWSS deals, e.g. via filecoin-pin, we need to encode three components in the URL:

  • Client wallet address
  • IPFS Root CID
  • UnixFS subpath (optional)

Static websites hosted on IPFS require the URL pathname to be reserved for UnixFS subpaths, i.e. the wallet address and root CID must be encoded in the hostname.

Additionally, Cloudflare does not support multiple-level wildcards in domain names (docs). We need to encode both the wallet address and the CID in a single domain name component.

Ideally, we want to include both the wallet address and the root CID in the domain names, using the default encoding (hex encoding for the wallet address and CIDv0 or CIDv1 string for the root CID). Unfortunately, it is not possible to store both the address and the CID in a DNS label:

A 20-byte Ethereum address (160 bits) plus a 34–40 byte CID (272–320 bits) requires 432–480 bits total. A DNS-safe, case-insensitive alphabet (letters/digits/hyphen) tops out near base36/base37 ≈ 5.2 bits/char, so you’d need 83–93 chars.
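The estimate above can be reproduced with a few lines of arithmetic. This is only an illustrative sketch: `charsNeeded` is a hypothetical helper name, and the alphabet sizes 36 and 37 are the ones mentioned in the text.

```typescript
// Minimum number of characters needed to carry `bits` bits of information
// when each character is drawn from an alphabet of `alphabetSize` symbols.
function charsNeeded(bits: number, alphabetSize: number): number {
  return Math.ceil(bits / Math.log2(alphabetSize));
}

const ADDRESS_BITS = 20 * 8; // 160-bit Ethereum address
const MIN_CID_BITS = 34 * 8; // 272 bits
const MAX_CID_BITS = 40 * 8; // 320 bits

// Best case: shortest CID with the largest alphabet (base37).
const best = charsNeeded(ADDRESS_BITS + MIN_CID_BITS, 37);
// Worst case: longest CID with the smaller alphabet (base36).
const worst = charsNeeded(ADDRESS_BITS + MAX_CID_BITS, 36);

console.log(`${best}-${worst} characters needed; a DNS label allows 63`);
```

Even the best case is well above the 63-character limit of a single DNS label.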

While it’s possible to use an extendable-output function (XOF) like SHAKE256 to map the (address, cid) tuple into a 63-character base36-encoded string with a practically zero probability of collision, we feel this is not worth the complexity.

Proposed URL format

We propose to use the on-chain metadata to identify the requested content, the client paying for retrievals and the storage provider:

  • DataSetID is a uint256 value. It uniquely identifies the client paying for retrievals and the Storage Provider storing the data.
  • PieceID is a uint256 value. The pair (DataSetID, PieceID) uniquely identifies the content and enables clients to obtain both the Piece CID and IPFS Root CID from the on-chain state.
https://1-{base32(data_set_id)}-{base32(piece_id)}.ipfs.filbeam.io/{subpath}

Example:
https://1-ka-bi.ipfs.filbeam.io/favicon.ico

Why this works:

A DNS label is limited to 63 characters. The prefix 1- and the separator - take 3 of them, which leaves 60 base32 characters at 5 bits each, i.e. 300 bits of information. 256 bits are used for data_set_id, 44 bits remain for piece_id. 2^44 = 17,592,186,044,416, which seems like more pieces per data set than any user will ever need.

The prefix 1- gives us versioning and makes it easy to introduce different formats in the future.
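A minimal sketch of how such a hostname could be built is shown below. The document does not specify how the IDs are serialized before base32 encoding, so this sketch assumes the minimal big-endian byte string of each ID, encoded with lowercase, unpadded RFC 4648 base32. The helper names (`bigintToBytes`, `base32Encode`, `ipfsHostname`) are hypothetical.

```typescript
const ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"; // RFC 4648 base32, lowercase

// ASSUMPTION: an ID is serialized as its minimal big-endian byte string.
function bigintToBytes(value: bigint): Uint8Array {
  if (value < 0n) throw new RangeError("ID must be non-negative");
  let hex = value.toString(16);
  if (hex.length % 2 === 1) hex = "0" + hex;
  const bytes = new Uint8Array(hex.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(hex.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

// Unpadded RFC 4648 base32: emit one character per 5 bits, most significant first.
function base32Encode(bytes: Uint8Array): string {
  let out = "";
  let buffer = 0;
  let bitCount = 0;
  for (let i = 0; i < bytes.length; i++) {
    buffer = (buffer << 8) | bytes[i];
    bitCount += 8;
    while (bitCount >= 5) {
      out += ALPHABET[(buffer >>> (bitCount - 5)) & 31];
      bitCount -= 5;
    }
    buffer &= (1 << bitCount) - 1; // keep only the bits not yet emitted
  }
  // Left-align any leftover bits in a final character (zero-filled on the right).
  if (bitCount > 0) out += ALPHABET[(buffer << (5 - bitCount)) & 31];
  return out;
}

// Builds "1-{base32(data_set_id)}-{base32(piece_id)}.ipfs.filbeam.io".
function ipfsHostname(dataSetId: bigint, pieceId: bigint): string {
  const label =
    `1-${base32Encode(bigintToBytes(dataSetId))}-${base32Encode(bigintToBytes(pieceId))}`;
  if (label.length > 63) {
    throw new RangeError("label exceeds the 63-character DNS limit");
  }
  return `${label}.ipfs.filbeam.io`;
}

console.log(ipfsHostname(80n, 10n));
```

Under this assumed encoding, the IDs 80 and 10 happen to produce the label used in the example above; the document does not state which IDs that example stands for. Note that with byte-aligned encoding a full 32-byte data_set_id takes 52 characters, which leaves 8 characters (5 bytes) for piece_id, so reaching the full 300-bit budget would need a tighter packing than this sketch uses.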

We consider this content addressing with indirection: anyone can look up the underlying hashes and verify that the content served is the content being addressed. The indirection exists only to work around DNS limitations. The lookup is reliable because it is based on smart contract state.

The URL format supports the static website hosting use case, where a CNAME record points to 1-ka-bi.ipfs.filbeam.io and FilBeam manages the TLS cert for the user's domain.

To support IPFS clients like Helia and Kubo, FilBeam can provide a Delegated Routing endpoint advertising bafybei...ppvm as retrievable at https://1-ka-bi.ipfs.filbeam.io/ - we are compatible with IPFS content discovery.

Convenience redirect

Filecoin and IPFS users are familiar with the concept of a wallet address and a CID. Looking up the DataSet and Piece IDs from the on-chain metadata can create too much friction for them. To address this problem, we propose introducing a new endpoint that accepts the wallet & CID and returns a redirect response to the dataset+piece URL described above.

https://link.ipfs.filbeam.io/{0xWallet}/{IpfsRootCid}/{subpath}

Example:
https://link.ipfs.filbeam.io/0x87ce[...]0138/bafybei[...]ppvm/favicon.ico
-> 302; Location: https://1-ka-bi.ipfs.filbeam.io/favicon.ico

Alternative explanation

Because we cannot fit both the address and the Root CID into the DNS hostname, and we need to keep the URL pathname reserved for UnixFS subpaths, we need a different mechanism for mapping URLs to (address, cid). We propose using an opaque identifier in the hostname and providing a redirect service that lets clients easily discover that opaque identifier. The typical user flow would look like this:

  1. The client has a wallet address 0x87ce...0138 and wants to retrieve the file /favicon.ico from the content stored under the IPFS Root CID bafybei...ppvm.
  2. The client requests GET https://ipfs.filbeam.io/0x87ce...0138/bafybei...ppvm/favicon.ico?format=car
  3. FilBeam returns a 302 redirect to something like https://1-ka-bi.ipfs.filbeam.io/favicon.ico?format=car
  4. The client requests GET https://1-ka-bi.ipfs.filbeam.io/favicon.ico?format=car
  5. FilBeam returns the requested content from the CID bafybei...ppvm
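The redirect step in this flow could be sketched as follows. This is not FilBeam's implementation: `redirectLocation` and the `lookupLabel` callback are hypothetical names, and how a (wallet, cid) pair is resolved to the DataSetID/PieceID label from on-chain state is left out, so the lookup is injected as a function.

```typescript
// Hypothetical lookup: returns the hostname label (e.g. "1-ka-bi") for a
// (wallet, root CID) pair, or undefined when no matching piece is known.
type LabelLookup = (wallet: string, rootCid: string) => string | undefined;

// Computes the Location header for the 302 response, or undefined when the
// request path is malformed or the lookup finds nothing.
function redirectLocation(
  requestUrl: string,
  lookupLabel: LabelLookup
): string | undefined {
  const url = new URL(requestUrl);
  // Pathname layout: /{0xWallet}/{IpfsRootCid}/{subpath...}
  const [wallet, rootCid, ...subpath] = url.pathname.split("/").slice(1);
  if (!wallet || !rootCid) return undefined;

  // Hex addresses are case-insensitive, so normalize before the lookup.
  const label = lookupLabel(wallet.toLowerCase(), rootCid);
  if (label === undefined) return undefined;

  // Preserve the UnixFS subpath and the query string (e.g. ?format=car).
  return `https://${label}.ipfs.filbeam.io/${subpath.join("/")}${url.search}`;
}

// Usage with placeholder values (not real identifiers):
const demoLookup: LabelLookup = (wallet, cid) =>
  wallet === "0xwallet" && cid === "bafyROOT" ? "1-ka-bi" : undefined;

console.log(
  redirectLocation(
    "https://link.ipfs.filbeam.io/0xWALLET/bafyROOT/favicon.ico?format=car",
    demoLookup
  )
);
```

Keeping the lookup separate from the URL handling means the same function works regardless of where the mapping comes from, for example a cache in front of the smart contract state.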


Additional resources

Originally proposed format that does not work due to the 63-character limit of DNS labels.

https://{IpfsRootCid}-{0xWalletAddress}.ipfs.filbeam.io/

Example:
https://bafybei[...]ppvm-0x87ce[...]0138.ipfs.calibration.filbeam.io/index.html

SP origin URL:
https://{sp}/ipfs/bafybei[...]ppvm/index.html

Piece retrieval URL format v2

To have a consistent developer experience, we are considering switching Piece retrievals to the same URL format as we use for IPFS.

https://{PieceCID}-{0xWalletAddress}.filbeam.io/

Example:
https://bafkzcibe[...]45ci-0x87ce[...]0138.calibration.filbeam.io/

SP origin URL:
https://{sp}/piece/bafkzcibe[...]45ci

IT IS NOT POSSIBLE TO STUFF ADDRESS+CID INTO ONE HOSTNAME COMPONENT

Slack discussion: https://space-meridian.slack.com/archives/C099XQTBGLU/p1759235259831849

The maximum length for a single label (a part of the domain name separated by dots, like "example" in "example.com") is 63 characters, while the total length of the fully qualified domain name (FQDN), including the dots, is limited to 253 characters.
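These two limits are easy to check programmatically. The sketch below validates lengths only (not the allowed characters) and assumes the hostname is written without a trailing dot; `fitsDnsLimits` is a hypothetical helper name.

```typescript
const MAX_LABEL_LENGTH = 63;  // per label (the parts between dots)
const MAX_FQDN_LENGTH = 253;  // whole name, including the dots

// Returns true when every label and the full name are within DNS length limits.
function fitsDnsLimits(hostname: string): boolean {
  if (hostname.length > MAX_FQDN_LENGTH) return false;
  return hostname
    .split(".")
    .every((label) => label.length >= 1 && label.length <= MAX_LABEL_LENGTH);
}

console.log(fitsDnsLimits("1-ka-bi.ipfs.filbeam.io"));            // short label
console.log(fitsDnsLimits("a".repeat(64) + ".ipfs.filbeam.io"));  // 64-char label
```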

  • Ethereum address is 160 bits (20 bytes)
  • IPFS CIDv1 is usually 34-40 bytes

If you need a single DNS label ≤63 chars that is case-insensitive and fully reversible, you can’t fit a 20-byte Ethereum address (160 bits) plus a 34–40 byte CID (272–320 bits). That’s 432–480 bits total. A DNS-safe, case-insensitive alphabet (letters/digits/hyphen) tops out near base36/base37 ≈ 5.2 bits/char, so you’d need 83–93 chars. One label ≤63 chars is mathematically impossible.

Have you considered using the ID address of the wallet in the URL? Would that mitigate the limitation you are facing?

So, our budget is 315 bits (63 characters × 5 bits per base32 character). A typical UnixFS CID:

  • dag-pb (0x70; 8 bits) + sha2-256 (0x12; 8 bits) + 256 bits of digest = 272 bits. We have 43 bits remaining for the wallet actor ID - max value 8,796,093,022,208

Other CIDs I touched recently:

  • dag-json (0x0129; 16 bits) + sha2-256 (0x12; 8 bits) + 256 bits of digest = 296 bits. We have 19 bits remaining for the wallet actor ID - max value 524,288
  • raw (0x55; 8 bits) + fr32-sha256-trunc254-padbintree (0x1011; 16 bits) + 280 bits of digest = 304 bits. We have 11 bits remaining for the wallet actor ID - max value 2048

If we drop the multicodec and use the multihash only:

  • sha2-256 - we have 51 bits for the wallet ID - seems good enough 👌🏻
  • fr32-sha256-trunc254-padbintree - we have 19 bits for the wallet ID, that's ~524k - not enough?
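The budget arithmetic above can be tabulated with a short script. It is only a sketch of the calculation: the bit counts are the ones quoted in the bullets, and `actorIdBits` is a hypothetical helper name.

```typescript
const LABEL_BITS = 63 * 5; // 63 base32 characters at 5 bits each = 315 bits

// Bits left for the wallet actor ID once the content identifier is packed in.
function actorIdBits(contentBits: number): number {
  return LABEL_BITS - contentBits;
}

// [description, codec bits + hash-function bits + digest bits]
const layouts: Array<[string, number]> = [
  ["dag-pb + sha2-256", 8 + 8 + 256],
  ["dag-json + sha2-256", 16 + 8 + 256],
  ["raw + fr32-sha256-trunc254-padbintree", 8 + 16 + 280],
  ["sha2-256, multihash only", 8 + 256],
  ["fr32-sha256-trunc254-padbintree, multihash only", 16 + 280],
];

for (const [name, bits] of layouts) {
  const spare = actorIdBits(bits);
  console.log(`${name}: ${spare} bits spare, ${Math.pow(2, spare)} possible actor IDs`);
}
```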

I suppose this could work with the following assumptions:

  • We drop the codec and use only the multihash. We will assume that people won't store the same content with different codecs in the IPFSRootCid.
  • We assume people use sha2-256 in their IPFS CIDs, not fancier hashes like CommPv2.
  • We assume that no more than 2^51 (2.25 peta) wallets will be created in the foreseeable future.

I am a bit hesitant to take this route, though, because we are asking users to perform a non-trivial conversion from a 0x wallet address string and a CIDv1 string into a densely packed identifier specific to FilBeam. At that point, I think most people will follow the easier route and just ask FilBeam to map their (wallet, cid) to the FilBeam hostname from which to retrieve.

Suggestion from ChatGPT: make collisions astronomically unlikely

If you can accept cryptographic collision resistance (practically zero), you can stay fully stateless and deterministic.

Here’s a tight, stateless spec that uses RFC 4648 base32 (lowercase, no padding) and maximizes hash bits within a single DNS label (≤63 chars).

One-label, stateless, RFC4648 base32 spec (v1)

Output: exactly 63 base32 chars, lowercase a–z2–7, no = padding.

Security: 312-bit digest, i.e. ~156-bit collision resistance (birthday bound ≈ 2^156).

Deterministic inputs: (ethereumAddress, ipfsCID) only. No tables.


1) Input normalization

Ethereum address (20 bytes).

  • Accept 0x-prefixed or bare hex, any case.
  • Decode to exactly 20 raw bytes (addr_bytes).

IPFS CID (variable).

  • Accept CIDv0 or CIDv1 in any multibase.
  • Canonicalize to CIDv1 bytes (cid_bytes):
    • If CIDv0, convert to CIDv1 (dag-pb + sha2-256) per IPFS rules.
    • If CIDv1 string, multibase-decode to the binary CID (varint multicodec | multihash).
    • Do not keep the multibase prefix; you want the raw bytes.

Domain separation (future-proofing).

  • Prepend a short tag and a version byte so you can change rules later without breaking determinism:
    • tag = ASCII("eth+cid") (7 bytes)
    • ver = 0x01 (1 byte)

Preimage bytes:

preimage = tag | ver | addr_bytes (20) | cid_bytes (len ≥ 32)

Length is unambiguous (fixed 8-byte header: 7-byte tag + 1-byte version, then the 20-byte addr; the rest is the CID).


2) Hash / XOF

Generate exactly 39 bytes (312 bits) so the base32 output lands on 63 chars without padding:

  • Use a wide, standard XOF:
    • digest = SHAKE256(preimage, outlen=39)

      (Alternatively: BLAKE3_XOF(preimage, outlen=39).)

Rationale: with RFC4648 base32, output length = ceil(8·n/5).

For n=39, ceil(8·39/5) = 63. That gives you 312 bits of entropy in the 63 characters (the base32 process internally introduces 3 zero pad bits for the last quintet—this is standard).


3) Encode (RFC 4648 base32, lowercase)

  • Encode digest with RFC4648 base32, no padding.
  • Convert to lowercase (DNS is case-insensitive; use lowercase for consistency).
  • Alphabet: a..z2..7 (lowercase version of standard RFC alphabet).

Result: a 63-character label, e.g.

pc6m... (total 63 chars, a–z2–7 only)

4) Collision profile (what you get)

  • Digest bits used: 312.
  • Single-shot collision ≈ 1 / 2^312.
  • Birthday bound for a set of size M ≈ M^2 / 2^313.
    • For M = 10^9 ≈ 2^30, probability ≈ 2^(-313 + 60) ≈ 2^-253 (negligible).
    • For M = 10^12 ≈ 2^40, probability ≈ 2^(-313 + 80) ≈ 2^-233 (still negligible).
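The birthday estimate can be evaluated numerically without the power-of-two rounding. The sketch below works in log2 space to avoid underflow; `log2CollisionProbability` is a hypothetical helper name, and the formula is the approximation M^2 / 2^(bits + 1) used in this section.

```typescript
// log2 of the approximate birthday-bound collision probability
// M^2 / 2^(digestBits + 1) for a set of M random digests.
function log2CollisionProbability(setSize: number, digestBits: number): number {
  return 2 * Math.log2(setSize) - (digestBits + 1);
}

for (const m of [1e9, 1e12]) {
  const exponent = log2CollisionProbability(m, 312);
  console.log(`M = ${m}: collision probability ≈ 2^${exponent.toFixed(1)}`);
}
```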

No checksum: you’ve maximized hash bits. If you later want typo detection, reserve a few trailing chars for parity and drop digest bits accordingly.


5) Decoding / reverse lookup

  • To resolve a label back to the tuple, keep a DB mapping from the 63-char code → tuple.
    • On insert: compute code from (addr, cid), store code → (addr, cid).
    • On lookup: exact-match the code.
    • If you ever hit a real collision, you’ll detect it at insert (second tuple yields same code); you can reject or handle with an out-of-band suffix/version bump. Probability is vanishingly small.

6) Interop notes

  • DNS-safe: only letters/digits, no hyphens or padding; length = 63.
  • Case-insensitive: you may accept uppercase on input; always emit lowercase.
  • Versioning: if you ever change normalization rules, bump ver (and optionally encode a tiny version nibble as a prefix character if you want visible versioning).

Minimal pseudocode

TypeScript (concise, using SHAKE256):

import { shake256 } from "@noble/hashes/sha3"; // SHAKE256 XOF
import { base32 } from "@scure/base";          // RFC4648 base32
import { CID } from "multiformats/cid";        // CID parsing and v0 → v1 conversion

// Decode a 0x-prefixed or bare hex Ethereum address (any case) into 20 raw bytes.
function toBytesEth(addrHex: string): Uint8Array {
  const h = addrHex.toLowerCase().replace(/^0x/, "");
  if (!/^[0-9a-f]{40}$/.test(h)) throw new Error("bad eth addr: expected 40 hex chars");
  return Uint8Array.from(h.match(/../g)!.map(b => parseInt(b, 16)));
}

// Parse a CID string and return the binary CIDv1 (no multibase prefix).
// CID.parse() decodes base58btc, base32 and base36 out of the box; other
// multibases need a decoder passed as its second argument.
function cidToBytes(cidStr: string): Uint8Array {
  return CID.parse(cidStr).toV1().bytes;
}

export function code63(addrHex: string, cidStr: string): string {
  const tag = new TextEncoder().encode("eth+cid"); // 7-byte domain-separation tag
  const ver = new Uint8Array([0x01]);              // 1-byte version
  const addr = toBytesEth(addrHex);
  const cid  = cidToBytes(cidStr);

  // preimage = tag | ver | addr_bytes | cid_bytes
  const preimage = new Uint8Array(tag.length + 1 + addr.length + cid.length);
  preimage.set(tag, 0);
  preimage.set(ver, tag.length);
  preimage.set(addr, tag.length + 1);
  preimage.set(cid, tag.length + 1 + addr.length);

  const digest = shake256.create({ dkLen: 39 }).update(preimage).digest(); // 39 bytes
  const b32 = base32.encode(digest).replace(/=+$/,"").toLowerCase(); // RFC4648, no padding
  if (b32.length !== 63) throw new Error("unexpected length");
  return b32;
}

Rust (outline):

// crates: sha3 (SHAKE256), data-encoding (BASE32)
use sha3::{Shake256, digest::{Update, ExtendableOutput}};
use data_encoding::BASE32_NOPAD;

fn code63(addr: [u8;20], cid_bytes: &[u8]) -> String {
    let mut hasher = Shake256::default();
    hasher.update(b"eth+cid");
    hasher.update(&[0x01]);
    hasher.update(&addr);
    hasher.update(cid_bytes);
    let mut out = [0u8; 39];
    hasher.finalize_xof_into(&mut out);
    BASE32_NOPAD.encode(&out).to_lowercase()
}

Why 39 bytes?

RFC4648 base32 encodes each 5-byte group as 8 chars, so the padded output for 39 bytes is ceil(39/5)·8 = 64 chars, the last of which is = padding.

Without padding, the output length is ceil(8·n/5).

For n=39: ceil(8·39/5) = ceil(62.4) = 63. That gives you 63 chars with no padding and 312 bits of digest in them. Perfect fit for one label.
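The same length formulas can be written as small helpers, which also shows that 39 is the largest digest size that fits. The function names are hypothetical and only the arithmetic from this section is used.

```typescript
// Length of unpadded RFC 4648 base32 output for `byteLength` input bytes.
function base32UnpaddedLength(byteLength: number): number {
  return Math.ceil((8 * byteLength) / 5);
}

// Length of padded output: one 8-character group per (partial) 5-byte group.
function base32PaddedLength(byteLength: number): number {
  return Math.ceil(byteLength / 5) * 8;
}

// Largest digest size whose unpadded encoding fits into `maxChars` characters.
function maxDigestBytes(maxChars: number): number {
  return Math.floor((5 * maxChars) / 8);
}

console.log(base32UnpaddedLength(39), base32PaddedLength(39), maxDigestBytes(63));
```

A 40-byte digest would already need 64 characters, one more than a DNS label allows.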