Meridian Design Doc 2: Orchestrator Model in Depth
Introduction
Meridian stands for Measure, Evaluate, Reward, the three steps of the impact evaluator framework. The Meridian project aims to create an impact evaluator for “off-chain” networks, i.e. networks of nodes that do not maintain a shared ledger or blockchain of transactions.
This doc proposes a framework and model for Meridian that initially caters for both the Saturn payouts system and the SPARK Station module. As a bonus, it should be able to cater for any Station module. We believe that trying to generalise beyond these few use cases at this point may be counterproductive.
We will structure this design doc based on the three steps of an Impact Evaluator (IE), measure, evaluate and reward.
Fraud Detection
Fraud detection happens in multiple stages of the system, therefore we cover it first before going further into the system.
Fraud detection happens during measure; additional fraud filtering happens before evaluate.
flowchart TD
subgraph Measure
subgraph Logs
L1[Log]
L2[Log]
L3[Log]
end
subgraph Buckets
BH[Honest logs]
BF[Fraudulent logs]
end
Logs--"detect fraud + aggregate"-->Buckets
L1[Log] --> BH
L2[Log] --> BH
L3[Log] --> BF
end
BH--"detect fraud + evaluate"-->Evaluation
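The bucketing above can be sketched as follows; `is_fraudulent` stands in for the network-specific (and, for Saturn, private) fraud check:

```python
# Sketch: split incoming logs into honest/fraudulent buckets during measure.
# `is_fraudulent` is a placeholder for the network-specific check,
# e.g. UCAN signature-chain verification for SPARK.

def bucket_logs(logs, is_fraudulent):
    honest, fraudulent = [], []
    for log in logs:
        (fraudulent if is_fraudulent(log) else honest).append(log)
    return honest, fraudulent

# Usage: only the honest bucket proceeds to evaluation.
honest, fraudulent = bucket_logs(
    [{"job_id": "a", "valid": True}, {"job_id": "b", "valid": False}],
    is_fraudulent=lambda log: not log["valid"],
)
```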
Privacy
This section applies to fraud detection algorithms that can be cheated by tweaking their inputs, where fraud detection therefore needs to be as opaque a process as possible. This applies to Saturn, and doesn’t apply to Spark.
With a public function, gaming the system becomes easy, therefore the algorithm needs to be concealed. More research needs to be done on how to safely publish fraud detection algorithms.
The results of fraud detection also need to stay private, because otherwise machine learning can be used to infer the fraud detection function. To give participants some insight into what counts as fraud, the percentage of fraudulent records will be exposed as part of the evaluate results.
Measure vs Evaluate
Intuitively, fraud detection is part of evaluate. The lesson from Saturn however is that fraud detection is part of measure, not evaluate. Fraud detection needs to look at every single request log, which is too large a chunk of work for the once-per-payment-epoch evaluate step. Therefore, fraud is detected during measure, where the Orchestrator periodically measures and aggregates logs (e.g. every 20 minutes).
One significant downside of detecting fraud during measure is that results are committed on chain and can’t be changed later on. This means that if there is a bug in fraud detection, it is possible for the data for a whole payment epoch to be messed up.
Whether fraud detection could be moved to evaluate is unclear at this point.
- Aggregation would be performed only once → fraud detection would run there as well
Fraud detection during measure
While aggregating measurements, fraud detection is run on every single log received by the Orchestrator. Based on the results, logs are aggregated into one of two buckets:
- Honest logs
- Fraudulent logs
For SPARK this means that the UCAN signature chain will be verified for each log line as it is received.
Fraud filtering pre evaluate
Multiple processes can mark peers or logs as fraudulent, in between measure and evaluate. For example, the Saturn Orchestrator can mark a peer as fraudulent when it fakes its speed test results.
Therefore, all logs that are part of the Honest logs bucket but have since been flagged as fraudulent (or are associated with a peer that has been flagged) will not be fed into the evaluation function.
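A sketch of this pre-evaluate filtering step; the flag sets stand in for whatever processes mark logs or peers as fraudulent between measure and evaluate:

```python
# Sketch: drop honest-bucket logs that were later flagged, either directly
# or via their peer, before feeding the evaluation function.

def filter_for_evaluate(honest_logs, flagged_log_ids, flagged_peer_ids):
    return [
        log for log in honest_logs
        if log["job_id"] not in flagged_log_ids
        and log["peer_id"] not in flagged_peer_ids
    ]

logs = [
    {"job_id": "j1", "peer_id": "p1"},
    {"job_id": "j2", "peer_id": "p2"},  # peer p2 faked its speed test results
]
kept = filter_for_evaluate(logs, flagged_log_ids=set(), flagged_peer_ids={"p2"})
```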
Measure
In the measure step, we refer to each atomic item that gets measured as a job. For example, each retrieval served by a Saturn node is a job. For Spark, each retrieval made from an SP is a job. In order to proceed to the evaluate step, we need to gather together a set of the jobs and some summary statistics about the jobs, in one location for each evaluation epoch.
We have chosen to start the design with a centralized and trusted orchestrator; see “Orchestrator vs Smart Contracts” below for reasoning and alternatives.
TODO @Julian Gruber: Add fraud detection
Flow
At the end of each self-determined measurement epoch, peers upload their epoch logs to the orchestrator, which stores them in its DB.
The orchestrator periodically creates Merkle proofs of logs received - by creating a proof over all logs it has received since the last proof - and commits these on chain.
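The proof step can be sketched as follows; the hash function, leaf encoding, and odd-node handling here are illustrative assumptions, not a committed spec:

```python
# Sketch of the periodic proof step: hash every log received since the last
# proof into leaves, fold them pairwise into a Merkle root, and commit the
# root on chain. SHA-256 and JSON leaf encoding are assumptions.
import hashlib
import json

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def leaf(log: dict) -> bytes:
    return h(json.dumps(log, sort_keys=True).encode())

def merkle_root(leaves: list[bytes]) -> bytes:
    level = leaves
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last node on odd-sized levels
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]

logs = [{"job_id": "j1"}, {"job_id": "j2"}, {"job_id": "j3"}]
root = merkle_root([leaf(log) for log in logs])  # committed on chain
```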
sequenceDiagram
autonumber
participant P as Peer
participant O as Orchestrator
participant D as DB
participant M as Contract: Measure
loop Upload
P->>O: Upload logs
O->>D: Store logs
end
loop Proof
D->>O: Fetch logs
O->>O: Create merkle proof
O->>M: Call with proof
M->>M: Emit proof
end
Orchestrator
Similar to the Saturn network of L1s, each node sends all logs of jobs to a centralised orchestrator. This centralised entity provides good-enough guarantees for privacy, verification and fraud prevention.
Despite the centralisation, we can provide verifiability by attaching Merkle proofs over the submitted logs to the on-chain commitments.
TODO @Julian Gruber:
For each step:
- Push raw data into DB
- Calculate commitment (merkle tree) and push on chain
- The next step will use the raw data produced by the previous step
TODO : Update contract specs
Orchestrator vs Smart Contracts
The Orchestrator approach comes at the massive cost of introducing centralisation, and should be replaced with a decentralized smart contract setup eventually.
Before doing so, these challenges need to be solved:
- How to protect peers from paying too many gas fees
- How to protect the chains from having O(n) transactions per epoch (n = peer count) instead of O(1)
Replacing the orchestrator with smart contracts (running everything on-chain) has significant advantages:
- Trustless: There are no hidden parts of the system
- Safe: No entity is responsible / liable for running the Orchestrator
Data Model
Job logs
Job logs are periodically submitted by the peer to the centralized orchestrator. This raw log of work done is useful for measurement, fraud detection and further analysis.
[
// Generalized record
{
"job_id": "<UUID or CID>", // unique job id
"peer_id": "<Libp2p Peer ID>", // Who completed the job
"started_at": "Timestamp", // when did the job begin
"ended_at": "Timestamp", // when did the job end
// any other fields that are useful measurements of work done
}
// Example Saturn record
{
"job_id": "abcdef",
"peer_id": "<Libp2p Peer ID>",
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:58.62+00",
"num_bytes_sent": 240,
"request_duration_sec": 10,
"ttfb_ms": 35,
"status_code": 200,
"cache_hit": true
}
// Example SPARK record
{
"job_id": "abcdef",
"peer_id": "<Libp2p Peer ID>",
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:58.62+00",
"status_code": 200,
"signature_chain": "<signature chain>",
"num_bytes": 200,
"ttfb_ms": 45
}
]
On-chain commitment
As part of the orchestrator receiving raw job logs, it submits a commitment of work done to the chain, via the measure smart contract.
The commitment consists of two parts:
- A Merkle proof, for later data verification
- Aggregation of measurements, for later evaluation
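A sketch of how the measurements aggregation could be produced from raw job logs; field names follow the job-log data model, and the specific aggregates are assumptions:

```python
# Sketch: aggregate a window of job logs into the summary committed on chain.
def aggregate(logs: list[dict]) -> dict:
    return {
        "started_at": min(log["started_at"] for log in logs),
        "ended_at": max(log["ended_at"] for log in logs),
        "jobs_completed": len(logs),
        # any other fields that are useful measurements of work done, e.g.:
        "num_bytes": sum(log.get("num_bytes", 0) for log in logs),
    }

measurements = aggregate([
    {"started_at": "2023-05-01 00:52:57.62+00",
     "ended_at": "2023-05-01 00:52:58.62+00", "num_bytes": 240},
    {"started_at": "2023-05-01 00:53:00.00+00",
     "ended_at": "2023-05-01 00:53:01.00+00", "num_bytes": 160},
])
```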
{
"proof": [
"<commitment hash>",
"<adjacent node hash>",
"<upper adjacent node hash>",
"<merkle root>"
],
"measurements": {
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:57.62+00",
"jobs_completed": 13,
// any other fields that are useful measurements of work done
}
}
// Example Saturn aggregation
{
"proof": [
"<commitment hash>",
"<adjacent node hash>",
"<upper adjacent node hash>",
"<merkle root>"
],
"measurements": {
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:57.62+00",
"jobs_completed": 13,
"num_bytes": 1000
}
}
// Example SPARK aggregation
{
"proof": [
"<commitment hash>",
"<adjacent node hash>",
"<upper adjacent node hash>",
"<merkle root>"
],
"measurements": {
"started_at": "2023-05-01 00:52:57.62+00",
"ended_at": "2023-05-01 00:52:57.62+00",
"jobs_completed": 13,
"num_bytes": 1000
}
}
Visualization of Merkle proof:
flowchart BT
style c1 stroke:yellow,stroke-width:4px
style c2 stroke:yellow,stroke-width:4px
style h2 stroke:yellow,stroke-width:4px
style r stroke:yellow,stroke-width:4px
l1(log l1)
l2(l2)
l3(l3)
c1("commitment c1")
l4(l4)
l5(l5)
c2(c2)
l6(l6)
l7(l7)
c3(c3)
h1("hash h1")
h2(h2)
r("merkle root r")
l1-->c1
l2-->c1
l3-->c1
l4-->c2
l5-->c2
l6-->c3
l7-->c3
c1-->h1
c2-->h1
h1-->r
c3-->h2
h2-->r
p1(peer p1)-->l1
p1-->l2
p1-->l3
p2(p2)-->l4
p2-->l5
p3(p3)-->l6
p3-->l7
See also:
flowchart BT
style d stroke:yellow,stroke-width:4px
style w1 stroke:yellow,stroke-width:4px
style w2 stroke:yellow,stroke-width:4px
style w3 stroke:yellow,stroke-width:4px
v1(v1) --> u("u (root value)")
w1(Witness 1) --> u
w2(Witness 2) --> v1
v2(v2) --> v1
000( ) --> w2
001( ) --> w2
d("d (leaf value)") --> v2
w3(Witness 3) --> v2
100( ) --> v3( )
101( ) --> v3
v3 --> w1
110( ) --> v4( )
111( ) --> v4( )
v4 --> w1
Evaluate
At the end of each payment epoch, the evaluate script converts aggregated measurements into a rolled-up evaluation. Before triggering the reward phase by calling the rewards factory contract, it asks for human review.
At this stage, we can tweak the evaluation function (including any extra fraud detection to run). Should there be a problem however, initial fraud detection and any other measurements have already been committed and can’t be adjusted any more (for this payment epoch).
The evaluation process consumes data from the chain, and publishes results to chain, but itself runs completely off chain, because the dataset is too large to be handled by smart contracts.
In order for the rewards factory contract to be callable even with a large number of peers, the evaluate script needs to batch its calls into dynamically sized batches, given known smart contract size limitations.
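A sketch of the batching step; a fixed batch size stands in here for the dynamic sizing the script would need:

```python
# Sketch: split evaluations into batches so each on-chain call stays under
# contract/calldata limits. The batch size of 100 is illustrative only.
def batch(evaluations: list, batch_size: int) -> list[list]:
    return [evaluations[i:i + batch_size]
            for i in range(0, len(evaluations), batch_size)]

batches = batch(list(range(250)), batch_size=100)  # three batches: 100, 100, 50
```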
The evaluation function doesn’t contain any secrets (it is independent of fraud detection), and therefore can be fully open sourced, which helps with trust and verifiability.
TODO:
- : Proof (this is still an area of research)
sequenceDiagram
autonumber
participant SE as Script: Evaluate
participant O as Orchestrator
participant E as Contract: Rewards Factory
SE->>O: Fetch aggregated measurements
SE->>SE: Evaluate
SE->>SE: Ask for review of evaluation results
SE->>SE: Create batches of evaluations
loop For each batch
SE->>E: Call with batch of evaluations
end
flowchart TB
subgraph Aggregated measurements
M1[Measurements]
M2[Measurements]
M3[Measurements]
M4[Measurements]
M5[Measurements]
M6[Measurements]
end
subgraph Evaluations
M2--evaluate-->E1[Evaluation]
M3--evaluate-->E1
M4--evaluate-->E2[Evaluation]
M5--evaluate-->E2
end
E1--batch-->B[Batch of evaluations]
E2--batch-->B
B--"call contract"-->R[Rewards Factory]
At this point we have a table of logs in the above data model, stored in an off-chain data store. The next step is to evaluate over the logs based on the evaluation function.
Evaluation Function
In general, for node i (where 1 ≤ i ≤ n), with logs x_{i,1}, …, x_{i,m_i}, evaluations e_{i,j} = f(x_{i,j}) on those logs and evaluation function f, we can calculate the evaluation output as

y_i = (Σ_j e_{i,j}) / (Σ_{k=1}^{n} Σ_j e_{k,j})

where Σ_i y_i = 1 and y_i is the evaluation of node i.
In the case of Saturn, the evaluation function is a function of number of bytes sent, TTFB and the request duration. This is calculated by the Saturn payouts system.
In the case of Spark, the evaluation function is simply a count of the number of successful requests with valid signature chains a Station has performed. Specifically, for node i,

y_i = (Σ_j v_{i,j}) / (Σ_{k=1}^{n} Σ_j v_{k,j})

where v_{i,j} = 1 if the log with index j of node i is valid and v_{i,j} = 0 otherwise.
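A minimal sketch of the Spark evaluation, counting valid logs per node and normalising into proportions; the log fields here are simplified placeholders:

```python
from collections import Counter

# Sketch: count valid logs per node, then normalise so the proportions
# sum to 1 across the network.
def evaluate_spark(logs: list[dict]) -> dict:
    counts = Counter(log["peer_id"] for log in logs if log["valid"])
    total = sum(counts.values())
    return {peer: count / total for peer, count in counts.items()}

proportions = evaluate_spark([
    {"peer_id": "p1", "valid": True},
    {"peer_id": "p1", "valid": True},
    {"peer_id": "p2", "valid": True},
    {"peer_id": "p2", "valid": False},  # invalid signature chain: not counted
])
```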
The Saturn evaluation function is more complicated. See https://hackmd.io/@cryptoecon/saturn-aliens/%2FMqxcRhVdSi2txAKW7pCh5Q for more details.
Once the evaluation function has finished off-chain, and results have been approved by humans, the evaluation results will be published to chain.
Data Model
At the end of the evaluation step we should have an object that satisfies the following data model
{
"epoch": "3",
"measurement": "<cid-of-measurement>",
"trace_of_eval": {
...
},
"payees": [{
"node_id": "1",
"proportion": "0.4", // In above discussion, this is y_1
"fraud": "0.1" // Fraction of fraudulent logs, which have been discarded
},{
"node_id": "2",
"proportion": "0.6", // In above discussion, this is y_2
"fraud": "0"
}],
}
I.e. for each epoch we know what proportion of the overall tokens will go to each node.
Reward
For each batch of evaluations submitted to the Rewards Factory contract, one Payment Splitter contract is deployed. Each Payment Splitter contract has inlined the reward shares. Each share can be claimed by the designated peer.
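The claim accounting can be sketched off-chain as follows; this mirrors the common payment-splitter pattern (e.g. OpenZeppelin's PaymentSplitter), not the actual contract, and all names are illustrative:

```python
# Sketch of pull-payment accounting: each peer's claimable amount is its
# share of everything the contract has ever received, minus what it has
# already withdrawn. Modeled on the common payment-splitter pattern.
class PaymentSplitter:
    def __init__(self, shares: dict):
        self.shares = shares                      # peer -> share weight
        self.total_shares = sum(shares.values())
        self.total_received = 0
        self.released = {peer: 0 for peer in shares}

    def receive(self, amount: int):
        self.total_received += amount

    def claim(self, peer: str) -> int:
        owed = (self.total_received * self.shares[peer] // self.total_shares
                - self.released[peer])
        self.released[peer] += owed
        return owed

splitter = PaymentSplitter({"f1aaa": 1, "f1bbb": 2})
splitter.receive(3_000)
payout = splitter.claim("f1bbb")  # → 2000
```

Claiming twice pays out nothing the second time, which is the property that makes pull payments safe to retry.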
Flow
sequenceDiagram
autonumber
participant E as Script: Evaluate
participant F as Contract: Rewards Factory
participant S as Contract: Payment Splitter
participant P as Peer
E->>F: (Call with batches of evaluations,<br/>see Evaluate)
loop For each batch
F->>S: Deploy contract
end
P->>S: Claim FIL
flowchart TD
subgraph Script: Evaluate
b1[Batch of evaluations]
b2[Batch of evaluations]
end
b1--call-->F[Contract: Rewards Factory]
F--deploy-->S[Contract: Payment Splitter<br />- f1aaa: 1FIL<br />- f1bbb: 2FIL<br />...]
subgraph Peers
p1[Peer: f1aaa]
p2[Peer: f1bbb]
end
p1--claim-->S
p2--claim-->S
Pull vs push payments
It is important for peers to claim (pull) their rewards, instead of the system sending out (push) rewards, for multiple reasons:
- The system would have to pay the transaction gas fees
- This gives peers the freedom to claim whenever they want (e.g. tax benefits), or not to claim at all
- This further decouples the system from the system’s operator, which helps with liabilities
Pull payments require a balance
In order for a peer to claim their rewards, they will need to have enough balance in their wallet to pay the gas fees for calling the smart contract. This usually means sending some FIL into the wallet before being able to claim rewards.
In order to help peers with initial rewards payment, an option to push payments could be added, where the smart contract will transfer out the reward, subtracting the gas fees it has to pay itself. While potentially convenient for the peer, this undoes the decoupling benefit of pull payments, plus adds complexity for gas estimation.
Reward contract balance
The reward contract needs to have a balance, in order for peers to be able to claim their FIL from it. Ideally, the clients of the system implementing Meridian will pay their service fees directly into the rewards contract. As long as this flow is not yet established, the team operating the service needs to manually top up its balance.
If a peer was flagged as fraudulent after evaluate and before reward, its revenue share is kept and moved to the next cycle’s reward pool.
Smart Contracts
Depending on how complex the contracts turn out to be, we will either hire contractors or write them ourselves. Our current thinking is that the contract work will be simple enough (either because the contracts are simple or existing contracts can be reused) that we would prefer to write the contracts ourselves. This puts us more in control, and eases timelines and cross-team orchestration.
Independent of which team creates the contracts, audits will be required anyway.
Specs
Testing
Testing will depend on the choice of framework we use to develop the smart contracts. The recommendation is that we proceed with Foundry because it has built-in invariant testing and auto-generates Rust bindings for the contracts, which are useful for integration tests.
Smart Contract Testing
- Unit tests that exercise basic contract functions such as claiming a payout. With Foundry we can write these tests in Solidity.
- Invariant testing. This is built into Foundry, or can be done with a separate library such as Echidna. This applies fuzz testing to the contract as a whole.
- Static analysis. There are already established analysis tools for EVM smart contracts and we should use them.
One caveat with testing FVM smart contracts is that if we want to use Filecoin-specific features (e.g. Filecoin addresses) in our contracts, then we would be relying on Filecoin precompiles, and that will break a lot of testing libraries. A naive solution could be to maintain two versions of each contract. The Saturn team also started working on a local FVM test executor written in Rust that allows running unit tests on Solidity smart contracts that use Filecoin precompiles. This executor is still rudimentary and needs improvement to be a reliable testing tool.
Unit Tests
Each component should have unit tests to make sure functionality is working. For example we should have extensive unit tests for: log commitment scheme, evaluation functions, etc.
Integration Tests
If we have bindings for the contracts, we can easily write integration tests for some end-to-end flows that run on calibration net. This just requires a burner wallet with some test FIL in it. Saturn already has examples of this.
Auditing
After we complete our smart contracts, we should have them audited and publish the audit publicly.
Observability
Take inspiration from the Saturn internal dashboard
Create a generalized dashboard template for all Meridian systems
Roadmap (outdated, ignore)
- TODO: Add walking skeleton, when to hand over to contractors
- TODO: ⚠️ This roadmap is outdated, please skip ⚠️
Sync notes: perform airdrops as we develop walking skeleton. Don’t schedule airdrops, drop them as we want to perform tests.
Or would it be easier to run this in simulation?
When shall we build a testnet?
- Figure out legal situation
- Secure funding
- Finish design
- Build walking skeleton
- 3 contracts
- measure
- evaluate
- reward
- orchestrator
- 3 contracts
- Hand over to contractors
- Build staging environment in testnet
- Optional airdrops on mainnet with finished contracts
- Spark Airdrops
- Airdrop I
At 5M retrievals
- Reward: Deploy splitter contract that rewards pot of FIL one time
- Evaluate: Submit simulated evaluate results, evaluating to 1/network_size per peer
- Airdrop II
At 10M retrievals
- Reward: Reuse reward contract with fresh pot of FIL
- Evaluate: Deploy evaluate contract that maps every measurement to 1/network_size
- Measure: Submit simulated measurements. Per wallet known, submit 1
- Airdrop III
At 20M retrievals
- Reward: Reuse reward contract with fresh pot of FIL
- Evaluate: Deploy new evaluate contract that maps every measurement to retrieval_count/network_retrieval_count
- Measure: Deploy measurement contract, submitting retrieval_count per epoch
- Ensure https://github.com/Zondax/filecoin-signing-tools/tree/master runs on Zinnia
- Figure out how to get initial gas fees
- Announce membership from Zinnia
- Submit measurements from Zinnia
- Airdrop I
- Stream logs into publicly available database (read public, write public, delete private)
- Commit logs on chain using smart contract
- Measure using smart contract
- Detect fraud based on measurements and centralized log storage
- Evaluate measurements using smart contract
- Reward based on evaluation results using smart contract
