Known Attack Vector Analysis

In this doc, we introduce and analyse the known attack vectors of the Spark protocol, and then discuss our mitigations for each one.

Checker-Based Attacks

Attack: Failure-only Checking

Description

The attacker forks the Spark checker code. Instead of performing each task correctly, they always report an error case for their measurement.

Cui Bono

In this case, the Spark checker benefits because they do not have to spend bandwidth or compute cycles to perform each retrieval attempt.

Checkers are still rewarded for error cases if they are in the honest majority, so they can earn for doing no work while also skewing the true retrieval success rate (RSR) scores of the SPs.

Existing Mitigations

  • Honest Majority Committee Consensus: Honest majority consensus on retrieval tasks means that the attacker will not be rewarded if the majority of checkers who perform the same tasks are honest and do the task properly. However, since the checker rewards are very small, this is unlikely to deter someone who is attacking Spark and Filecoin for larger economic reasons, for example, to affect datacap allocation decisions. Also, since the Spark RSR score is still based on individual measurements and not the consensus outcome, this attack can still bring down the RSR score.
  • IPv4 /24 subnet restriction: The attacker can only do this with one Spark checker in each /24 subnet.
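The /24 restriction above amounts to collapsing each measurement's source IP to its subnet and counting at most one checker per subnet. A minimal sketch of that policy (the "first measurement wins" tie-break and the `measurements` shape are assumptions for illustration, not the actual Spark implementation):

```python
import ipaddress

def subnet_key(ip: str) -> str:
    # Collapse an IPv4 address to its /24 subnet,
    # e.g. "203.0.113.7" -> "203.0.113.0/24".
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def dedupe_by_subnet(measurements):
    # Keep at most one measurement per /24 subnet
    # (hypothetical tie-break: the first one seen wins).
    seen, kept = set(), []
    for m in measurements:
        key = subnet_key(m["ip"])
        if key not in seen:
            seen.add(key)
            kept.append(m)
    return kept
```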

Potential Further Mitigations

  • A reputation system based on how regularly a Spark checker, identified by its Station ID, is in the honest majority for the tasks it performs. This reputation system can be used to weight the results of individual Spark checkers when creating the Spark RSR score. If a Spark checker is consistently not in the honest majority, then its measurements will have a negligible impact on the overall Spark RSR score.
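The reputation-weighted scoring above could look roughly like this (a sketch under assumptions: reputation is a number in [0, 1] per Station ID, e.g. the fraction of past tasks where the station agreed with the honest-majority outcome; the data shapes are hypothetical):

```python
def weighted_rsr(measurements, reputation):
    # measurements: list of (station_id, success: bool)
    # reputation: station_id -> weight in [0, 1]; unknown stations weigh 0,
    # so a freshly spawned checker contributes nothing to the score.
    total = sum(reputation.get(sid, 0.0) for sid, _ in measurements)
    if total == 0:
        return None  # no reputable measurements for this period
    hits = sum(reputation.get(sid, 0.0) for sid, ok in measurements if ok)
    return hits / total
```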

Attack: Lazy self-interested checking

Description

The attacker forks the Spark checker code. Instead of performing each task, the checker waits until it is offered one it is interested in. For example, the checker is in cahoots with an SP and so only performs retrievals for deals linked to that SP. They may go further and not even make the retrieval request, instead always reporting that the data is retrievable.

On top of this, since the IPv4 /24 restriction currently limits each subnet to k measurements per round, all k of those measurements can be submitted for the same task t.

Cui Bono

In this case, the SP who is involved benefits from being measured while other SPs are not measured. The checker node is not required to perform the other tasks assigned to it in each round, so it only earns for the tasks it chooses to do.

Existing Mitigations

  • So long as there are enough honest checkers in the network, it shouldn’t be an issue if certain checkers are selective. Other honest checkers will fill in the gaps.
  • Honest Majority Committee Consensus: If the checker is always measuring an SP favourably and the SP is not serving the data then this should be caught in the honest majority consensus. The checker will not be rewarded. However, as mentioned above, since the checker rewards are very small, this is unlikely to deter someone who is attacking Spark and Filecoin for larger economic reasons, for example, to affect datacap allocation decisions. Also, since the Spark RSR score is still based on individual measurements and not the consensus outcome, this attack can increase the RSR score of a bad SP.
  • IPv4 /24 subnet restriction: The attacker can only do this with one Spark checker in each /24 subnet.

Potential Further Mitigations

  • Checker impartiality logic: There could be logic in Spark evaluate that examines whether certain checkers only report favourably, or at all, when tasks are linked to specific SPs. Checkers could be marked higher if their measurements span a wide range of miners.
  • We could introduce a requirement to perform the k closest tasks in the correct order of closeness, so that a checker cannot pick which to do first and then ignore the rest.
  • A reputation system based on how regularly a Spark checker, identified by its Station ID, is in the honest majority for the tasks it performs. This reputation system can then be used to weight the results of individual Spark Checkers in creating the Spark RSR score. If a Spark checker is consistently not in the honest majority, then their measurement will negligibly impact the Spark RSR score.

Attack: Lazy Checker report based on SPs previous RSR score

Description

A checker looks at the miner ID in a task and, instead of performing the retrieval, reports based on the miner's existing Spark RSR score. For example, if an SP has a 30-day trailing Spark RSR score of 0.995, then the checker can bet on the retrieval being successful. This is essentially a more advanced version of the Failure-only Checking attack.

Cui Bono

In this case, the Spark checker benefits because they do not have to spend bandwidth or compute cycles to perform each retrieval attempt and are more likely to get into the honest majority.

Potential Further Mitigations

  • Weighted rewards for unexpected outcomes: We could reward checkers more generously, or grant them extra reputation, for being part of a committee that produces an unexpected honest result.
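One possible shape for such a scheme is to scale the reward by how surprising the consensus outcome was given the SP's trailing RSR, so that betting on the prior earns roughly the base reward while genuinely catching a regression pays much more. Everything here (the function, the cap, the base) is a hypothetical sketch, not a proposed final design:

```python
def reward_multiplier(prior_success_rate, outcome_success, base=1.0, cap=10.0):
    # Probability the consensus outcome had under the SP's trailing RSR:
    # a failure against an SP with RSR 0.995 is "unexpected" (p = 0.005).
    p = prior_success_rate if outcome_success else 1.0 - prior_success_rate
    # Floor p so rewards for near-impossible outcomes stay bounded by `cap`.
    p = max(p, 1.0 / cap)
    return min(base / p, cap)
```

With these numbers, reporting an expected success against a 0.995-RSR SP earns roughly the base reward, while being in an honest committee that finds a failure earns the capped multiplier, which removes the incentive to lazily report from the prior.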

Attack: Checkers near each other on the hash ring

Description

In the tasking phase of the Spark protocol, in each round, Checkers are assigned the k “closest” tasks to perform based on the distance between the SHA-256 hash of the retrieval task plus the round’s drand value and the SHA-256 hash of the checker’s ID (Station ID).
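The tasking rule can be sketched as follows. This is a simplified model: the hash-to-integer distance metric, plain string concatenation for the taskKey, and the function names are all assumptions for illustration, not the actual Spark implementation.

```python
import hashlib

def h(s: str) -> int:
    # Interpret a SHA-256 digest as a big integer position on the hash ring.
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)

def k_closest_tasks(station_id: str, tasks, drand: str, k: int):
    # taskKey = hash(task + round drand); nodeKey = hash(Station ID).
    # Each checker takes the k tasks whose keys are nearest its own key.
    node_key = h(station_id)
    return sorted(tasks, key=lambda t: abs(h(t + drand) - node_key))[:k]
```

Because the drand value only enters the taskKey, a given Station ID keeps the same ring position every round; only the tasks move. That asymmetry is what the hash-ring attacks below exploit.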

In a more advanced version of the Lazy self-interested checking attack, an attacker can spin up hundreds of Stations such that the SHA-256 hashes of their Station IDs are “close” enough to each other on the hash ring that they will be assigned similar tasks from the Round Retrieval Task List.

When each of the attacker’s checkers calculates which k tasks it should perform in the round from the overall Round Retrieval Task List, the attacker looks at whether a particular task is eligible to be tested by, say, 50 of its checkers, i.e. enough checkers to control the honest majority consensus. The attacker then submits whatever result they want for this task.

Cui Bono

In this case, the attacker can affect not only the outcome of the Spark RSR score but also the honest majority decision. However, the attacker still cannot choose which tasks will land near all of their checkers, due to the inclusion of drand in the taskKey. This means that even though they can potentially overwhelm a majority, it may not be for a task that is of interest to them.

Existing Mitigations

  • IPV4 /24 Subnet restriction: The attacker will need to run each of the checkers that are close to each other in the hash ring on a different IPV4 /24 subnet.

Potential Further Mitigations

  • Drand in nodeKey: We could hash the drand value into the nodeKey in each round as well as into the taskKey. This would mean that Station IDs that are close to each other on the hash ring are now spread all around it, making the attack harder to orchestrate. However, there is currently no friction in generating Station IDs, so the attacker can simply generate checkers that are near each other at the start of each round. A reputation system, or an incentive to have long-lasting Station IDs, will help here.
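The mitigation above is a one-line change to how the nodeKey is derived (a sketch; not the current Spark scheme):

```python
import hashlib

def node_key(station_id: str, drand: str) -> int:
    # Hash the round's drand value into the node key so a station's
    # position on the ring changes every round, breaking pre-positioned
    # clusters of Station IDs.
    return int(hashlib.sha256((station_id + drand).encode()).hexdigest(), 16)
```

Since drand is unpredictable before the round starts, pre-generated clusters of Station IDs scatter each round; the residual weakness, as noted, is that IDs generated *after* the drand value is published can still be aimed.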

Attack: Checker Spawning

Description

As described above, in each round Checkers are assigned the k “closest” tasks based on the distance between the SHA-256 hash of the retrieval task plus the round’s drand value and the SHA-256 hash of the checker’s Station ID.

In a slight variant of the “Checkers near each other on the hash ring” attack, instead of spinning up lots of Stations in advance, the attacker waits for a task that is important to them and creates the taskKey with the round’s drand value. They then immediately spin up as many Station IDs as they can that hash to a value close to the taskKey, and submit measurements for their chosen task with the new Station IDs.

Cui Bono

In this case, the attacker can influence the measurements of specific tasks. In combination with the “Checkers near each other on the hash ring” attack they may be able to influence the committee consensus result for a specific task.

Existing Mitigations

  • IPV4 /24 Subnet restriction: The attacker will need to run each of the checkers that are close to each other in the hash ring on a different IPV4 /24 subnet.

Potential Further Mitigations

  • Drand in nodeKey doesn’t help here: hashing the drand value into the nodeKey in each round, as well as into the taskKey, does not help because the attacker creates the Station IDs with knowledge of the round’s drand value.
  • A reputation system, or more generally an incentive to have long-lasting Station IDs, will help here.

Attack: Checkers all around the hash ring

Description

As described above, in each round Checkers are assigned the k “closest” tasks based on the distance between the SHA-256 hash of the retrieval task plus the round’s drand value and the SHA-256 hash of the checker’s Station ID.

In a more advanced version of the previous attack vector, an attacker can spin up thousands of Stations such that the SHA-256 hashes of their Station IDs are spread all around the hash ring, so that they can capture all the tasks in the Round Retrieval Task List and hope to form majorities for certain, or all, tasks.

This attack essentially amounts to someone dominating the Station network.

Cui Bono

In this case, the attacker can affect not only the outcome of the Spark RSR score but also the honest majority decision. The attacker can also choose which tasks from the round to perform and which to ignore.

Existing Mitigations

  • IPV4 /24 Subnet restriction: The attacker will need to run checkers on many different IPV4 /24 subnets.

Potential Further Mitigations

  • Adding drand to the nodeKey does not help here as it did in the previous attacks: it would just redistribute the hashes around the hash ring, and the attacker’s Station IDs are likely to remain very evenly distributed.
  • The main defence here is simply having enough other Stations running in the network that it becomes hard to overwhelm majorities.
  • A reputation system will also help here: long-standing Station IDs that are predominantly in the honest majority are weighted more heavily than others.

Provider-Based Attacks

Attack: Provider only stores the root block

Description

In the current version of the Spark protocol, the checkers only ask the provider for the root block of each payload. A malicious provider could therefore store only the unsealed root block of each file and still pass all the Spark checks.

Cui Bono

The Storage Provider benefits by saving storage space.

Existing Mitigations

  • There are no existing mitigations

Potential Further Mitigations

  • Full file retrieval: We could switch full file retrieval back on. However, this would effectively load-test the providers and could create bad sentiment around Spark.
  • Range requests: Instead, we could issue range requests to retrieve specific blocks from the whole file, building up a probabilistic argument about whether the entire file is retrievable. I.e. we ask to retrieve a file at a given byte offset, along with the blocks required to prove that the offset block accumulates into the root payload CID.
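The probabilistic argument behind random range requests is simple to quantify. Assuming samples are drawn uniformly and independently over the file's blocks (a simplifying assumption), the detection probability is:

```python
def detection_probability(missing_fraction: float, samples: int) -> float:
    # If a provider silently drops a fraction f of a file's blocks, the
    # chance that n uniformly random block samples all land on stored
    # blocks is (1 - f)^n, so the chance of catching the cheat is
    # 1 - (1 - f)^n.
    return 1.0 - (1.0 - missing_fraction) ** samples
```

For example, a provider storing only half of each file is caught with probability above 0.99 after just 10 random samples, so even a modest number of range requests per deal makes this attack uneconomical; only very small missing fractions remain hard to detect.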

Attack: Proxy Provider

Description

The provider doesn’t itself store a hot copy of files but instead proxies retrieval requests on to another node that is storing a hot copy, where they have some sort of out-of-protocol agreement that they will work together in this way.

Cui Bono

The Storage Provider benefits from saving storage space. The provider that stores all the hot copies will likely earn from the proxying providers and will benefit from economies of scale.

Existing Mitigations

  • There are no existing mitigations, and there likely never will be: it is impossible to prove that a provider is storing a hot copy on a specific piece of hardware without heavier proof protocols such as PoRep.
  • We also don’t feel this should be viewed as an attack vector. Spark tests whether the provider, or rather the network, is able to serve the content from a deal. Furthermore, content addressing is about the network’s ability to serve content, not about fetching it from a specific location.

Potential Further Mitigations

  • None

See also

  • https://github.com/ipni/specs/issues/33#issuecomment-2461328991
    1. If you don't want a miner you're testing to be able to serve requests cheaply by on-the-fly fetching from some other provider -> you seem out of luck, this is why Filecoin does PoRep with all its associated tradeoffs
    2. If you don't want a miner you're testing to be able to serve requests cheaply on-the-fly fetching from some other provider unless they pay some penalty (e.g. they proxy all the bytes through an endpoint they control) -> just make sure the data transfer protocol is authenticated (e.g. if using HTTP you can require something like libp2p over HTTP's PeerID auth or some other auth scheme run on the same domain)

    Another comment from that thread:

    Aside from one Curio instance being able to back many miners with different IDs, if Spark's goal is to figure out which SPs are serving data well I can do the following:
    1. look at the FIL+ requirements and see that many require storing with multiple SPs
    2. only take on clients with FIL+ looking to store with multiple SPs
    3. if another SP is serving the data then I can cheat and just HTTP redirect to them and/or be a proxy for them and then I don't have to actually do any of the things that I might if I was serving the data (e.g. indexing, storing the unsealed data, etc.) but still look good on the Spark metrics
      • This means that as an SP I can trick clients, FIL+ allocators, etc. who rely on Spark's data into thinking I'm a provider that behaves well at serving data when in reality I don't do that at all

Attack: Provider advertises incorrect payload CIDs to IPNI in Spark v2

todo: fill in details

Attack: New Spark requests are likely to come in at the start of Spark rounds and not in the middle or second half of the round