Investigation: Spark RSR drop on 2025-02-28
On February 28th, around midday UTC, there was a significant change in Spark RSR:
- The overall RSR (Graphsync+HTTP retrievals) converged with the RSR for HTTP-only retrievals.
- HTTP RSR increased slightly, but this could be just the usual noise in our data.
- Overall RSR dropped significantly, from ~22% to 13%.
Relevant charts from the Spark Dashboard
The rest of this document is structured in two parts:
- The first part summarises what we found.
- The second part provides a detailed description of our investigation process.
Relevant findings
What happened:
- As part of deploying a new component that measures the retrieval success rate for HTTP HEAD requests, we introduced a stricter validation step that rejects successful measurements missing the response status code for the HEAD request. We did not realise that this check was too strict and also rejected valid measurements for Graphsync retrievals.
- As a result, no data was recorded for SPs serving Graphsync retrievals only. (See e.g. f03252730 and notice that both total and successful are set to 0.)
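The failure mode described above can be illustrated with a minimal Python sketch. It is not the actual Spark code: the `Measurement` fields, the function names, and the miner IDs are hypothetical, and the protocol-aware check is only one possible correction, assuming that Graphsync measurements never carry a HEAD status code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Measurement:
    # All field names are hypothetical, chosen for illustration only.
    miner_id: str
    protocol: str  # "graphsync" or "http"
    retrieval_succeeded: bool
    head_status_code: Optional[int] = None  # assumed to be set for HTTP retrievals only


def is_valid_too_strict(m: Measurement) -> bool:
    """Reject every successful measurement that lacks a HEAD status code."""
    return not (m.retrieval_succeeded and m.head_status_code is None)


def is_valid_protocol_aware(m: Measurement) -> bool:
    """One possible correction: require the HEAD status code for HTTP only."""
    if m.protocol != "http":
        return True
    return not (m.retrieval_succeeded and m.head_status_code is None)


def tally(measurements, is_valid):
    """Count accepted measurements per miner as total/successful."""
    stats = {}
    for m in measurements:
        entry = stats.setdefault(m.miner_id, {"total": 0, "successful": 0})
        if not is_valid(m):
            continue  # rejected measurements do not contribute to RSR
        entry["total"] += 1
        if m.retrieval_succeeded:
            entry["successful"] += 1
    return stats


# Hypothetical sample: one Graphsync-only miner and one HTTP miner.
sample = [
    Measurement("f0-graphsync-only", "graphsync", True),
    Measurement("f0-graphsync-only", "graphsync", True),
    Measurement("f0-http", "http", True, head_status_code=200),
]

strict = tally(sample, is_valid_too_strict)
fixed = tally(sample, is_valid_protocol_aware)
print(strict["f0-graphsync-only"])  # {'total': 0, 'successful': 0}
print(fixed["f0-graphsync-only"])   # {'total': 2, 'successful': 2}
```

With the strict check, the Graphsync-only miner ends up with both counters at zero while the HTTP miner is unaffected, which matches the pattern observed in the data.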
What mitigations we implemented:
- We improved our dashboard to better signal when the validation step starts rejecting more measurements than usual.
- We added an alert to let us know when this happens, so that we can investigate and fix the problem in a timely manner.
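As a rough illustration of the idea behind such an alert, the sketch below compares the current rejection rate against a recent baseline. The function names, the baseline window, and the 10-percentage-point tolerance are assumptions made for this example. They do not describe the alert rule we actually configured.

```python
def rejection_rate(accepted: int, rejected: int) -> float:
    """Fraction of measurements rejected by the validation step."""
    total = accepted + rejected
    return rejected / total if total else 0.0


def should_alert(current_rate: float, baseline_rates: list, tolerance: float = 0.10) -> bool:
    """Alert when the current rate exceeds the baseline mean by more than `tolerance`.

    The tolerance value is a hypothetical threshold, not a tuned one.
    """
    baseline = sum(baseline_rates) / len(baseline_rates) if baseline_rates else 0.0
    return current_rate > baseline + tolerance


# Hypothetical numbers: recent rounds rejected about 5% of measurements.
baseline = [0.05, 0.06, 0.04]
print(should_alert(rejection_rate(accepted=95, rejected=5), baseline))   # False
print(should_alert(rejection_rate(accepted=50, rejected=50), baseline))  # True
```

Comparing against a baseline rather than a fixed limit keeps the usual level of rejected measurements from triggering the alert.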
Pull requests:
The updated chart with an alert:
Investigation
Charts from the Internal Spark Dashboard:
It’s probably not relevant to this issue, but the IPNI service has been degraded since Feb 15th; see Spark Internal Dashboard >> IPNI Success Rate.
Sample spot check
Miner: f03252730 - serving Graphsync retrievals only
Non-zero RSR on Feb 28th; dropped to zero on March 1st.
Miner details (obtained using https://gist.github.com/bajtos/d10cfc39f60ed8fe5a7578f416df530c and https://crates.io/crates/libp2p-lookup):
- On-chain PeerID: 12D3KooWSv7uy3a9RDGYiaVKx8v65oGg9AcTpjMsjFoxKHmQ9SVx
- Address: '/ip4/210.209.77.162/tcp/17033'
- Agent version: boost-2.4.0+mainnet+git.390148b8
I ran several manual checks for this miner from my machine, and all of them passed.
round payload_cid
32659 bafykbzaceclqej3k65xu5fdtiovuumo5dr67k6djyjcrngve7kcayxi5v6tu2
32659 bafykbzaceacm5i7zlia7eis2zcqjmjedh47ptloam6opbfgyggpmhfdesv7ty
32658 bafykbzacedznjufre7d34fj44aq66zynmuoam6vt47vulirpnfq25e5y752nc
32654 bafykbzacebbvux22ty7ic7dvop242p5xemmzj3dpmi2deenpmpff5caoh4t32
32653 bafykbzacebn7ilw4lud2iu3dnf4bxttpkhuxosmvvauad6mcgynwwstbe4ilg
32653 bafykbzacedc4krbatg5pybzqqeke5dggsjk2wiyr6aypdymdl6maqrtus2i6q
32650 bafykbzacedzuuvtoq6pjcm3xq6m4b5iguvctdexfxipdayl6abdm7odabpofa
32650 bafykbzacebvbv37ah6ax4pnpwy4hgtofuvj6wt5w7elnbunwr7jp7mpwd6d7o
32650 bafykbzacebev4ywbr7ipl2ilp75mxg3ub53lhflrss7flnvoaixwkyiwy3xhe
32650 bafykbzacea4qmft3apqol2ely7mpibvztafsx4irhzemt3wsdiu3iewva4adi
Miner: f03173127 - serving HTTP retrievals
Their RSR decreased slightly, but we attribute this decrease to the current IPNI service degradation.





