Stress test 3 results
Starting at block 50868 on July 9 at 14:11 (UTC), Tari core developer
@Hansieodendaaal bombarded testnet with over 38000 transactions.
A full Tari block can currently accommodate around 650 transactions, so this represented a significant flood of the
network.
Ultimately, here's what happened:
| | TX | % |
|:---------------------------------------------------------|-------:|-----:|
| Transactions sent | 38,160 | |
| Transactions successfully broadcast | 32,936 | 86% |
| Transactions mined (out of total) | 32,922 | 86% |
| Transactions mined (inc retries, out of those broadcast) | 32,922 | 100% |
Here are the key takeaways:
- Of the transactions that were broadcast, nearly all of them were ultimately mined. Our accounting here is not perfect
-- a full audit of the transaction list will take longer, but it looks like it may well have been 100%.
- 14% of the transactions did not get broadcast. This meant they were stuck in "pending" mode somewhere. The core devs
are still investigating what happened, but it looks like it may have been because one of the receiving wallets
was not reachable at any point during the test. This may explain the majority of the transaction failures, but it's
still speculation at this point.
- 1,500 transactions were only broadcast and mined at the "second bite of the cherry", once the sending wallets decided
to retry the transactions. This indicates that the peer-to-peer process is not perfect yet, but is reasonably
eventually consistent.
Overall this represents a significant improvement over the last stress test. A quick trip down memory lane:
- The first stress test was a disaster. A bug in the message deduplication code resulted in the test essentially DDOSing
the network. Only 5% of transactions were mined in that test.
- The second test went much better; there was no inadvertent DDOSing, but the test highlighted areas in the code that
blocked up the main execution thread and prevented efficient message handling. Default message buffers were also much
too small, resulting in many dropped messages. In that test, the majority of transactions were broadcast, but only 25%
were eventually mined.
- This test had 86% of transactions broadcast, and all of those were mined.
The community is still poring over the GB of logs produced in the tests, but some early themes for improving the results
for the next test are emerging:
- If a node is running, but cannot talk to the network for some reason, it should make more noise. This is particularly
true of the seed nodes, that Aurora clients rely on to communicate with the network.
- Buffer sizes can be further tweaked to reduce bottlenecks.
- Find out where transactions are getting stuck in the signing protocol to move that 86% towards 100%.
A more detailed analysis was posted on IRC.