Breaking CashFusion With Pen & Paper (But Only Sometimes)
Paul Sztorc
11 December 2020
CashFusion is a proposed mixing scheme for BCH. Roger Ver has taken up the battle-cry of “more potential combinations than there are atoms in the universe” (see here and here).
There was a little bit of talk on this on the bitcoin-devs mailing list, and it attracted my curiosity.
So in April I sunk an afternoon into investigating this matter. I read (this article by Aaron van Wirdum), and spent a day typing up this mini-post – I hope you enjoy.
I’d hate for people to read too much into this, as it is one of my lowest-effort posts of all time. (Hence the delay in publishing it.)
The source of that “atoms in the universe” line is this calculation, which involves Stirling numbers of the second kind.
I will call this the “Stirling Heuristic”.
Does it always accurate describe CashFusion? I’m afraid not.
Many of the “possible ways” (of combining inputs into outputs), are in fact not possible, because the output-values won’t add up.
Ie, something like…
In Out
8000 12000
9000 5000
2 0.9
4 1
1
3.1
…would have 266 possibilities according to the Stirling Heuristic. But in reality, there is only one possibility: 8000 and 9000 merged into 17000 then split into 1200 5000; 2 merged into 2 and then split into 1 and 1; 4 merged into 4 and then split into 0.9 and 3.1.
The Stirling Heuristic would apply perfectly, if the situation looked like this…
In Out
8000 ?
9000 ?
2 ?
4 ?
?
?
…and furthermore if it were the case that every combination of left-?s could add up to every combination of right-?s.
But of course that is a little silly.
With extremely bad luck, the Striling Heuristic is 100% misleading, as instead of being “many possible comibinations” there is just one possible combination.
Part of the pitch for CF is that, even if a valid partition is discovered, there could be “many valid partitions”.
For example, with hundreds of inputs and outputs,
it is not just computationally impractical to iterate
through all partitions, but even with infinite
computing power, one would find a large number of
valid partitions.
“Many valid partitions” means that even if the attacker discovers a way to group the inputs into outputs, they will not know if they have found the way that was actually used.
However, with what I am about to show, they will know they’ve found it. Furthermore, they can prove that they’ve found “the” single way.
Here’s a small CashFusion (6 in, 5 out), to warm up:
In: Out:
2. 10.
6.3 10.2
8.4 10.3
9.2 10.4
11. 10.41
14.41
First, let us label the inputs (A through F), and outputs (i through v).
We will begin with the right-most digit.
For both the Inputs and Outputs, I will check and see which types of digit are achievable (via addition).
On the input side, if is the case that we MUST use a given input to reach a given digit, then both that digit and the special input(s) will be starred. On the output side, if is the case that we MUST send a given digit to a SINGLE output, then both that digit and the associated output will be starred.
In Out
A 02.00 i 10.00
B 06.30 ii 10.20
C 08.40 iii 10.30
D 09.20 iv 10.40
E 11.00 v 10.41*
F 14.41* =========
======= 0
0 1*
1*
Whenever we have a match (“1*” matching “1*”, in this case), we know that the starred-input MUST be a part of the starred-output.
We know that F --> v
Since we’ve nailed down that input, we delete it and start over. We also subtract the value of F from v.
In Out
A 02.00 i 10.00
B 06.30 ii 10.20
C 08.40 iii 10.30
D 09.20 iv 10.40
E 11.00 v -4.00
======= =========
0 0
If no asterisks, we delete the right-most digit, and travel leftward to the next one.
In Out
A 02.0 i 10.0
B 06.3* ii 10.2*
C 08.4* iii 10.3*
D 09.2* iv 10.4*
E 11.0 v -4.0
====== ========
0 0
3* 3*
4* 4*
2* 2*
7* --
5* --
6* 6
9* 9
You can see how this starts to get useful quickly. How could B not go to output iii? Output iii ends in “.3”, but there just isn’t any way to get to “.3” without using input B. Similarly, we can’t get to “.2” without using input D. Nor can we get to iv without using C.
We know that B --> iii
We know that D --> ii
We know that C --> iv
Delete and subtract…
In Out
A 02.0 i 10.0
E 11.0 ii 1.0
====== iii 4.0
0 iv 2.0
v -4.0
========
0
We are done with this digit. Move leftward…
In Out
A 02* i 10
E 11* ii 1*
==== iii 4
2* iv 2*
1* v -4
======
0
1*
2*
Thus, we know that A must be linked to iv, and E must be linked to ii.
We know that A --> iv
We know that E --> ii
Delete and subtract…
Out
i 10
ii -10
iii 4
iv 0
v -4
======
So, we now know everything:
We know that A --> iv
We know that B --> iii
We know that C --> iv
We know that D --> ii
We know that E --> ii
We know that F --> v
i and ii have the same owner: the owner of D,E
iii and v have the same owner: the owner of B,F
iv is owned by: the owner of A,C
Theoretically, this CashFusion had…
> Sum[StirlingS2[6, x] StirlingS2[5, x], {x, 1, 5}]
3381
…3381 “possible combinations of input and output”. But we can see easily that, in reality there is only ONE possible combination. And everyone can not only learn it, but prove it!
That was just a small example. What about when it gets bigger?
After all, the CashFusion spec says:
In CashFusion, we have opted to abandon the
equal-amount concept altogether. While this
is at first glance no different than the old
naive schemes, mathematical analysis shows
it in fact becomes highly private by simply
increasing the numbers of inputs and outputs.
If you followed the logic above, you may already see that we could add more inputs with certain “unlucky” last-digits…
In:
02.00
06.30
08.40
09.20
11.00
14.41
+ 14.42 +
+ 14.44 +
+ 14.48 +
Note: 0 is a dead ("non-unique") digit already, 1 is already
unique. 2 can be made unique. 3 is dead because it
could be reached by 1+. 4 is alive because so far we
can only produce 0,1,2,(3). 5 is dead because we could
make it with 1+4. 6 can be made with 2+4, so it is dead. Is 7 alive? Unfortunately, it is not, because
so it via 1+2+4. Thus, so far we can make: 0,1,2,(3),4,(5),(6),(7). That means 8 is alive. Notice
that 8 further deadens 0, because 8+2=0 -- but it
doesn't matter because zero was already dead.
8+1 = 9, therefore 9 is dead.
Out
10.00
10.20 + 14.42 = 24.62
10.30 + 14.44 = 24.74
10.40
10.41 + 14.48 = 24.89
…without gaining a shred of additional privacy. These new inputs all have unique right-most digits, so they are all immediately eliminated in the first two rounds of the right-digit-algorithm.
Watch:
Round One
In Out
A 02.00 i 10.00
B 06.30 ii 24.62
C 08.40 iii 24.74
D 09.20 iv 10.40
E 11.00 v 24.89*
F 14.41* =========
G 14.42* 0
H 14.44* 1* -- can only be sent to "9"
I 14.48* 2
======= 4 -- could be sent to "4" or "9"
0 8* -- can only be sent to "9"
1* -- can only be obtained from F
2* -- can only be obtained from G
4* -- can only be obtained from H
8* -- can only be obtained from I
Round Two
In Out
A 02.00 i 10.00
B 06.30 ii 24.62*
C 08.40 iii 24.74*
D 09.20 iv 10.40
E 11.00 v -4.00
------- =========
G 14.42* 0
H 14.44* 2 -- can only be sent to "2"
------- 4 -- can only be sent to "4"
=======
0
2*
4*
Round Three
In Out
A 02.00 i 10.00
B 06.30 ii 10.20
C 08.40 iii 10.30
D 09.20 iv 10.40
E 11.00 v -4.00
------- =========
------- 0
-------
-------
=======
0
And you see that the right-digit algorithm has already reduced this larger problem back to the original smaller problem.
So, the solution is:
We know that A --> iv
We know that B --> iii
We know that C --> iv
We know that D --> ii
We know that E --> ii
We know that F --> v
We know that G --> ii
We know that H --> iii
We know that I --> v
i and ii have the same owner: the owner of D,E,G
iii and v have the same owner: the owner of B,F,H,I
iv is owned by: the owner of A,C,
Our Modified “Medium Example 1” CashFusion, added three inputs.
Supposedly, these three extra inputs should open the door to many more combinations. Instead of 3381 combinations, there should be…
> Sum[StirlingS2[(6+3), x] StirlingS2[5, x], {x, 1, 5}]
164,102
…164,102 “possible combinations of input and output”. But, just as with the first toy example, in truth there was only ONE possible combination. Everyone can learn it and prove it, using simple arithmetic.
Hopefully, by now it is clear that I can make CashFusion look even worse, with a more contrived example that has more “bad luck that is right-digit based”.
For example, what if we just take Example 2, copy the inputs and outputs, lower them by four orders of magnitude, and add them back into the problem.
This is a little silly, but I also think it makes for a good explanation.
In:
A 02.00
B 06.30
C 08.40
D 09.20
E 11.00
F 14.41
G 14.42
H 14.44
I 14.48
+ J .0002,00
K .0006,30
L .0008,40
M .0009,20
N .0011,00
O .0014,41
P .0014,42
Q .0014,44
R .0014,48
Out
i 10.00
ii 24.62
iii 24.74
iv 10.40
v 24.89
+ vi .001000
vii .002462
viii .002474
ix .001040
x .002489
I hope it is obvious that the right-digit algorithm I’ve outlined, will solve the rightmost “half” of the problem immediately, before it even notices the “left half”.
With the right half out of the way, it solves the left half, exactly as easily as it did before.
The “higher number of inputs and outputs” had absolutely zero significant effect on how long it takes to solve the algorithm. Instead of taking 1/100th of a second, it might take 2/100ths of a second.
So again, with trivial arithmetic, the entire problem is solved.
We know that A --> iv
We know that B --> iii
We know that C --> iv
We know that D --> ii
We know that E --> ii
We know that F --> v
We know that G --> ii
We know that H --> iii
We know that I --> v
---------------------
We know that J --> ix
We know that K --> viii
We know that L --> ix
We know that M --> vii
We know that N --> vii
We know that O --> x
We know that P --> vii
We know that Q --> viii
We know that R --> x
i and ii have the same owner: the owner of D,E,G
iii and v have the same owner: the owner of B,F,H,I
iv is owned by: the owner of A,C,
vi and vii have the same owner: the owner of M,N,P
viii and x have the same owner: the owner of K,O,Q,R
ix is owned by: the owner of J,L,
Now we will get a very good look at the treacherous nature of the Stirling Heuristic. Because Example 3 added many inputs and outputs, but all of it amounted to only 2x additional attacker compute time.
Let us see what the heuristic reports, now that we have doubled the inputs to 18 and outputs to 10.
> Sum[StirlingS2[18, x] StirlingS2[10, x], {x, 1, 10}]
5,161,826,943,804,663
So, we have gone from a claimed “3381 combinations”, to a claimed “164,102 combinations”, to a claimed “over 5.16 quadrillion” combinations.
But all of these claims are false, because in all three cases there was never a need to check all of these combinations exhaustively. In fact, there was never a need to check a single combination.
The examples above are all contrived – I made them up to show the unreliability of the Stirling Heuristic.
Now I will try again with a small more realistic example. To CashFusion’s credit, I found this impossible to do by hand, and even wrote a tiny computer program to help. Even then, I failed to completely crack it after about 45 minutes of trying. But I still guess that a specialized computer program would crack it easily.
Here, in the fourth and final example, all of the digits were taken from a uniformly random distribution. (I took random digits from random.org.) I did change some of the leftmost input-digits to 8 to intentionally make them more similar to each other, so as to make my job harder. Then I randomly added them together.
In: Out:
A 8.4809,2042 i 6.5004,3459
B 6.5831,7609 ii 9.2833,7262
C 4.3501,4451 iii 15.2334,3000
D 3.3066,6320 iv 18.3707,5971
E 8.7799,3110
F 7.1220,0016
G 8.6618,0705
H 2.1033,5439
Round One
In: Out:
A 8.4809,2042 i 6.5004,3459
B 6.5831,7609 ii 9.2833,7262
C 4.3501,4451 iii 15.2334,3000
D 3.3066,6320 iv 18.3707,5971
E 8.7799,3110 =================
F 7.1220,0016 9
G 8.6618,0705 2
H 2.1033,5439 0
============= 1
0 -- D, E, B+C
1 -- C, H+A, F+G
2 -- A, C+F+G
3 -- A+C, or B+G
4 -- G, A+B+H
5 -- C+G, B+F
6 -- A+G, F
7 -- C+F, C+A+G, A+B+F
8 -- A+F, C+G+H
9 -- B, or A+C+F
We are defeated this round, and must move leftward.
Round Two
In: Out:
A 8.4809,2042 i 6.5004,3459
B 6.5831,7609 ii 9.2833,7262
C 4.3501,4451 iii 15.2334,3000
D 3.3066,6320 iv 18.3707,5971
E 8.7799,3110 =================
F 7.1220,0016 59 -- must go to i
G 8.6618,0705 62 -- must go to ii
H 2.1033,5439 00 -- must go to iii
============= 71 -- must go to iv
...
59 -- can't be reached without 20, nor 39
62 -- can't be reached without 42
00 -- can't be reached without 10, nor 39
71 -- can't be reached without 20
I wrote a tiny program to compute the various ways that one could add up the two rightmost digits of the 8 inputs.
The program did NOT do any other obvious things. One such thing would have been to, while adding up these digits, also add up the entire value to check and see if it is implausibly large. For example, my program found a way to achieve the digits “71”, by adding 42,9,51,20,10, and 39, to make “171”. But that cannot possibly be the real way that we get to the “71” of output iv, because the other three outputs have nearly 30 Bitcoin in them total, and only remaining inputs (F and G) don’t add up to anywhere close to that.
Also, to save space, I’ve only listed on the left, the digits that would have appeared on the right.
The inputs of 20 and 39 are both being double-claimed. That seems to be a contradiction, so I’m not really sure how to interpret that. Perhaps it indicates the presence of ‘other valid partitions’. After all, 20 (D) can’t go to both 59 (i) and 71 (iv).
I selected the link between 10 (E) and 00 (iii), to see if it could shed light on 20 and 39…
Round Three
In: Out:
A 8.4809,2042 i 6.5004,3459
B 6.5831,7609 ii 9.2833,7262
C 4.3501,4451 iii 6.4534,9890
D 3.3066,6320 iv 18.3707,5971
E ----------- =================
F 7.1220,0016 59 -- must go to i
G 8.6618,0705 62 -- must go to ii
H 2.1033,5439 90 -- must go to iii
============= 71 -- must go to iv
...
59 -- can't be reached if 20, 39 are removed
62 -- can't be reached if 42 is removed
90 -- can't be reached if 39 is removed
71 -- can't be reached if 20 is removed
…but it shed no additional light on anything.
At this point I’m pretty confused about the persistant dual-appearance of 20 and 39, so I checked my Answer Key (after all, I made this problem). 39 is supposed to go to iii (ie, “90”). I also checked and 20 is supposed to go to 71. Perhaps it is a general rule that, a finding with 2 contradictions should always lose to 1 contradiction. Or perhaps not.
I’m doing this for fun and I’m already getting very bored, so I’ll just do the 39 output…
Round Four
In: Out:
A 8.4809,2042 i 6.5004,3459
B 6.5831,7609 ii 9.2833,7262
C 4.3501,4451 iii 4.3501,4451
D 3.3066,6320 iv 18.3707,5971
E ----------- =================
F 7.1220,0016 59 -- must go to i
G 8.6618,0705 62 -- must go to ii
H ----------- 51 -- must go to iii
============= 71 -- must go to iv
...
59 -- can't be reached if any of 42, 9, 51, 20, 16, 5 are removed
62 -- can't be reached if any of 42, 20, are removed
51 -- can be reached (no matter what is removed)
71 -- can't be reached if 20 is removed
…after which point, there is least now a compelling logical reason to assign 20 to 71.
Round Five
In: Out:
A 8.4809,2042 i 6.5004,3459
B 6.5831,7609 ii 9.2833,7262
C 4.3501,4451 iii 4.3501,4451
D ----------- iv 15.0640,9651
E ----------- =================
F 7.1220,0016 59 -- must go to i
G 8.6618,0705 62 -- must go to ii
H ----------- 51 -- now a collission
=============
...
59 -- can't be reached if any of 42, 9, 51, 16, 5 are removed
62 -- can't be reached if any of 42, 9, 51, 16, 5 are removed
But, actually, if you just look at it you see that C is the exact same value as iii. Beginner’s luck I guess.
Round Six
In: Out:
A 8.4809,2042* i 6.5004,3459
B 6.5831,7609* ii 9.2833,7262
C ----------- iii 0
D ----------- iv 15.0640,9651
E ----------- =================
F 7.1220,0016* 459
G 8.6618,0705* 262
H ----------- 651
============= ---
721 = (459+262)
913 = (262+651)
459 110 = (459+651)
262
651
Unfortunately, my simple simple algorithm isn’t noticing that i+ii add up to F+G (which would lead it immediately to the right answer and full solution). Turns out that the idea of re-splitting the outputs helps a great deal – earlier [in Round Three] it tried to trick the algorithm into assigning 42 [A] to 62 [ii], which would have been a mistake.
Oh well. Maybe someone can improve it. An obvious improvement would be to convert all the numbers to binary, first (since the zero digit is so problematic). Probably, there already exist specialized cryptographic / combinatorial algorithms for fast cracking.
Obviously I got a little overconfident here, after three examples that were (by construction) trivial to break with a pen and paper, I bit off more than I could chew with Example 4. I think if it just had one fewer input, I could have done it easily, but then again that is the whole point of CashFusion, right? If there are many outputs, it quickly becomes impractical.
Anyway, some guidelines for good CashFusions:
In any event, I do think I have shown the limitations of the “stirling heuristic”. Often it does not apply at all.
When amounts differ substantially, the entire Fusion can be cracked open by hand with just a pen and paper and some arithmetic.
This paper seems to be a much much better direction, than just naively relying on the Stirling Heuristic.