This is going to be a short one, so this is more of a Q&A style – It implicitly assumes OpenSSH.
Why Custom SSH Moduli?
- Because it is fun to do
- You get way better probability guarantees for the primality if doing it right
- Custom moduli = spy agencies have to work a little harder
How?
The short version
ssh-keygen -G moduli.candidates -b <len> -M 127
ssh-keygen -T moduli -f moduli.candidates -M 127 -a <#tests>
-G
specifies to generate moduli candidates and puts the results into the given filename.
-b
is the length of the prime + 1 bit – so if you set it to 4096, you would get 4095 bit.
-M
increases the memory used to speed things up a bit – 127 (MiB) is the maximum, though.
-T
tells it to test the candidates and put the results into the given filename.
-f
specifies the file with the candidates.
-a
sets the number of tries for Miller-Rabin per candidate – 100 by default.
Anything said for the short version can be found in this stackoverflow thread and the manpage.
Another gotcha is that OpenSSH is sequentially filtering candidates for a given invocation, so if you want to spread things out a bit, you need to run it multiple times.
Do not use 100 (or do)
While it might be tempting to cut down on the number of tries, it is actually counter-productive.
While virtually every textbook on primality testing will tell you that the odds for Rabin-Miller to return a false positive after $k$ independent tests is $\leq 2^{-2k}$, one can do actually way better than that.
In the paper Average case error estimates for the strong probable prime test, by Damgård, Landrock, and Pomerance, the authors give better conditional bounds.
Notation
As we will quote from the paper, we’ll use $k$ as the size of the prime in bit, and $t$ as the number of tries. $p_{k,t}$ denotes probability that a composite number of the given length $k$ has survived $t$ rounds of Miller-Rabin.
NOTE: This, including the textbook bound, only applies to randomly generated numbers that you generated. If an adversary sends you a «random» number, then NO BOUND WILL HOLD as said adversary could always send a Carmichael Number.
The first bound
The first bound occurs in section 5, Theorem 3 of the aforementioned paper and states:
For $k, t$ with $k\geq 21,\, 3 \leq t \leq \frac{k}{9}$ we have $p_{k,t} < k^\frac{3}{2}\frac{2^t}{\sqrt{t}}4^{2-\sqrt{tk}}$
So this even applies to the default 100 tries as used in OpenSSH
The mighty bound
This one occurs in section 6, Theorem 6 of the paper and reads:
For $k, t$ with $k\geq 21$ and $t\geq \frac{k}{9}$ we have $p_{k,t} < \frac{7}{20}k2^{-5t} + \frac{1}{7}k^\frac{15}{4}2^{-\frac{k}{2}\,-2t} + 12k2^{-\frac{k}{4}\,-3t}$.
So, we need our number of tries to be at least 1/9th of the bitlength of the number we want to test. Even for shitty-security 1024-bit numbers, we would need 114 tries at least.
So what’s the difference between those 3 guarantees
Glad you’re asking! I wrote a simple Python 3 script that computes the bounds (please note that it’ll abort with value errors as most intermediate values are outside of the double-precision floating points that Python’s providing, so I also used SageMath to compute the other ones):
from gmpy2 import mpz, mpfr, log, sqrt, ceil
def ld(x):
return ceil(log(mpfr(x)) / log(2))
def tb(t):
'''the textbook bound'''
return pow(mpfr(2), -2*t)
def t3(k, t):
'''the theorem 3 bound'''
return pow(mpfr(k), 1.5) * (2 ** t) / sqrt(t) * pow(4, -sqrt(t*k))
def t6(k, t):
'''the 'mighty bound' '''
chunk_a = mpfr(7)/20 * k * (2 ** (-5 * t))
chunk_b = mpfr(1)/7 * pow(k, 15/4) * pow(2, -k/2 - 2*t)
chunk_c = mpfr(12) * k * pow(2, -k/4 - 3*t)
return chunk_a + chunk_b + chunk_c
for k in range(2048, 8192 + 1, 1024):
print(';'.join(str(x) for x in (k, -ld(tb(100)), -ld(t3(k, 100)), -ld(t6(k, ceil(k/9))))))
which after some processing yields the following table:
length (bit) | textbook (100 tests) | Theorem 3 (100 tests) | Theorem 6 (‘mighty bound’) |
---|---|---|---|
2048 | 200 | 791 | 1130 |
3072 | 200 | 994 | 1699 |
4096 | 200 | 1165 | 2269 |
5120 | 200 | 1315 | 2834 |
6144 | 200 | 1452 | 3403 |
7168 | 200 | 1577 | 3973 |
8192 | 200 | 1694 | 4543 |
Note that the values are $-log_2$, so a value of $x$ means $2^{-x}$ or $1:2^x$.