guessadx 0.1
by hcs
http://here.is/halleyscomet

guessadx computes all possible encryption keys for an ADX file.
It does not actually do decryption; that is the function of another utility,
degod.
All keys can be computed in about 2.5 hours on a fast machine (tests were done 
on a 3.2GHz Pentium 4 with 1MB cache).
I've provided a Windows build via mingw, but it is built with gcc 3.4.5 which
has fewer optimizations than the latest gcc versions and thus is a bit slower.

* A background on ADX encryption

An ADX stream is composed of 18-byte frames, each beginning with a two-byte
header known as the scale. This is a single 16-bit big endian value which is
multiplied with every sample in the frame (actually scale + 1 is used). The
remaining 16 bytes of the frame each represent two samples.

ADX supports a simple encryption, wherein the sequence of values from a linear
congruential generator (LCG) is XORed with the scale of each frame.
To decrypt we simply compute the output of the LCG and XOR it with the encrypted
scale to produce the original scale value. This is a basic property of XOR,
if S XOR R = C then C XOR R = S (S is the original scale, R is the random number
output from the LCG, and C is the encrypted scale), so the same operation is
used to encrypt and decrypt.
Another important property of XOR is that A XOR 0 = A, which will come up later.
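
As a quick illustration of those two XOR properties, here is a tiny Python
sketch. The function name and the sample values are made up for the example;
they are not taken from any real ADX file.

```python
def xor_scale(value, rand):
    """XOR a scale with an LCG output; the same call encrypts and decrypts."""
    return value ^ rand

scale = 0x0123                       # S: original scale (made-up value)
rand = 0x5A5A                        # R: LCG output (made-up value)
cipher = xor_scale(scale, rand)      # C = S XOR R

# C XOR R gives back S, so one operation serves both directions.
restored = xor_scale(cipher, rand)

# A XOR 0 = A: a zero scale would expose the LCG output directly.
leaked = xor_scale(0, rand)
```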

An LCG is a very simple pseudo-random number generator, which produces a
series of seemingly random numbers. To compute the next value in the sequence
(rand_1) from the current value (rand_0) we use the following equation:

    rand_1 = (rand_0 * multiplier + increment) MOD modulus

Where multiplier, increment, and modulus are parameters. With those parameters
and an initial random value (which we'll call the start value) we can compute
the entire sequence of numbers.
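
The equation above translates almost directly into code. This is a minimal
sketch; lcg_next and lcg_sequence are names chosen for illustration only.

```python
def lcg_next(value, multiplier, increment, modulus):
    """One step of the LCG: rand_1 = (rand_0 * multiplier + increment) MOD modulus."""
    return (value * multiplier + increment) % modulus

def lcg_sequence(start, multiplier, increment, modulus, count):
    """Return the first `count` values of the sequence, beginning with `start`."""
    values = []
    value = start
    for _ in range(count):
        values.append(value)
        value = lcg_next(value, multiplier, increment, modulus)
    return values
```

With toy parameters, lcg_sequence(1, 5, 3, 16, 4) steps through 1, 8, 11, 10.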

In the case of ADX encryption, the modulus used is 32768 (8000 in hexadecimal).
The other values (start, multiplier, and increment) form the encryption key. If
we know those we can easily decrypt the ADX, and I have written a utility called
"degod" to do just that. The trick, then, is determining the key. As far as I
have seen every ADX file in a single game uses the same key, though there is no
reason why this should be universally true.
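
To make the decryption step concrete, here is a rough Python sketch. It is not
degod's actual code. It assumes the input is raw frame data with the ADX file
header already stripped, that the start value is applied to the first frame,
and that the LCG advances once per frame. The function name is hypothetical.

```python
FRAME_SIZE = 18     # bytes per ADX frame
MODULUS = 0x8000    # LCG modulus used by ADX encryption

def decrypt_frames(data, start, multiplier, increment):
    """XOR each frame's big endian scale with successive LCG outputs."""
    out = bytearray(data)
    rand = start  # assumption: the first frame uses the start value itself
    for offset in range(0, len(out) - FRAME_SIZE + 1, FRAME_SIZE):
        scale = int.from_bytes(out[offset:offset + 2], "big")
        out[offset:offset + 2] = (scale ^ rand).to_bytes(2, "big")
        rand = (rand * multiplier + increment) % MODULUS
    return bytes(out)
```

Because XOR is its own inverse, running the same function over its own output
with the same key gives back the input.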

An additional wrinkle: The ADX encrypter does not perform encryption on a silent
frame, which is a frame consisting entirely of zeroes. This is because if the
samples are all zero, the scale will also be zero (this is a property of the ADX
encoder but it is not necessary, as we will see). Any value XORed with the zero
scale will be that same value (due to the earlier mentioned property of XOR),
which would allow precise recovery of a single value of the random stream;
this would allow the key to be determined much more easily. So the scale value
is left at zero. When the player is decrypting the frame it will XOR the zero
scale
with the output of the LCG, which will yield a random scale value, but this is
unimportant as the random scale will be multiplied by zero and thus the result
will always be zero.
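
The silent frame rule can be sketched from the encrypter's side like this. It
is only an illustration of the behavior described above, not the real
encrypter. It assumes headerless frame data and that the LCG still advances on
a silent frame. The function name is hypothetical.

```python
FRAME_SIZE = 18     # bytes per ADX frame
MODULUS = 0x8000    # LCG modulus used by ADX encryption

def encrypt_frames(data, start, multiplier, increment):
    """XOR each frame's scale with the LCG output, leaving silent frames alone."""
    out = bytearray(data)
    rand = start
    for offset in range(0, len(out) - FRAME_SIZE + 1, FRAME_SIZE):
        frame = out[offset:offset + FRAME_SIZE]
        if any(frame):  # a silent frame is all zero bytes and stays unencrypted
            scale = int.from_bytes(frame[:2], "big") ^ rand
            out[offset:offset + 2] = scale.to_bytes(2, "big")
        # The LCG steps for every frame, silent or not.
        rand = (rand * multiplier + increment) % MODULUS
    return bytes(out)
```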

While the scale is 16 bits, only 15 of those are actually usable; the high bit
is used as a signal of the end of the stream. The LCG that ADX encryption uses
produces random values filling the entire 15 bit range (0 to 32767). However,
valid scale values are only 13 bits (0 to 8191). This means that the three high
bits of the 16-bit scale field are always zero: the end-of-stream bit and the
top two of the 15 usable bits. This also means that in an encrypted ADX, those
two bits are equal to the top two bits of the LCG output, since they are XORed
with zero bits and therefore unchanged (the aforementioned property of XOR,
again).
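
In code, the two leaked bits can be picked out with a mask. This is a small
sketch; the constant and function names are made up for illustration, and the
sample values are arbitrary.

```python
KNOWN_MASK = 0x6000   # bits 13 and 14: always zero in a valid 13 bit scale

def known_lcg_bits(encrypted_scale):
    """Return the two bits of the LCG output that survive encryption unchanged."""
    return encrypted_scale & KNOWN_MASK

scale = 0x1234              # any valid scale, 0 to 8191
rand = 0x49E3               # some 15 bit LCG output
encrypted = scale ^ rand
# The masked bits of the encrypted scale equal the masked bits of the LCG output.
same = known_lcg_bits(encrypted) == (rand & KNOWN_MASK)
```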

* How guessadx works

guessadx takes advantage of this property, that we know exactly two bits of the
LCG output. Using the two known bits of the start value, we guess each of the
8192 possible values for the whole start value. For each of these, we guess each
of the 32768 possible multiplier values (actually we only check half of these;
it is assumed that the multiplier is always odd). For each of these, we guess
the 8192 possible values of the second LCG value (two bits of this are known,
just as in the start value) and use those to compute an increment which would
arrive at the second value given the start value. We then compute the LCG values
and check them against the two bits of all the scales (actually only up to
32768, as the LCG will start generating duplicate output after that many; also
we stop as soon as we hit a nonmatching value). So we end up checking
8192*32768/2*8192 = 1099511627776 (2^40, a bit over 1 trillion) keys. That's an
awful lot, but compare it to the 35 trillion+ keys (32768^3 = 2^45) in the whole
keyspace.
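
The search described above can be sketched in Python as follows. This is an
illustration of the idea, not guessadx's actual code (a pure Python loop over
the whole 2^40 space would be hopelessly slow). All names are hypothetical, and
the low_bits parameter exists only so a small slice can be tried.

```python
MODULUS = 0x8000
KNOWN_MASK = 0x6000   # the two bits of each encrypted scale that equal the LCG output

def derive_increment(first, second, multiplier):
    """Solve second = (first * multiplier + increment) MOD modulus for increment."""
    return (second - first * multiplier) % MODULUS

def key_matches(encrypted_scales, start, multiplier, increment):
    """Check the known bits frame by frame, stopping at the first mismatch."""
    rand = start
    for enc in encrypted_scales[:MODULUS]:   # the LCG repeats after 32768 values
        if (enc ^ rand) & KNOWN_MASK:        # decrypted scale would exceed 13 bits
            return False
        rand = (rand * multiplier + increment) % MODULUS
    return True

def search(encrypted_scales, multipliers=range(1, MODULUS, 2), low_bits=range(0x2000)):
    """Yield every (start, multiplier, increment) consistent with the known bits."""
    first_high = encrypted_scales[0] & KNOWN_MASK
    second_high = encrypted_scales[1] & KNOWN_MASK
    for start_low in low_bits:
        start = first_high | start_low
        for multiplier in multipliers:       # odd multipliers only, by default
            for second_low in low_bits:
                second = second_high | second_low
                increment = derive_increment(start, second, multiplier)
                if key_matches(encrypted_scales, start, multiplier, increment):
                    yield start, multiplier, increment
```

Note that encrypted_scales must be a run of consecutive encrypted (non-silent)
frames for the check to make sense.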

This method is a huge improvement over the old brute force method used in degod.
It is still a brute force method, but it is precise (generates all possible
keys) and much faster.

Another bit of detail is that the checker needs a string of encrypted values to
work with. If there are any silent frames, which are unencrypted, we won't be
able to check with them, but the LCG still computes a value for them so we
can't simply skip them. guessadx searches for a long series of consecutive
encrypted frames to work with. However, this means that the start value it
initially finds is not actually the start value for the whole file. To find that
we guess every possible start value (32768 of 'em), and check that they would
produce the values we see at the beginning of the file, culminating in the fake
start value that we were using earlier. As this only has to be done once for
every found key, and found keys are very rare, it doesn't slow down the overall
computation at all.
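
Both halves of this step can be sketched in Python. The names are hypothetical,
and the sketch makes two simplifications: it treats a zero scale as the sign of
a silent frame, and when stepping back to the real start it only checks that
the candidate arrives at the fake start value, skipping the comparison against
the frames at the beginning of the file.

```python
MODULUS = 0x8000

def longest_encrypted_run(scales):
    """Return (offset, length) of the longest run of consecutive nonzero scales."""
    best_offset = best_length = 0
    run_start = None
    for i, scale in enumerate(list(scales) + [0]):   # trailing 0 closes the last run
        if scale != 0:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start > best_length:
                best_offset, best_length = run_start, i - run_start
            run_start = None
    return best_offset, best_length

def recover_file_start(fake_start, offset, multiplier, increment):
    """Try all 32768 start values; keep those that reach fake_start after offset steps."""
    matches = []
    for start in range(MODULUS):
        value = start
        for _ in range(offset):
            value = (value * multiplier + increment) % MODULUS
        if value == fake_start:
            matches.append(start)
    return matches
```

With an odd multiplier each LCG step is invertible modulo 32768, so exactly one
start value should come back.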

* guessadx output

The primary output of guessadx is on standard output. Whenever it finds a key
that works it outputs it in this format:

    -s 49e3 -m 4a57 -a 4091 (error 13793869)

The first three values are the start, multiplier, and increment of the key, in
hexadecimal. The -s, -m and -a are command line switches to degod. You can copy
that section verbatim onto a degod command line, like so:

    degod -s 49e3 -m 4a57 -a 4091 song.adx

The last bit is the total of the decrypted scale values in the range that was
checked; this can be used to judge between multiple keys (lower is better, but
very large values can still be correct).
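
The error figure, as described, is just a sum of decrypted scales. A minimal
sketch, assuming encrypted_scales holds the run of frames that was checked
(key_error is a hypothetical name):

```python
MODULUS = 0x8000

def key_error(encrypted_scales, start, multiplier, increment):
    """Total of the scale values obtained by decrypting with the given key."""
    total = 0
    rand = start
    for enc in encrypted_scales:
        total += enc ^ rand
        rand = (rand * multiplier + increment) % MODULUS
    return total
```

Several candidate keys could then be ranked with something like
sorted(keys, key=lambda k: key_error(scales, *k)), lowest first.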

guessadx outputs some status information on standard error so you can tell how
long it is going to take. This is formatted as follows:

   9e0  30%      121 minutes elapsed      272 minutes left (maybe)

The first value is the current guess for the low bits of start, in hexadecimal.
The second value is the percentage of the total computation that is complete.
The third value is the time since the guessing portion of the program began.
The fourth value is an estimate of how much time remains, assuming that the
program continues checking keys at the same rate it has been all along.
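
The estimate is a simple constant rate extrapolation. The sketch below assumes
progress is measured as the current low-bits guess out of 8192 (2000 in
hexadecimal); that is a guess based on the sample line, not something confirmed
from the guessadx source, and minutes_left is a hypothetical name.

```python
def minutes_left(elapsed_minutes, start_low, total=0x2000):
    """Extrapolate remaining time, assuming keys are checked at a constant rate."""
    if start_low == 0:
        return None   # nothing finished yet, so there is no rate to extrapolate
    return elapsed_minutes * (total - start_low) / start_low

estimate = minutes_left(121, 0x9E0)
```

For the sample line (9e0, 121 minutes elapsed) this comes out a little over 271
minutes, close to the 272 shown.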

* Issues

I do not know for certain if the multiplier must always be odd, but that
assumption has served me well so far. If we don't discover any key for a file
this might be the first thing to check.

When multiple keys are generated, there is often no particular reason to choose
among them. There are simply certain LCGs for which there exist other LCGs that
differ only in the low bits. These bits are lost in the randomness inherent in
the low bits of the ADX scales. This means both that it is never possible to be
certain which of the several keys is correct (without additional information),
and that any differences between the keys will be completely inaudible.
Ya win some, ya lose some.

I suspect that there may be a much faster way to compute LCG parameters given a
few bits of the output, but the math eludes me.

* Note

I'd like to note that this method could have been defeated if the LCG had
produced only 13-bit random values, as there would then be no known bits to
check against. We would have been stuck with far less efficient keyfinding
methods if this were the case. However, 13-bit randomness would not have
rendered the files as completely unlistenable as the 15-bit randomness does:
the encrypted audio would be recognizable as the original song, just very
noisy, which would reveal this as the lousy encryption it is.

-hcs
