Breaking Ascon-128 with Differential Fault Analysis

Background

Last month, NIST announced[1] that Ascon will be the standard for lightweight crypto. Ascon is a tiny and fast family of algorithms which can perform authenticated encryption and hashing. Although I had been loosely following the lightweight crypto contest, I had not actually looked into the details of any of the entries. Although Ascon has a lot of impressive features, one of the first things that stood out to me as a hardware hacker was that AEAD encryption seems very susceptible to differential fault analysis.

Coincidentally, my ChipWhisperer Husky[2] arrived only a few days after I first looked into Ascon. The ChipWhisperer Husky is a brand new tool from NewAE for performing fault injection and side channel analysis attacks, and it comes with a lot of fancy features (most of which won't be covered in this post). Given that Ascon was fresh on my mind, I decided to try to break it with DFA. This post will be about that attack.

Fault Injection

Fault injection, or glitching, is a type of active attack on hardware devices where the attacker modifies the operating conditions of a system to cause it to misbehave. By very precisely controlling these changes, specific effects can be induced in the target. In this post, I am using voltage fault injection against code running on a microcontroller. There are many other forms of fault injection (electromagnetic, laser, thermal, clock), and these techniques can work on both hardware (e.g. crypto accelerators which are not implemented with code running on a cpu) and software targets. Using the ChipWhisperer Husky, I can short the victim processors power supply to ground through a MOSFET for a controlled amount of time. When this happens, the internals of the CPU will misbehave. Although it is impossible to directly measure exactly what happens internally, some examples of common effects are:

Instruction skipping. This could be due to directly skipping an instructions execution, or it could be due to skipping the side effects (e.g. causing a bus transfer to fail, resulting in memory not getting updated after a store instruction)
Data corruption. This could be single- or multi-bit flips, reading/writing from the wrong address, or reading a word as all-zero/all-one
Instruction corruption. This is a result of data corruption when reading instructions from memory, but I still think it is interesting enough to point out
Control flow corruption. Causing branches to take the wrong path, corrupting the program counter directly, etc. Again, these effects may be caused by data corruption (eg load r1 -> load pc)

As a programmer, you can create some mitigations to help with some of these fault types. But as you can probably imagine, it is basically impossible to write a completely secure program in the presence of faults. High-security systems rely on a combination of hardware and software countermeasures to mitigate and detect fault attacks.

In order to perform fault injection attacks, we need:

A trigger, or in other words, some sort of timing reference so we know when to introduce a fault. In this post, I am using a GPIO signal which goes high just before the target operation. The ChipWhisperer supports many other trigger modes including UART triggering and waveform triggering (which allows you to trigger based on real-time analysis of side channel leakage)
A way to send/receive data. If you can't receive data, you can't tell whether an attack worked. In this case I'm using UART. In the real world, you might send data by loading a malicious firmware image that toggles a gpio when you defeat secure boot and run the image.
A way to classify results and determine if your fault worked. This could be as simple as setting a GPIO if the glitch was successful. In this post, I'm checking the response of the encryption against different values.
A way to inject faults. In this case, I'm connecting the Crowbar output of the Husky to J3 on the CW313 board, which is connected to the power line of the target microcontroller.

With voltage fault injection on the Husky, we can vary two parameters:

Glitch length: the number of cycles that the power line is shorted to ground
Glitch delay: the number of cycles that we wait between receiving the trigger and injecting the glitch

If the glitch length is too long, we will always crash the target. If it's too short, we will never affect the target. If it's just right, we should see a mix of crashes, corrupted outputs, and (hopefully) successful glitches as we vary the glitch delay.

The glitch delay doesn't give us as much feedback for tuning it, but at a minimum we can usually observe data corruptions, try to guess what operation we affected, and tune based on that.

My approach in this case was to manually tune the glitch length down to a small range, set wide parameters for glitch delay, and brute force the search space. This isn't always practical if you have a large search space or a very slow operation, but it was good enough for this PoC.

Differential Fault Analysis

Differential fault analysis (DFA) is an active attack against implementations of cryptographic algorithms. The attack is performed by introducing faults during intermediate steps of the algorithm, and observing how the output changes. If an attacker can reliably corrupt the right data, it may be possible to calculate the key based on differences between the faulted and non-faulted/normal outputs.

In the lightweight crypto algorithm requirements[3], NIST states:

While implementations will not be required to protect against fault attacks, the ability to provide it easily and at low cost will be taken into consideration

Although DFA was probably considered in the design of Ascon, Ascon was not required to be secure in the presence of fault attacks.

To explain the attack we will be performing, lets look at how Ascon works. Ascon encryption is performed in four steps[4]:

Initialization: initializes the state with the key K and nonce N
Associated Data Processing: updates the state with associated data blocks A_i
Plaintext Processing: injects plaintext blocks P_i into the state and extracts ciphertext blocks C_i
Finalization: injects the key K again and extracts the tag T for authentication

The final step is the one that stood out to me. Injecting the key means xoring the key with the state. Specifically, the tag is exactly calculated as K ^ {state[3],state[4]}, where state is a 320-bit value arranged as 5 64-bit words. This is a juicy target for DFA because if we can inject a fault that can skip the xor operation, the key can be calculated as K=tag_normal ^ tag_faulted. Simple enough :)

I googled around to find out if anyone had published anything about DFA on Ascon before. The only attacks I saw were either impractical[5] or statistical (e.g. SIFA)[6][7]. From my experience, statistical attacks like SIFA are much less useful to a real-world attacker than classical DFA since SIFA requires significant control over the results of the glitch. I expect more techniques and implementation-specific attacks to show up over the next few years, but for now this is all I was able to find. I haven't seen any mentions of the basic DFA attack described in this post. If I missed any other prior work, please let me know!

Update: I found another similar attack[11]. This is a more complicated attack which relies on ineffective faults and biasing the s-box. That said, the authors point out that the key addition in the finalization stage is a weakness, and they describe methods to distinguish faulty tags, which is helpful for this attack.

First Proof-of-Concept

I decided to use the SAM4S target board that came with my ChipWhisperer Husky. This is a 32-bit ARM Cortex M4. I hacked together a SimpleSerial application with the pure C Ascon reference implementation[9]. I also added a trigger just before the xor operations, since my initial goal was a quick and dirty proof of concept. Since this is a 32-bit implementation, I expect to require 4 unique glitches to recover the full 128-bit key. I kept all inputs constant and performed a quick brute force of parameters. As this is a characterization experiment, I classified results by xoring the known-good tag with the corrupted tag, and comparing the results to the known key.

I performed a scan of parameters, and was able to recover they key.

$ python3 swascon_dfa_constn.py 
Connected...
(('corrupt', 5, 100, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m<\x0e\x00 \xbf\x0c@\x000\x00\x00\x00\x10\x00\x00\x00'), b'\x06O\x01\xb1N\xecn\x83\xd9Ii\x84D\x13.\xc1')
(('corrupt', 5, 101, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m<\x0e\x00 \xbf\x0c@\x000\x00\x00\x00\x10\x00\x00\x00'), b'\x06O\x01\xb1N\xecn\x83\xd9Ii\x84D\x13.\xc1')
(('key02', 5, 103, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf5\xe5(\x84\xe9Ii\x84T\x13.\xc1'), b'\x00\x00\x00\x00\x04\x05\x06\x07\x00\x00\x00\x00\x00\x00\x00\x00')
(('key01', 5, 104, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:@\x03\x92\xf1\xe0.\x83\xe9Ii\x84T\x13.\xc1'), b'\x00\x01\x02\x03\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')
(('corrupt', 5, 106, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe9Ii\x84b_!_'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x006L\x0f\x9e')
(('corrupt', 5, 107, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe9Ii\x84\x00\x00\x00\x00'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00T\x13.\xc1')
(('corrupt', 5, 108, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83P\x17*\xc5T\x13.\xc1'), b'\x00\x00\x00\x00\x00\x00\x00\x00\xb9^CA\x00\x00\x00\x00')
(('corrupt', 5, 112, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\x00\x00\x00\x00T\x13.\xc1'), b'\x00\x00\x00\x00\x00\x00\x00\x00\xe9Ii\x84\x00\x00\x00\x00')
(('key04', 5, 117, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe9Ii\x84X\x1e \xce'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x0c\r\x0e\x0f')
(('key03', 5, 118, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xf1\xe0.\x83\xe1@c\x8fT\x13.\xc1'), b'\x00\x00\x00\x00\x00\x00\x00\x00\x08\t\n\x0b\x00\x00\x00\x00')
(('corrupt', 5, 120, b'\x00\xd5\xc0H\xca\xd3\xa5f\x81v\xae\x1a\xca:\x00m:A\x01\x91\xe9Ii\x84\xe9Ii\x84\x00\x00\x00\x00'), b'\x00\x00\x00\x00\x18\xa9G\x07\x00\x00\x00\x00T\x13.\xc1')

The output is a bit messy, but the important parts are the 2nd and 3rd fields. The glitch length was 5 cycles, and the delay for successful results was 103, 104, 117, or 118 cycles.

You can download the script here.

Double Glitch

That was a good proof of concept, but the Ascon V1.2 paper[10] notes that:

In order to fulfill the security claims[...], implementations must ensure that the nonce (public message number) is never repeated for two encryptions under the same key, and that decrypted plaintexts are only released after successful verification of the final tag

Thus, the implementaton where an attacker can control the nonce is already insecure. In fact, a practical adaptive chosen plaintext attack based on nonce misuse has already been published[8].

Although there is no specific recommendation for nonce generation, the specification notes that:

Except for the single-use requirement, there are no constraints on the choice of the nonce (public message number); in particular, it is possible to use a simple counter.

Based on this, I modified my implementation to use a global counter approach. A common approach in embedded devices is to avoid having to persistently store (and secure) the nonce across reboots by randomizing part of it, and using the other part as a counter. In my implementation, the first 64 bits will be randomized and the remaining 64 bits will be a counter which is initialized to 0 when the device boots. In a real device, you might see something like:

device_init(){
    ...
    uint64_t nonce[2];
    nonce[0]=64bit_trng();
    nonce[1]=0;
    ...
}

do_encryption(...){
    do_ascon_aead(nonce,...);
    nonce[1]++;
}

This is sufficient to prevent the cryptanalytic nonce-misuse attack in the real world.

In order to recover a single word of the key, we need the same nonce to be reused one time. So, our attack flow will look like:

1. Perform one encryption with a fault

2. glitch the counter increment operation to prevent it from updating (or, depending on the fault effects, fault it to a specific value such as 0)

3. Perform a second encryption with the same inputs

4. xor the tag values together to recover a word of the key

This requires two glitches, which means we have a lower chance of success per attempt. Luckily, the SAM4S was very easy to glitch, with a success rate of ~90% for the single-glitch attack. Based on that, we can probably expect a >80% success rate with the double-glitch attack.

I first tried to perform a single glitch to fault the increment so I could find the timing correct and... It didn't work. Instead of skipping the increment, I was reliably incrementing by 2. This makes me suspect that the actual fault model that I'm triggering here (and possibly in the previous test) is causing instructions to be re-executed (perhaps by corrupting the program counter update). Although this is a fault model that I have seen before, my experience is that it is usually rare and comes alongside many of the other more common fault types mentioned above.

I worked around this by refactoring my code slightly. By forcing the system to load an increment value from memory, then adding that to the nonce counter instead of performing a addition with an immediate value, I found that I could reliably skip the nonce increment. I'm most likely faulting the load operation rather than the increment itself, but at the end of the day the outcome is the same. I identified the required timing and was able to try the double-glitch attack.

After implementing the double-glitch, I was able to successfully extract the key very quickly and reliably.

$ python3 swascon_dfa.py 
Connected...
00010203000000000000000000000000
00010203040506070000000000000000
000102030405060708090a0b00000000
000102030405060708090a0b0c0d0e0f
Key recovered: 000102030405060708090a0b0c0d0e0f

You can download the double-glitch script here.

But what about decryption?

The tag calculation makes targeting encryption very easy and fairly obvious why it works. With decryption, its not as immediately clear that DFA is even possible. After staring at it, the cipher looks like it essentially constructs a keystream out of hash operations. I don't think we can invert the hash operations, and even if we could, we probably wouldn't easily be able to get the full key out without a lot of extra work. A minimum of 2 glitches would be required if the tag value is checked before returning the plaintext, but the best attack that I might be able to perform on a software implementation would require two triple-glitches.

One attack would look like:

The input should include fixed ciphertext, a fixed nonce, and no associated data.
Skip the call to p^a in initialization. This results in the initialized state being [IV,K0,K1,N0^K0,N0^K1]
Cause an addressing fault such that instead of computing pt_0=S_0^c_0, the system performs pt_0=S_{1,2,3,4}^c_0
Fault tag validation so the plaintext is released to the attacker.

This attack is probably not applicable to hardware implementations, and probably not feasible in most real-world software implementations. That said, I will probably attempt something like this attack at some point since the SAM4S was so susceptible to double-glitches.

If anyone has more ideas for performing this attack with fewer glitches, I'd love to hear them. A practical DFA attack on decryption would probably be more interesting than this encryption attack, since I find attacks on decryption (eg firmware loading) to be more common in the real world.

Update: Single-Fault Method

This attack assumes the following:

We can encrypt and decrypt arbitrary data with the same (unknown) key
Nonce generation is not attacker-controlled
Decryption will only release the plaintext if the tag is correct, otherwise it will return an error
We have characterized the target sufficiently to specify which word we are trying to fault

Under these assumptions, the attack works as follows:

Send a plaintext, and fault the key addition as before
Use decryption as an oracle to validate that the tag was faulted
Brute force the specific word of the tag. If the implementation releases the tag, you have the correct value. If no correct value identified, go back to the start
xor the faulted tag word with the correct tag word, and you will have recovered a key word

This attack replaces the fault on the nonce increment with a brute force step.

The total complexity of the attack is 128/wordsize faults and (128/wordsize) * 2^wordsize decryptions. On my target, the 4*2^32 brute force will be pretty impractical since the serial interface is 38400 baud. Other 32-bit implementations might make this attack practical. and 8- or 16-bit implementations should be very attackable with this method. 64-bit implementations probably won't be vulnerable to this method in practice.

This attack was partially inspired by [11], which mentioned using decryption as an oracle for correct tags.