The work presented here was originally developed on an FPGA in VHDL. As a starting point, we are going to describe the AES architecture and codes written in C that allows data to be encrypted and decrypted with your own cypher key.
The code was written using the Dev Cpp compiler.
The (AES) was the digital encryption standard designed to replace the older Digital Encryption Standard (DES). It is a Symmetric Key Cryptosystem, which means that it uses the same key for encrypting and decrypting. It is also known as a Block Cipher, as data is processed in blocks of 128 bits at a time. The algorithm behind the AES is known as the Rijndael Block Cipher (named after its developers Vincent Rijmen and Joan Daemen).
The AES algorithm is a symmetric cipher that can process 128 bits data block using cipher keys of length 128-bits, 192-bits or 256-bits. Each Data block can be considered as a 4×4 matrix or array of bytes called the state, on which operations are performed. Each element of the matrix is composed of one byte, therefore allowing byte by byte permutations efficiently. This also allows the implementation of AES on 8-bit platforms. This paper only concentrates on 128 bits long keys.
The AES algorithm consists of four basic operations; SubBytes, ShiftRows, MixColumns and AddRoundKey. The encryption and decryption procedures are shown below.
- AddRoundKey() is a XOR operation which adds the state to the round key. The initial round key (RoundKey(0)) is the key itself and subsequent round keys are generated by the key expansion. In decryption, the last round key (RoundKey(10)) is used first.
- SubBytes() is a non-linear transformation that substitutes each byte of the state with a byte from a substitution table (S-Box). InvSubBytes uses an inverse substitution table. The operation can be implemented by storing the S-box values in a Look-Up-Table or can be implemented by computing the substitution values.
- ShiftRows() (and InvShiftRows) operates on the state by performing a circular rotation on the state’s rows.
- MixColumn and InvMixColumn treat each column of the state as a 4-term polynomial over GF(28) and operates by multiplying each column of the state by a fixed matrix.
Figure 1 – AES Encryption & Decryption Algorithm
The encryption process starts off with the addition (XOR) of the plaintext to the initial key. Then a round consisting of the four transformation is performed in the order shown in Figure 1. The four transformations are performed iteratively for Nr-1 rounds on the state; where Nr is the number of rounds and is dependent on the length of the key. In the case of 128-bit long key, the number of rounds (Nr) is 10. In the last round, the MixColumn transformation is omitted.
The decryption process, on the other hand, is simply the inverse of the encryption process and the process starts with an addition of the ciphertext with the last key generated by the key Expansion, which is the 10th round key. Then the four transformations are performed on the State for 9 rounds and for the last round, the InvMixColumns operation is omitted. However, the order of the InvMixColumn and AddRoundKey is different from encryption. If the AES encryption and decryption is implemented based on Figure 1, this would lead to two different data path; one for encryption and one for decryption. So, it is desirable to have a similar structure for both procedures. Figure 2 shows an equivalent decryption procedure in which the same sequence of transformations is performed on the ciphertext and thus allows consistency with the encryption procedure.
Since InvMixColumn is a linear operation with respect to the 32-bit column input, the operation can be swapped with AddRoundKey provided that the InvMixColumn operation is performed on the Round keys for the round loop (Round= 1 to 9) as shown in Figure 2.
This similar structure for encryption and decryption adds high level of resource sharing, especially in the data path, which increases the efficiency and reduces the design complexity.
Figure 2 – Equivalent Decryption Algorithm
SubBytes and InvSubBytes are the most computationally complex operation in the AES algorithm. The coefficients of the S-Box and inverse S-Box can either be stored in LUTs or can be calculated on-the-fly using combinational logic. Using LUTs does not provide optimal area occupancy when compared to combinational logic. Combinational logic S-Box involves calculating the multiplicative inverse over GF(28) followed by an affine transformation. In inverse S-Box, an inverse affine transformation is done before the multiplicative inverse. An overall architecture of a joint S-Box and Inverse S-Box is shown in Figure 3.
Figure 3 – Combined SubBytes and InvSubBytes
The affine transform consist of a constant matrix multiplication followed by addition to a constant matrix. The combinational logic S-box as presented in this paper is capable of being pipelined which helps to overcome the limitation in clock frequency. A dedicated multiplicative inverse over GF(28) is, however, expensive in terms of area but the area can be significantly reduced by breaking down the GF(28) elements into smaller subfields; from GF(28) into GF(24)2 and then GF((22)2)2 . The elements are mapped into smaller subfields by using an isomorphic mapping and is mapped again into GF(28) with an inverse isomorphic mapping. Figure 4 below shows a multiplicative inverse circuit of the S-Box.
Figure 4 – Multiplicative Inversion Component for S-Box
The problem with the S-box design shown in Figure 4 is that it has a long critical path which will reduce the overall throughput of the S-Box significantly and thus also the throughput of the whole AES data path. The S-Box contains a large number of gates on it critical path and it is desirable to pipeline the S-box into several stages so that the critical path is reduced. Pipeline registers, however, introduces large hardware overhead. Pipelining is not usually desirable in compact or low AES implementation but however we still proposes a 5-stage pipelined S-box and examines its applicability in a low area AES design.
Figure 5 – S-box with 5-stage pipeline
The pipeline registers were placed so that the critical path is divided into 5 pieces with delays as close as possible. The proposed design was chosen as it provided an optimal compromise between slice count and maximum frequency. In this AES implementation, two S-boxes are used; one for SubBytes in round operation and the other is used by key expansion. In key expansion, only a Forward S-Box is used and so, the Inverse Affine and the 2 multiplexers were stripped for that design.
Comments and suggestions are most welcome.