Since the end of the 1990s, cryptosystems implemented on smart cards have had to deal with two main categories of attacks: side-channel attacks and fault injection attacks. Countermeasures have been developed and validated against these two types of attacks, taking into account a well-defined attacker model. This work focuses on small vulnerabilities and countermeasures related to the Elliptic Curve Digital Signature Algorithm (ECDSA) algorithm. The work done in this paper focuses on protecting the ECDSA algorithm against fault-injection attacks. More precisely, we are interested in the countermeasures of scalar multiplication in the body of the elliptic curves to protect against attacks concerning only a few bits of secret may be sufficient to recover the private key. ECDSA can be implemented in different ways, in software or via dedicated hardware or a mix of both. Many different architectures are therefore possible to implement an ECDSA-based system. For this reason, this work focuses mainly on the hardware implementation of the digital signature ECDSA. In addition, the proposed ECDSA architecture with and without fault detection for the scalar multiplication have been implemented on Xilinx field programmable gate arrays (FPGA) platform (Virtex-5). Our implementation results have been compared and discussed. Our area, frequency, area overhead and frequency degradation have been compared and it is shown that the proposed architecture of ECDSA with fault detection for the scalar multiplication allows a trade-off between the hardware overhead and the security of the ECDSA.

In our daily life and with technological progress, the development of applications, especially those deployed in client-server mode over the Internet (such as e-mail, access to Web servers, access to databases, e-commerce, etc.) is rapidly growing. These applications require security by the use of symmetric or asymmetric cryptographic systems. Asymmetric cryptographic systems offer the best advantages in terms of security and easy implementation. On the other hand, used alone, encryption systems do not ensure authenticity, which is equal to Authentication + Integrity. In fact, the main security technique implemented so far is the digital signature. It makes it possible to ensure the authenticity and integrity properties of the exchange. It is based on asymmetric cryptography and hash functions. The digital signature is an encrypted fingerprint added to the message. It ensures integrity, authenticity and non-repudiation by the sender. It is based on asymmetric cryptography and hash functions.

ECDSA is a variant of digital signature algorithm (DSA) that uses cryptography on elliptic curves. It is the most standardized algorithm of signature schemes based on elliptic curves. It is important to remember that a cryptographic signature generated with an ECDSA algorithm having the size 64 bits has the same security level as an Rivest Shamir Adleman (RSA) cryptographic signature of size 2048 bits. In recent years, research on ECDSA hardware implementation has become very popular [

Information security heavily relies on integrated circuits (ICs). Unfortunately, ICs face a lot of threats such as side channel or fault attacks. This work focuses on small vulnerabilities and countermeasures for the ECDSA. The motivation is that leakage sources may be used in different attack scenarios. By fixing the leakage, existing attacks are prevented but also undiscovered or non-disclosed attacks based on the leakage. Moreover, while the elliptic curve scalar algorithm is at the heart of the security of all elliptic curve related cryptographic schemes, all the ECDSA system needs security. A small leakage of few secret bits may conduct to fully disclose the private key and thus should be avoided. The ECDSA can be implemented in different flavors such as in a software that runs on a microcontroller or as a hardware self-contained block or also as a mix between software and hardware accelerator. Thus, a wide range of architectures is possible to implement an ECDSA system. In spite of their security, the hardware implementation of the ECDSA signature is subject to fault injection attacks aimed at recovering the secret key [

This paper is organized as follows. The second section gives a brief overview of some of the most relevant work in the literature. Section 3 is devoted to the proposed hardware architecture of ECDSA and discussion of the results of the proposed architecture synthesis. In the fourth section, a countermeasure at a higher hierarchical level of cryptography on elliptic curves, which is scalar multiplication, to protect ECDSA against fault attacks, is presented. Our conclusions are drawn in the final section.

This section focuses on the methods of countermeasures proposed for the ECDSA algorithm against fault injection attacks in the literature. Fault injection attacks are known to be very effective and easy to implement and consist of injecting one or more faults during the execution of the cryptographic process and using the wrong outputs to obtain information on the secret key. They are very effective against insecure implementations. Therefore, the need to protect implementations of the ECDSA against fault injection attacks. However, studies on the resistance of hardware implementation of ECDSA to faults injection attacks are lacking. One of the first examples of error analysis and detection procedures for signature is presented in [

The digital signature is one of the main advantages of asymmetric cryptography. It is a mechanism to authenticate a message, that is, to prove that a message is from a particular sender. The signature protocol ECDSA, proposed in 1992 by Serge Vaudenay, is a variant of the DSA standard which, unlike the original algorithm, uses cryptography on elliptic curves. It is now the most standardized signature scheme, appearing in the American national standards institute (ANSI) X9.62, federal information processing standards (FIPS) 186-3, institute of electrical and electronics engineers (IEEE) 1363-2000 and International organization for standardization (ISO)/international electrotechnical commission (IEC) 15946-2 [

In this section, we introduce how to calculate the signature using ECC-based crypto-systems. For this, three Algorithms 1–3 are needed: private and public key pair generation, signature generation and signature verification. The ECDSA system is based like other algorithms processed on a public and a private key. To create its two keys, we follow algorithm 1.

The private and public key pair generation algorithm uses the random number generation (RNG) for random generation of the private key. Subsequently, the public key is computed by the scalar multiplication of the private key by the generating point of the elliptic curve E.

Having both keys (private and public), the algorithm 2 allows the generation of the signature of a message m by using a chosen hash function named H.

The verification step made by the algorithm 3 is needed to verify the signature (r, s).

An architectural description of the ECDSA signature authentication cryptosystem has been developed (see

This architecture is composed of five modules:

A 163-bit ECC processor: based on the method of Montgomery, for the calculation on the scalar multiplication operation.

A hash block: Generates the hash used in both the electronic signature generation and verification operation of the message.

A Random Number Generator: generates a sequence of random numbers used as keys during the signature process.

An intermediate register to maintain the hash.

Control unit: Generates the control signals of the function blocks as well as the clock signals.

The key represents the first parameter that must be well chosen to have a secure system. So having a key that is easy to implement on hardware platforms but at the same time difficult to attack is the first concern. It is therefore necessary to make the right choice of the algorithm to generate this key. There are several hardware methods for generating cryptographic keys, the best known are linear feedback back shift registers (LFSRs). These LFSRs can be used for other stream encryption generators such as those used for the Global system for mobile communications (GSM) standard (A5/1, A5/2 and W7). There are also generators based on non-linear feedback shift register (NLFSR), known as Grain [

Encryption algorithms | The length of the key (bit) | Initialization vector (bits) |
---|---|---|

Rivest Cipher 4 (RC4) | 8-2048 | 8 |

MUGI | 128 | 128 |

A5/1 | 64 | 114 |

W7 | 128 | 128 |

CA 16 × 16 | 256 | 16 |

Grain V1 | 80 | 64 |

Grain 128 | 128 | 96 |

Being a synchronous stream encryption primitive, the Grain can be better implemented in hardware. It is bit-oriented while RC4 provides byte-by-byte encryption (used in secure sockets layer (SSL) and 802.11 wireless fidelity (WiFi)) but has weaknesses in initialization. On the other hand, we find the A5/1 (used in GSM) which is basically good but easily breakable. A characteristic that makes the Grain very powerful is the possibility of increasing the speed by means of hardware, which makes it very fast. It is well suited for real time applications. For this reason, we adopt this method in our implementation.

Grain 128 [

The grain-128 synthesis results are presented in

Area (Slices) | Area (Luts) | Frequency (MHz) | Throughput (Mbps) | Efficiency (Mbps/slice) | |
---|---|---|---|---|---|

400 | 404 | 259.271 | 259.271 | 0.64 |

According to the implementation results, the grain generator is the ideal trade between area and frequency. Moreover, the grain-128 preserves the benefits of the grain-80 by supporting a 128-bit key and a 96-bit initialization vector.

One of the most important applications of hash functions is their use in electronic signatures. The electronic signature is a mechanism analogous to the handwritten signature to guarantee the integrity of an electronic document and proving to the reader of the document the identity of its author. Such mechanisms utilize the methods of asymmetric cryptography. There is a variant algorithm that performs the hashing,

Parameters Algorithms | Message size (bits) | Condensed size (bits) | Number of rounds |
---|---|---|---|

MD5 [ |
<2^{64} |
128 | 80 |

SHA-0 [ |
<2^{64} |
160 | 80 |

SHA-1 [ |
<2^{64} |
160 | 80 |

SHA-256 [ |
<2^{64} |
256 | 64 |

SHA-384 [ |
<2^{128} |
384 | 80 |

SHA-512 [ |
<2^{128} |
512 | 80 |

SHA_3(Keccak) [ |
<2^{128} |
256 | 24 |

These functions process a variable length input to output a fixed length message.

Another construction called sponge construction has been proposed by Bertoni et al. in 2008 [

The footprint of a message m is calculated as follows: We start by adding a filling sequence (injective padding), then we divide the resulting string into blocks m_{1}… m_{k} of size t. Then, the b bits of the internal state are initialized with the value ‘0’. The procedure takes place in two successive steps (see

The absorption step: During this phase, the blocks of the message are processed (“absorbed”) iteratively. The first block m_{1} is Xorated (exclusive OR) with the internal state. The result is then subjected to a transformation f. Then, the second block m_{2} is Xored with the state and the transformation f is applied again. The same operations are repeated until all message blocks are processed.

The pressing step: During this phase, values of the internal state are generated at different times by calls of the transformation f. The size of the prints extracted, can be chosen according to the needs.

The permutation Keccak-f applies to a state of 1600 bits, which is represented by a three-dimensional binary matrix, a [x] [y] [z], with 0 ≤ x, y < 5 and 0 ≤ z < 64. As a result, the state can be viewed as 64 slices, each of which contains 5 rows and 5 columns. The initial number of laps for the Keccak-f swap was set at 18 laps, but increased to 24 for the second round of the SHA-3 competition. This change is due to the publication of the zero-sum distinction by Aumasson and Meier [

On the FPGA Virtex5 XC5VFX70T-FF665-(-3), the KECCAK IP was introduced. In terms of frequency, Area, Throughput (Gbps) and Efficiency (Mbps/slice),

Area (Slices) | Clock cycles | Frequency (MHz) | Throughput (Gbps) | Efficiency (Mbps/slice) | |
---|---|---|---|---|---|

1309 | 25 | 343.124 | 14.054 | 10.73 |

With a performance of 256 bits each 25 clock cycles and a throughput of 14.054 Gbps, this design achieves a maximum operating frequency of 343.124 MHz.

The main operation in any protocol using elliptic curves is the multiplication of a point on the curve by a scalar; the calculation of kP = P + … + P, where k = (k_{i}k_{i-1}…k_{1}k_{0}) is an integer and P a point on the curve. In order to carry out this operation effectively, many methods have been proposed. They are generally based on the so-called “double-and add” algorithm consisting in performing series of doubling interspersed with additions, according to the binary representation of the scalar K. To defend against simple power analysis (SPA) attacks, we will try to break the relation between current consumption and the branch side chosen in the algorithms and scalar multiplication. If the main bit k_{i} is 1 or 0, there are many ways to standardize the calculations performed. The Montgomery scale allows scalar multiplication to be performed by performing an exact addition of points and doubling for each bit of k. Using this algorithm, one can expect the current consumption to vary only slightly depending on whether the bit ki is 1 or 0. The Montgomery algorithm is one of the effective parries against side channel attacks (SCA). Its original version was introduced by Montgomery, and applied only to a special category of elliptical curves (Montgomery curves). Recently, this method has also become applicable to elliptic curves defined on GF (2^{m}) thanks to the addition and doubling formulas. The computation of kP is carried out by expressing the key k in the binary form: k = k_{i-1} k_{i-2} … k_{1} k_{0} and applying a chain of elementary doubling and addition operations. To obtain an efficient scalar multiplication, it is therefore necessary that the elementary operations be efficient and less expensive, such as the doubling and addition methods that have been used (addition and doubling of Montgomery). So the most efficient algorithm of scalar multiplication is the algorithm represented in projective coordinates which is given by the algorithm 4.

The architecture of an ECC cryptosystem is elucidated in _{1}, Z_{1}, X_{2}, Z_{2}. The first converter outputs will be the inputs of the calculation unit, and they must undergo a succession of doubling and point addition operations according to the Ki test bit value. Finally, the outputs X_{1}, Z_{1}, X_{2}, Z_{2} of the calculation unit must undergo a transformation at the second converter (projective-refinery) level to obtain the outputs (x, y) in affine coordinates. The three blocks of this architecture are synchronized by the synchronization signals from the controller.

In the proposed architecture, we have optimized the architecture of the ECC, it includes only two reusable Montgomery multipliers. This reuse of the multipliers increases the performance of the IP of the proposed architecture in this way from the point of view of slice occupancy and frequency.

ECC point multiplication | FPGA platform | Area | Frequency (MHz) | |
---|---|---|---|---|

LUTs | Occupied slices | |||

Proposed | Virtex-5 XC5VFX70T | 11261 | 3865 | 218.257 |

[ |
XCV2600E | - | 17630 | 46.5 |

[ |
Virtex-4 XC4VLX200 | 33414 | 17929 | 250 |

[ |
Virtex-4 XC4VLX200 | 30895 | 16544 | 256 |

[ |
Virtex-4 XC4VLX200 | 26557 | 14203 | 263 |

[ |
Virtex-4 XC4VLX200 | - | 10417 | 121 |

[ |
Virtex-5 XC5VLX110 | 29095 | 10362 | 153 |

[ |
Virtex-5 XC5VLX110 | - | 5768 | 343.3 |

An FPGA Virtex 5 was used to test the proposed ECC-IP. The suggested FPGA implementation of ECC Point Multiplication is compared to numerous recent works in the literature in terms of area and frequency in

The synthesis was carried out using the integrated synthesis environment (ISE) 14.7 tool. The Xilinx Virtex5 XC5VFX70T-FF665-(-3) hardware target. The architecture synthesis results are presented in the

Occupied slices | LUTs | Frequency (MHz) | Number of cycles | Execution time | |
---|---|---|---|---|---|

7748 | 17355 | 110.521MHz | 84885 | 1,6977 ms |

From the synthesis results, our proposed implementation of ECDSA can achieve an operating frequency of 110.521 MHZ and takes 7748 slices.

A comparative study was made between our results and the state of the art in the

Design | FPGA platform | Field (bits) | Area | Frequency (MHz) |
---|---|---|---|---|

[ |
Virtex-5 | GF(2^{163}) |
20628 | 148.963 |

[ |
Virtex-5 | GF(2^{163}) |
18504 | 107.4 |

[ |
Virtex-5 | GF(2^{163}) |
16387 | 13.156 |

[ |
Virtex-5 | GF(2^{163}) |
23760 | 195.309 |

[ |
Virtex-6 | GF(2^{163}) |
18740 | 100 |

Virtex-5 | GF(2^{163}) |
17355 | 110.521 |

The implementation proposed in [

When calculating an ECDSA signature, scalar multiplication kP is the most expensive step. The time required to calculate a signature, signature verification or shared secret exchange is mainly consumed by this operation. This operation is critical for the performance of the cryptographic system but also for the security of the secret key. The scalar used in an ECDSA signature is a number d chosen at random but if it is discovered by an attacker who also knows the signature (

The ephemeral key has been the source of many publications to find the secret key. The most powerful attack in terms of complexity and the amount of signatures required is proposed by Schmit et al. [

To protect against these attacks, we used a countermeasure at a higher hierarchical level of cryptography on elliptic curves, which is scalar multiplication and can therefore be used on any elliptic curve that we proposed in our paper [

Our design for hardware implementation of ECDSA digital signature with and without fault detection method for Montgomery scalar multiplication, was simulated, synthesized, and implemented using XC5VFX70T board from Xilinx Virtex-5 family. In addition, the developed method has been examined practically as shown in

The architecture of the ECDSA signature with and without fault detection method for the Montgomery scalar multiplication has been described using VHDL language, simulated by ModelSim simulator 6.4b and synthesized with XILINX ISE 14.7. The FPGA platform target was XC5VFX70T from Xilinx Virtex-5 family.

ECDSA signature | Area | Frequency (MHz) (degradation) | |
---|---|---|---|

LUTs (overhead) | Occupied slices (overhead) | ||

17355 | 7748 | 110.521 | |

21088 (21.51%) | 9921 (21.9%) | 108.4 (1.92%) |

The original ECDSA implementation necessitates 9921 occupied Slices for a frequency of 110.521 MHz. The ECDSA implementation with fault detection method need a 21.9% overhead of the Occupied Slices and 1.92% degradation of the frequency compared to ECDSA implementation against fault without fault detection method.

ECDSA’s suggested design is resistant to fault attack. The employment of a duplication scheme and scrambling approach for the Montgomery Scalar Multiplication algorithm as a countermeasure against fault attacks at a higher hierarchical level of cryptography on elliptic curves. Due to variances in FPGA technology employed and a lack of FPGA implementation results from other papers, a direct comparison of performance of hardware implementations with other reported results is not available. Therefore, it is difficult to give a fair comparison in this regard, but, in any case, our results clearly show that FPGA-based implementations of the ECDSA algorithm with and without fault detection methods are very attractive for many applications.

In this paper, we have presented a hardware implementation of the digital signature ECDSA. It is based on three steps: key generation, signature generation and signature verification. It requires the efficient implementation of three IPs for random key generation, asymmetric encryption and hash. we have chosen hash IP and random key generation IP, as well as asymmetric ECC encryption IP, which are the most efficient for all three IPs in terms of area, frequency and cost each time. However, other aspects can be studied: resistance to attacks, etc. For this reason, we used a countermeasure at a higher hierarchical level of cryptography on elliptic curves, which is scalar multiplication to protect against fault attacks. In addition, the proposed ECDSA architecture with and without fault detection for the scalar multiplication have been implemented on Xilinx FPGA platform (Virtex-5). Its frequency and area have been compared and discussed. The FPGA implementation results show that the proposed architecture achieves good performance in terms of frequency and area.

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Research Group Project.

^{m})

^{163})

^{m})