Cooley-Tukey
This is a well-known algorithm involving "butterfly" MACs and a pre-chosen radix value. Follows decimation-in-frequency FFT structure.
If we generally state our
With
Then we can "rearrange" this as follows:
Disclaimer
This page is just a general idea and basic overview of implementing the NTT, both in HW and in SW. If you need more details on the implementation(s) there is more detail in the various pages for the algorithms. If you need more information about the theory then there is of course more information found in the maths and theory, and pre-requisite maths pages.
- A Complete Beginner Guide to the Number Theoretic Transform (NTT)
- A note on the implementation of the Number Theoretic Transform
- An Extensive Study of Flexible Design Methods for the Number Theoretic Transform
- Github link for previous paper: Parametric NTT/INTT Hardware
- A Flexible NTT-Based Multiplier for Post-Quantum Cryptography.
- Kind of an alt-link to the previous one: Designing Efficient and Flexible NTT Accelerators
- Designing Efficient and Flexible NTT Accelerators, Ahmet MALAL
- Key paper required for previous one: Design of a Flexible Schönhage-Strassen FFT Polynomial Multiplier with High- Level Synthesis to Accelerate HE in the Cloud
- A Flexible and Scalable NTT Hardware : Applications from Homomorphically Encrypted Deep Learning to Post-Quantum Cryptography Alt link