

# Multi-frequency Data Parallel Spin Wave Logic Gates

Mahmoud, Abdulqader Nael; Vanderveken, Frederic; Adelmann, Christoph; Ciubotaru, Florin; Hamdioui, Said; Cotofana, Sorin

DOI [10.1109/TMAG.2021.3062022](https://doi.org/10.1109/TMAG.2021.3062022)

Publication date 2021

Document Version Accepted author manuscript

Published in IEEE Transactions on Magnetics

### Citation (APA)

Mahmoud, A. N., Vanderveken, F., Adelmann, C., Ciubotaru, F., Hamdioui, S., & Cotofana, S. (2021). Multifrequency Data Parallel Spin Wave Logic Gates. IEEE Transactions on Magnetics, 57(5), Article 3401012. <https://doi.org/10.1109/TMAG.2021.3062022>


## Multi-frequency Data Parallel Spin Wave Logic Gates

Abdulqader Mahmoud,<sup>1, a)</sup> Frederic Vanderveken,<sup>2, 3</sup> Christoph Adelmann,<sup>3</sup> Florin Ciubotaru,<sup>3</sup> Said Hamdioui,<sup>1</sup> and Sorin Cotofana<sup>1, b)</sup>

<sup>1)</sup>Delft University of Technology, Department of Quantum and Computer Engineering, 2628 CD Delft, The Netherlands

<sup>2)</sup>KU Leuven, Department of Materials, SIEM, 3001 Leuven, Belgium

<sup>3)</sup>Imec, 3001 Leuven, Belgium

By their very nature, Spin Waves (SWs) with different frequencies can propagate through the same waveguide while interfering mostly with waves of their own frequency. Therefore, multiple SW encoded data sets can coexist, propagate, and interact in parallel, which opens the road towards hardware replication free parallel data processing. In this paper, we take advantage of these features and propose a novel data parallel spin wave based computing approach. To explain and validate the proposed concept, byte-wide 2-input XOR and 3-input Majority gates are implemented and validated by means of Object Oriented MicroMagnetic Framework (OOMMF) simulations. Furthermore, we introduce an optimization algorithm meant to minimize the area overhead associated with multifrequency operation and demonstrate that it diminishes the byte-wide gate area by 30% and 41% for XOR and Majority implementations, respectively. To gain insight into the practical implications of our proposal, we compare the byte-wide gates with conventional functionally equivalent scalar SW gate based implementations in terms of area, delay, and power consumption. Our results indicate that the area optimized 8-bit 2-input XOR and 3-input Majority gates require 4.47x and 4.16x less area, respectively, at the expense of 5% and 7% delay increase, respectively, without inducing any power consumption overhead. Finally, we discuss the factors that limit the currently achievable parallelism to 8 for phase based gate output detection and demonstrate by means of OOMMF simulations that this can be increased to 16 for gates with threshold based detection.

a)Electronic mail: a.n.n.mahmoud@tudelft.nl

b)Electronic mail: S.D.Cotofana@tudelft.nl

## I. INTRODUCTION

The amount of raw data has increased rapidly in the last few decades due to the unprecedented growth of information technology. These data are usually processed on high efficiency CMOS technology based computing platforms<sup>1-3</sup> and, as the amount of raw data increased, technology feature size has been scaled down to keep up with the computation power demands. However, when entering the deca-nanometer regime, CMOS downscaling becomes more difficult due to: (i) the leakage wall<sup>4,5</sup>, (ii) the reliability wall<sup>6</sup>, and (iii) the cost wall<sup>4,6</sup>, which suggests that the end of Moore's law is near. As a result, different technologies, e.g., graphene<sup>7–11</sup>, memristors<sup>12–16</sup>, and spintronics<sup>17–21</sup>, have been explored in an attempt to meet the exponentially increasing computing market demands<sup>22</sup>.

While each of these alternative technologies exhibits both strong and weak points, spintronics in its Spin Wave (SW) flavour seems to have great potential to meet market needs<sup>22–27</sup> due to its: (i) ultra-low power consumption, as no charge movement is required to perform calculations, (ii) acceptable delay, (iii) scalability down to the nm range, and (iv) natural support for data parallelism, enabled by the fact that SWs of different frequencies can coexist and selectively interact within the same waveguide.

In view of this, different logic gates built on spin wave technology were presented, e.g.,<sup>27–47</sup>, and in the sequel we briefly present some of them. A current controlled Mach-Zehnder interferometer based NOT gate was the first experimentally demonstrated SW logic gate<sup>28</sup> and, by making use of a similar method, other logic gates including XNOR, NAND, and NOR were realized<sup>29-31</sup>. NOT, OR, and AND gates were designed using three terminal devices with transmission lines<sup>32–35</sup>, and voltage-controlled XNOR and NAND gates utilizing re-configurable nano-channel magnonic devices were suggested<sup>36</sup>. In addition, an XOR gate was proposed by embedding magnon transistors between the Mach-Zehnder interferometer arms<sup>37</sup>. By relying on another information encoding method, i.e., on SW phase rather than on SW amplitude as is the case for the previously mentioned schemes, buffer, NOT, (N)AND, (N)OR, XOR, and Majority gates were introduced in<sup>38</sup>. Moreover, alternative Majority gate designs were suggested to decrease the SW back propagation and increase the SW transmission efficiency<sup>39–41</sup>. OR and NOR gates were designed using cross structures<sup>42</sup> and physically implemented Majority gates were reported in<sup>43–46</sup>.

All the previously mentioned designs operate on same-frequency SWs, i.e., on 1-bit inputs. Therefore, if multiple-bit input functions are to be evaluated, e.g., bitwise XOR over two $n$-bit inputs $A = (a_1, a_2, \ldots, a_n)$ and $B = (b_1, b_2, \ldots, b_n)$, an XOR gate structure must be replicated $n$ times in order to process the $n$ input bit-pairs (sets) in parallel, at the expense of area overhead. However, different frequency SWs can simultaneously propagate through the same waveguide without affecting each other, while only interfering with their own species. This suggests that if each input pair $(a_i, b_i)$ is encoded with $f_i$ frequency SWs, $XOR(A, B)$ can potentially be evaluated with one instead of $n$ XOR gates. This approach has been pursued in<sup>47</sup>, which introduces a Majority gate structure able to simultaneously process 3 data sets encoded at 3 different frequencies. However, the suggested structure contains a magnonic crystal that induces a large delay overhead.

In this paper we revisit the SW parallelism concept and propose a novel multi-frequency data parallel in-line generic SW gate structure. Our contributions can be summarized as follows:

- Generic multi-frequency data parallel in-line SW gate structure and an associated area optimization algorithm.
- Design and validation of 8-bit data parallel in-line Spin Wave logic gates: 8-bit 3-input Majority and 2-input XOR gates are instantiated and validated by means of Object Oriented MicroMagnetic Framework (OOMMF) simulations.
- Performance assessment and comparison with SW state-of-the-art: The proposed 8-bit 3-input Majority and 2-input XOR gates require 4.47x and 4.16x less area, respectively, when compared with functionally equivalent scalar SW gate based implementations, at the expense of 5% and 7% delay penalty, respectively, and no power consumption overhead.
- Parallelism limit study: Demonstrate by means of OOMMF simulation that the maximum currently achievable parallelism, i.e., the number of different SW frequencies, is 8 for phase based output detection and 16 when spin wave magnetization is utilized to detect the gate output.
- Design and OOMMF validation of a 16-bit data parallel in-line Spin Wave 2-input XOR gate.

The remainder of the paper is organized as follows. Section II briefly explains the SW physics fundamentals and the associated computing paradigm. Section III describes the proposed n-bit data parallel SW logic gate and introduces the associated area optimization algorithm. Section IV provides insight into the utilized simulation parameters and presents the simulation experiments related to the validation of the 8-bit 3-input Majority and 2-input XOR gates. Section V presents evaluation results for the two byte-wide parallel gates and a comparison with functionally equivalent scalar implementations. In addition, it discusses fan-in and geometric scalability, maximum achievable parallelism issues, and variability and thermal noise effects. Section VI concludes the paper.

## II. SW BASED COMPUTING BACKGROUND

When a ferromagnetic material is exposed to an external magnetic field, the electron spins align with the applied magnetic field direction in order to bring the total system energy to the lowest possible level<sup>48</sup>. Further, if the electron spins are deflected by an excitation method, e.g., by means of a Magnetoelectric (ME) cell or an antenna, a Spin Wave (SW) is created, mainly due to exchange and dipole spin interactions. The precessional electron spin movement<sup>48</sup> can be described by the Landau-Lifshitz-Gilbert (LLG) relation<sup>49,50</sup> as follows:

$$
\frac{d\vec{m}}{dt} = -|\gamma|\mu_0 \left(\vec{m} \times \vec{H}_{eff}\right) + \alpha \left(\vec{m} \times \frac{d\vec{m}}{dt}\right),\tag{1}
$$

where $\gamma$ is the gyromagnetic ratio, $\mu_0$ the vacuum permeability, $\alpha$ the damping factor, $\vec{m}$ the magnetization, and $\vec{H}_{eff}$ the effective field, which is expressed as:

$$
H_{eff} = H_{ext} + H_{ex} + H_{demag} + H_{ani},\tag{2}
$$

where  $H_{ext}$  is the external field,  $H_{ex}$  the exchange field,  $H_{demag}$  the demagnetizing field, and  $H_{ani}$  the magneto-crystalline anisotropy.
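Because Eq. (1) contains $d\vec{m}/dt$ on both sides, an equivalent explicit form is often more convenient for numerical integration. The derivation below is only a sketch of the standard Landau-Lifshitz rewriting and is not taken from this paper; it assumes a normalized magnetization, $|\vec{m}| = 1$, so that $\vec{m} \cdot d\vec{m}/dt = 0$. Taking the cross product of $\vec{m}$ with Eq. (1) gives

$$
\vec{m} \times \frac{d\vec{m}}{dt} = -|\gamma|\mu_0\, \vec{m} \times \left(\vec{m} \times \vec{H}_{eff}\right) - \alpha \frac{d\vec{m}}{dt},
$$

and substituting this expression back into the damping term of Eq. (1) yields

$$
\frac{d\vec{m}}{dt} = -\frac{|\gamma|\mu_0}{1+\alpha^2}\left[\vec{m} \times \vec{H}_{eff} + \alpha\, \vec{m} \times \left(\vec{m} \times \vec{H}_{eff}\right)\right].
$$

The first term describes the precession around $\vec{H}_{eff}$ and the second the damping towards it.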

An excited SW is characterised by its wavelength $\lambda$ (the shortest distance between similar consecutive spins), wave number $k$ ($k = \frac{2\pi}{\lambda}$), frequency $f$ (determined by the complete spin precession time), phase $\phi$, and amplitude $A$, as graphically indicated in Figure 1. As such, an SW can carry information encoded in its amplitude, phase, frequency, or a combination of them. Once formed, the SW propagates through the ferromagnetic material (waveguide) and may eventually meet other SWs present in the waveguide, case in which their interaction



FIG. 1. SW Parameters



FIG. 2. Wave Interference.

is governed by the wave interference principles. For instance, if two SWs with the same amplitude, wavelength, and frequency coexist in a waveguide, they interfere constructively if they have the same phase, and destructively if they are out of phase ($\Delta \phi = \pi$), as depicted in Figure 2<sup>23</sup>. Furthermore, if more than two waves having the same $A$, $f$, and $\lambda$ interfere in the waveguide, the outcome captures a majority decision, i.e., if the number of spin waves having $\phi = 0$ is larger than the number of spin waves having $\phi = \pi$, the resulting spin wave has $\phi = 0$, and $\phi = \pi$ otherwise. Thus, SW interference provides natural support for direct Majority gate implementations, e.g., a 3-input Majority is evaluated by means of a 3-SW interference in a waveguide<sup>23,38</sup>, while its CMOS based implementation requires 18 transistors. Moreover, SWs with different frequencies can coexist and propagate in the same waveguide without affecting each other, only interacting with other same-frequency SWs, which indicates that SW interaction provides intrinsic support for data parallel computing. Note that, in the most general case, spin waves with different amplitude, frequency, and wavelength can coexist and selectively interfere in the same waveguide, which results in more complex interference patterns, as presented in Figure 3. As depicted in the Figure, the interference of the $F_1$ Waves 1 and 2 results in Wave 5 and the interference of the $F_2$ Waves 3 and 4 results



FIG. 3. Different Frequency, Wavelength, and Amplitude Spin Wave Interference.

in Wave 6, while no interaction between the  $F_1$  and  $F_2$  waves occurs. We note that in our investigation we consider that regardless of their frequency all input SWs have the same amplitude.
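The majority behaviour of same-frequency interference described above can be illustrated with a minimal phasor sketch. It assumes ideal, equal-amplitude, undamped waves and is not a micromagnetic model; the function names `interfere`, `decode_phase`, and `majority` are hypothetical and chosen only for this illustration.

```python
import cmath
import math

def interfere(bits, amplitude=1.0):
    """Sum equal-frequency waves as phasors: logic 0 -> phase 0, logic 1 -> phase pi."""
    return sum(amplitude * cmath.exp(1j * math.pi * b) for b in bits)

def decode_phase(phasor):
    """Return 0 if the resulting phase is closer to 0, and 1 if it is closer to pi."""
    return 0 if phasor.real > 0 else 1

def majority(bits):
    """Majority decision of an odd number of input bits via interference."""
    return decode_phase(interfere(bits))

for bits in [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]:
    print(bits, "->", majority(bits), "amplitude", round(abs(interfere(bits)), 3))
```

For two inputs, the same sum has a magnitude close to twice the input amplitude when the bits are equal and close to zero when they differ, which is the amplitude contrast that a threshold based XOR detector relies on.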

Depending on the relative orientation of the spin wave propagation direction, the effective magnetic field, and the magnetization, three main Magnetostatic Spin Wave (MSW) types exist: Magnetostatic Surface Spin Wave (MSSW), Forward Volume Magnetostatic Spin Wave (FVW), and Backward Volume Magnetostatic Spin Wave (BVW)<sup>23,48</sup>. While each type has certain interesting properties, FVWs are the most attractive as their in-plane spin-wave propagation is isotropic, which is beneficial from the circuit design perspective.

Figure 4 depicts the generic structure of a SW based logic gate, which consists of multiple inputs  $(I_1, I_2, I_3, ..., I_n)$ , a Functional Region (FR), which might perform Majority, AND, OR, XOR function or its inverted version, and an output O. All inputs are excited at the same frequency, propagate from their sources through the waveguide and interfere constructively or destructively based on their phases. The result is available at the output as a SW with the same frequency as the inputs. This is a scalar gate as each input SW represents one bit, thus in case the same function has to be pairwise evaluated on  $n$ -bit inputs this can be done in parallel by instantiating n such gates or serially by using one gate only with the associated area and delay overhead, respectively. In the following section we take advantage of different frequency SW interaction behaviour and introduce data parallel SW gates that can process n-bit inputs without hardware replication or serialisation.

## III. n-BIT DATA PARALLEL SW LOGIC GATE

Figure 5 presents the parallel spin wave logic gate we introduced in<sup>51</sup>, which is able to concurrently process $m$ $n$-bit inputs. As indicated in the Figure, the input sets $\mathcal{I}_i = \{I_{i,1}, I_{i,2}, I_{i,3}, \ldots, I_{i,m}\}$, $i = 1, 2, \ldots, n$, are simultaneously encoded into SWs with frequency $f_i$ by means of, e.g., Magnetoelectric (ME) cells or antennas. Subsequently, the SWs corresponding to the sets $\mathcal{I}_i$, $i = 1, 2, \ldots, n$, propagate through the waveguide without affecting each other until reaching the Functional Region (FR). Once the $m \times n$ spin waves arrive at the FR, equal-frequency spin waves interfere constructively or destructively depending on their phases, producing $n$ output SWs $\mathcal{O}_i = \mathcal{F}(\mathcal{I}_i)$, $i = 1, 2, \ldots, n$, where $\mathcal{F}$ is the gate function, e.g., AND, OR, XOR. Those SWs can be sensed and transformed into the voltage domain by the detection cells located at $O_1, O_2, \ldots, O_n$ or transmitted to the next SW gate.

Although the approach in Figure 5 is generic its practical realization requires stacked waveguides and contains bent regions, which impede smooth SW propagation. We address these issues by applying the same idea on a single waveguide structure and constructing the in-line gate in Figure 6.
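The frequency-selective interference that underlies the data parallel gate can be sketched numerically. The example below assumes ideal linear superposition of undamped cosine waves and recovers each frequency component with a single-bin Fourier projection; the three carrier frequencies, the 1 ns window, and all function names are assumptions made for this illustration and are not the paper's simulation setup.

```python
import cmath
import math

FREQS_GHZ = [10, 20, 30]   # one carrier per bit position (illustrative values)
T_NS = 1.0                 # window holding a whole number of periods of every carrier
N_SAMPLES = 1000

def waveform(inputs):
    """inputs[i] holds the logic values encoded at FREQS_GHZ[i] (0 -> phase 0, 1 -> phase pi)."""
    samples = []
    for s in range(N_SAMPLES):
        t = s * T_NS / N_SAMPLES
        total = 0.0
        for f, bits in zip(FREQS_GHZ, inputs):
            for b in bits:
                total += math.cos(2 * math.pi * f * t + math.pi * b)
        samples.append(total)
    return samples

def component(samples, f):
    """Complex amplitude of the frequency-f component (single-bin Fourier projection)."""
    acc = 0j
    for s, x in enumerate(samples):
        t = s * T_NS / N_SAMPLES
        acc += x * cmath.exp(-2j * math.pi * f * t)
    return 2 * acc / N_SAMPLES

def parallel_majority(inputs):
    """Evaluate one 3-input Majority per frequency from a single superposed waveform."""
    samples = waveform(inputs)
    return [0 if component(samples, f).real > 0 else 1 for f in FREQS_GHZ]

print(parallel_majority([(0, 0, 1), (1, 1, 0), (1, 0, 1)]))  # [0, 1, 1]
```

Each output bit depends only on the inputs encoded at its own frequency, which is the property that lets one waveguide replace several replicated scalar gates.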



FIG. 4. Conventional SW Logic Gate Structure



FIG. 5. Multi-Frequency Spin Wave Logic Gate

FIG. 6. n-bit Inputs In-line Spin Wave Logic Gate. The sources are laid out along a single waveguide in the order $I_{1,1}, \ldots, I_{n,1}, I_{1,2}, \ldots, I_{n,2}, \ldots, I_{n,m}$, with the $j$-th input of bit $i$ excited at frequency $F_i$, followed by the detectors for the 1st to $n$-th output bits; $d_1, d_2, \ldots, d_{n \times m + n}$ denote the distances between consecutive same-frequency transducers.

Note that for proper gate operation, SWs with the same frequency must be excited with the same amplitude and wavelength. Moreover, the distances between input sources and interference locations are SW frequency specific and crucial for proper gate functionality, thus they must be accurately determined. For example, if constructive interference is required for in-phase SWs and destructive interference for out-of-phase SWs, the distances between the same-frequency sources must be $j_q \lambda_i$, $i = 1, 2, 3, \ldots, n$, i.e., $d_1 = j_1 \lambda_1$, $d_2 = j_2 \lambda_2$, $\ldots$, $d_{nm} = j_{nm} \lambda_n$, where $j_q \in \{1, 2, 3, \ldots\}$, $q = 1, 2, 3, \ldots, nm$. Note that to minimize gate area and delay $j_q = 1$ is the preferred choice, which is feasible for scalar gates but not always possible for parallel gates. Conversely, if the opposite behaviour is desired, the distances must be $(j_q + \frac{1}{2})\lambda_i$, i.e., $d_1 = (j_1 + \frac{1}{2})\lambda_1$, $d_2 = (j_2 + \frac{1}{2})\lambda_2$, $\ldots$, $d_{nm} = (j_{nm} + \frac{1}{2})\lambda_n$.
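The spacing rule can be checked with a short phase argument. The following is a sketch that assumes an ideal, undamped plane wave with wavenumber $k_i = 2\pi/\lambda_i$; the symbol $\Delta\phi_i$ is introduced only for this illustration. The phase accumulated over a distance $d$ is

$$
\Delta\phi_i(d) = k_i d = \frac{2\pi d}{\lambda_i},
$$

so that

$$
\Delta\phi_i(j_q \lambda_i) = 2\pi j_q \equiv 0 \pmod{2\pi}, \qquad
\Delta\phi_i\left(\left(j_q + \tfrac{1}{2}\right)\lambda_i\right) = 2\pi j_q + \pi \equiv \pi \pmod{2\pi}.
$$

A wave launched at one source therefore reaches the next same-frequency source with its phase unchanged in the first case and shifted by $\pi$ in the second, which swaps the roles of constructive and destructive interference.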

In view of the previous discussion, each output wave $\mathcal{O}_i$ is available for detection after a delay determined by the distance between the farthest input cell of the $\mathcal{I}_i$ set, i.e., $I_{i,1}$ in Figure 6, and the output cell $O_i$, thus full parallelism is achieved. Note that the actual gate delay value can be optimized by an appropriate choice of, e.g., waveguide material, dimensions, and thickness, as discussed in Section IV.

While delay optimization is a matter of waveguide material and geometry choice, the gate area can be minimized by changing the position of the input and output transducers. One can observe in Figure 6 that input and output cells are ordered by bit position for clarity. However, they can be shuffled as long as the previously discussed constraints are still satisfied, and this results in an area (overall gate length) reduction. To this end we introduce Algorithm 1, which identifies the transducer (source/detector) locations that minimize the waveguide length while not infringing the wavelength dependent inter-transducer distance constraints. The algorithm iteratively constructs the gate structure by instantiating one input set $\mathcal{I}_i$, $i = 1, 2, \ldots, n$, at a time, while optimizing its transducer positions in relation to the already optimized structure embedding the previously instantiated sets $\mathcal{I}_j$, $j = 1, 2, \ldots, i - 1$.

The algorithm starts with a configuration in which all transducers are placed overlapped at the waveguide beginning. Subsequently, input sets are processed one at a time by initially placing them one after the other at distance $D$, regardless of the wavelength of the SW they process (lines 3 to 7). If the set currently processed is the first one, no further adjustments are required and the second set can be considered for placement. If this is not the case, the for loop (lines 9 to 24) repositions the transducers at the correct positions, which are multiples of the corresponding SW wavelength. After this step, the transducer configuration for the sets processed so far is the same as in Figure 6. Next, the for loop (lines 25 to 38) performs the area optimization by checking the spaces between transducers and, where possible, moving one transducer into a free space if its wavelength imposed distance condition is satisfied. If one transducer has been moved, Sort reorders the transducers in the TP matrix to capture the new configuration. These steps are repeated until all sets are placed and the gate length is optimized. At the end, the gate area is calculated by multiplying the waveguide width by the waveguide length.
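The placement constraints can be made concrete with a much simpler procedure than Algorithm 1. The sketch below is not the authors' algorithm: it only performs an in-order greedy placement (no reshuffling), treats detectors like sources by placing them a whole number of wavelengths after the previous same-frequency transducer, and makes no attempt to reproduce the lengths reported for Algorithm 1. The function names and the numeric values are illustrative assumptions.

```python
def place_in_order(wavelengths, m, L, D):
    """Positions of m sources plus 1 detector per frequency, placed in bit order.

    All quantities share one length unit (e.g. nm) and are integers.
    """
    n = len(wavelengths)
    pos = [[0] * (m + 1) for _ in range(n)]
    cursor = 0                                   # first free coordinate on the waveguide
    for j in range(m + 1):                       # j-th transducer of every frequency
        for i, lam in enumerate(wavelengths):
            if j == 0:
                p = cursor                       # first set: no wavelength constraint yet
            else:
                prev = pos[i][j - 1]
                k = -(-(cursor - prev) // lam)   # integer ceiling division
                p = prev + max(1, k) * lam       # whole number of wavelengths from prev
            pos[i][j] = p
            cursor = p + L + D                   # keep the minimum pitch to the next one
    return pos

def gate_length(pos, L):
    """Waveguide length: end of the last transducer."""
    return max(max(row) for row in pos) + L

positions = place_in_order([100, 50, 19], m=2, L=10, D=1)
for i, row in enumerate(positions, start=1):
    print(f"f{i}: {row}")
print("length:", gate_length(positions, 10))
```

The gaps left between consecutive transducers in such an in-order layout are what a reordering step can exploit to shorten the waveguide.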

Let us assume a 3-bit 2-input gate operating on SWs with wavelengths $\lambda_1 = 100$ nm, $\lambda_2 = 50$ nm, and $\lambda_3 = 19$ nm, a 10 nm transducer length, and a 1 nm minimum distance between transducers. By following the structure in Figure 6, the second input set can begin at 33 nm from the waveguide start because the first three sources $I_{1,1}, I_{1,2}, I_{1,3}$ each occupy 10 nm and are spaced 1 nm apart (in this example the first index denotes the input operand and the second the bit position). As such, the initial order is $(I_{1,1}, I_{1,2}, I_{1,3}, I_{2,1}, I_{2,2}, I_{2,3}, O_1, O_2, O_3)$ with a corresponding waveguide length of 288 nm. The optimization algorithm changes the order to $(I_{1,1}, I_{1,2}, I_{1,3}, I_{2,3}, I_{2,2}, I_{2,1}, O_3, O_2, O_1)$, which corresponds to a 210 nm waveguide length, thus about 27% area savings.
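The two figures in this example follow from simple arithmetic, reproduced here as a quick check. The variable names are illustrative; the lengths 288 nm and 210 nm are taken from the text, and the area saving equals the length saving because the waveguide width is unchanged.

```python
L_T = 10       # transducer length (nm)
D_MIN = 1      # minimum distance between transducers (nm)
N_BITS = 3     # transducers in the first input set

# Start of the second input set: three transducers, each followed by the minimum gap.
second_set_start = N_BITS * (L_T + D_MIN)

unoptimized, optimized = 288, 210   # waveguide lengths quoted in the text (nm)
saving = (unoptimized - optimized) / unoptimized

print(second_set_start)     # 33
print(f"{saving:.1%}")      # 27.1%
```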

Furthermore, two main methods can be utilized for output detection: (i) Phase detection, and (ii) Threshold detection. In the first case, a predefined phase is utilized as reference and

#### Algorithm 1 Data Parallel Gate Area Optimization

**Inputs**: WE, L, D, w, d[i], i=1:n, $\lambda[i]$, i=1:n

**Outputs**: TP[i,j], i=1:n; j=1:m+1, A

WE is the waveguide end, L the transducer length, D the minimum distance between consecutive transducers, w the waveguide width, d the distance between two consecutive inputs of the same frequency, $\lambda$ the SW wavelength, TP the transducer position, and A the gate area.

```
1: TP[1:n,1:m+1] = 0
2: WE = 0
3: for j = 1 to m + 1 do
4:   for i = 1 to n do
5:     TP[i,j] = WE
6:     WE = WE + L + D
7:   end for
8:   if j > 1 then
9:     for i = 1 to n do
10:      d[i] = TP[i,j] - TP[i,j-1]
11:      if ⌈d[i]/λ[i]⌉ × λ[i] = d[i] then
12:        TP[i,j] = TP[i,j]
13:      else
14:        TP[i,j] ← ⌈d[i]/λ[i]⌉ × λ[i]
15:      end if
16:      if i = 1 then
17:        TP[i-1,j] = TP[n,j-1]
18:      end if
19:      if TP[i,j] - TP[i-1,j] > D + L then
20:        TP[i,j] = TP[i,j]
21:      else
22:        TP[i,j] = TP[i,j] + λ[i]
23:      end if
24:    end for
25:    for i = 1 to n do
26:      if i = 1 then
27:        TP[i-1,j] = TP[n,j-1]
28:      end if
29:      if TP[i,j] - TP[i-1,j] > D + L then
30:        for c = 1 to n do
31:          if ⌈(TP[i,j]+D+L)/λ[c]⌉ × λ[c] = TP[i,j]+D+L
32:          then
33:            TP[c,j] = TP[i,j] + D + L
34:            TP ← Sort(TP)
35:          end if
36:        end for
37:      end if
38:    end for
39:  end if
40: end for
41: WE = TP[n,m+1] + L
42: A = WE × w
```
a phase difference of 0 represents a logic 0, and a phase difference of $\pi$ a logic 1. The second detection method assesses the SW magnetization (SWM) value and reports a logic 0 if the SWM is smaller than a predefined threshold value and a logic 1 otherwise. If phase detection is in place, the gate can provide a non-inverted or an inverted output (or even both of them) by adjusting the reading location. For instance, referring to Figure 6, the detectors must be placed at a distance (from the last $f_i$ SW source) equal to $(j_q + \frac{1}{2})\lambda_i$, $i = 1, 2, 3, \ldots, n$, such that $d_{nm+1} = (j_{nm+1} + \frac{1}{2})\lambda_1$, $d_{nm+2} = (j_{nm+2} + \frac{1}{2})\lambda_2$, $\ldots$, $d_{nm+n} = (j_{nm+n} + \frac{1}{2})\lambda_n$, if the non-inverted results are desired. However, the detectors must be placed at a distance (from the last $f_i$ SW source) equal to $j_q\lambda_i$, such that $d_{nm+1} = j_{nm+1}\lambda_1$, $d_{nm+2} = j_{nm+2}\lambda_2$, $\ldots$, $d_{nm+n} = j_{nm+n}\lambda_n$, if the complement is required. In the case of threshold based detection, the gate can provide non-inverted or inverted outputs without changing the output detector position, by just switching the thresholding condition in the detector cell. Note that, regardless of the detection method, each read location should be as close as possible to the last input in its set, in order to limit the SW energy lost due to damping and to process high amplitude spin waves.
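The effect of the reading location on phase detection can be sketched as follows, assuming an ideal undamped plane wave and a reference phase of 0; the function names and the 50 nm wavelength are illustrative assumptions. The sketch only shows that readings taken a whole number of wavelengths and a half-integer number of wavelengths from the source are complementary; which of the two is labelled non-inverted for a given gate follows the text and is not modelled here.

```python
import math

TWO_PI = 2 * math.pi

def phase_at(distance, wavelength, source_phase=0.0):
    """Phase of an ideal plane wave after travelling `distance` from its source."""
    return (source_phase + TWO_PI * distance / wavelength) % TWO_PI

def read_bit(phase, reference=0.0):
    """Phase detection: difference near 0 -> logic 0, difference near pi -> logic 1."""
    diff = (phase - reference) % TWO_PI
    return 0 if min(diff, TWO_PI - diff) < math.pi / 2 else 1

lam = 50.0  # nm, illustrative wavelength
for bit in (0, 1):
    src = math.pi * bit                                  # logic value encoded as phase 0 or pi
    at_integer = read_bit(phase_at(2.0 * lam, lam, src))
    at_half = read_bit(phase_at(2.5 * lam, lam, src))
    print(f"encoded {bit}: read {at_integer} at 2*lambda, {at_half} at 2.5*lambda")
```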

## IV. SIMULATION SETUP

This section provides insight into the utilized simulation parameters and the performed experiments.

### A. Simulation Parameters

$Fe_{60}Co_{20}B_{20}$ waveguides with a width of 50 nm and Perpendicular Magnetic Anisotropy (PMA) are utilized for all gate constructions. We note that for this material the anisotropy field $H_{anisotropy} > M_s$, which means that there is no need for the application of an external magnetic field<sup>52</sup>. Table I presents the parameters we utilize to validate the 8-bit 2-input XOR/XNOR and 3-input Majority gates. The 8 SW frequencies are 10 GHz, 20 GHz, 30 GHz, 40 GHz, 50 GHz, 60 GHz, 70 GHz, and 80 GHz. By making use of the FVW dispersion relation and given that the wavenumber $k = \frac{2\pi}{\lambda}$, we determine that the distances between transducers exciting/detecting SWs with the same frequency are:

- $d_1 = 166$ nm ($j=2$), $d_2 = 100$ nm ($j=2$), $d_3 = 117$ nm ($j=3$), $d_4 = 165$ nm ($j=5$), $d_5 = 174$ nm ($j=6$), $d_6 = 130$ nm ($j=5$), $d_7 = 168$ nm ($j=7$), and $d_8 = 176$ nm ($j=8$);
- $d_9 = 166$ nm ($j=2$), $d_{10} = 100$ nm ($j=2$), $d_{11} = 117$ nm ($j=3$), $d_{12} = 132$ nm ($j=4$), $d_{13} = 145$ nm ($j=5$), $d_{14} = 104$ nm ($j=4$), $d_{15} = 144$ nm ($j=6$), and $d_{16} = 44$ nm ($j=2$);
- $d_{17} = 166$ nm ($j=2$), $d_{18} = 150$ nm ($j=3$), $d_{19} = 156$ nm ($j=4$), $d_{20} = 66$ nm ($j=2$), $d_{21} = 87$ nm ($j=3$), $d_{22} = 78$ nm ($j=3$), $d_{23} = 72$ nm ($j=3$), and $d_{24} = 110$ nm ($j=5$).

Note that $d_1$ to $d_{16}$ are the distances between transducers exciting/detecting SWs with the same frequency for the XOR gate, and $d_1$ to $d_{24}$ are the corresponding distances for the Majority gate. Furthermore, a 1 nm minimum separation distance between transducers is in place. Note that logic 0 represents

| Parameters                                  | Values                       |
|---------------------------------------------|------------------------------|
| Magnetic saturation $M_s$                   | $1.1 \times 10^6$ A/m        |
| Perpendicular anisotropy constant $k_{ani}$ | $8.3177 \times 10^5$ J/m$^3$ |
| Damping constant $\alpha$                   | 0.004                        |
| Waveguide thickness $t$                     | 1 nm                         |
| Exchange stiffness $A_{exch}$               | 18.5 pJ/m                    |

TABLE I. Parameters

SW with phase 0 and logic 1 represents SW with phase  $\pi$ .
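The wavelengths implied by the listed distances can be recovered with a short consistency check. The sketch assumes the relation $d = j\lambda$ from Section III applies to every listed distance; the list and function names are illustrative.

```python
# (distance in nm, multiplier j) pairs as listed in the text, ordered 10 GHz ... 80 GHz
D1_TO_D8 = [(166, 2), (100, 2), (117, 3), (165, 5), (174, 6), (130, 5), (168, 7), (176, 8)]
D9_TO_D16 = [(166, 2), (100, 2), (117, 3), (132, 4), (145, 5), (104, 4), (144, 6), (44, 2)]
D17_TO_D24 = [(166, 2), (150, 3), (156, 4), (66, 2), (87, 3), (78, 3), (72, 3), (110, 5)]

def wavelengths(pairs):
    """Wavelength per frequency, assuming each distance is j whole wavelengths."""
    return [d / j for d, j in pairs]

for name, pairs in [("d1..d8", D1_TO_D8), ("d9..d16", D9_TO_D16), ("d17..d24", D17_TO_D24)]:
    print(name, wavelengths(pairs))
# every group gives [83.0, 50.0, 39.0, 33.0, 29.0, 26.0, 24.0, 22.0]
```

All three groups yield the same eight wavelengths, from 83 nm at 10 GHz down to 22 nm at 80 GHz, so the listed distances are mutually consistent under this assumption.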

### B. Performed Simulations

We perform the following simulation experiments:

- 8-bit 2-input XOR/XNOR gate with threshold detection. The two 8-bit inputs are simultaneously excited using the sources $(I_{1,1}, I_{2,1}, I_{3,1}, \ldots, I_{8,2})$. The excited spin waves propagate through the waveguide and those that have the same frequency interfere with each other. The resulting spin waves propagate towards the output, where they are captured at $O_1, O_2, \ldots, O_8$ based on threshold detection. We validate both the area unoptimized $(I_{1,1}, I_{2,1}, I_{3,1}, I_{4,1}, I_{5,1}, I_{6,1}, I_{7,1}, I_{8,1}, I_{1,2}, I_{2,2}, I_{3,2}, I_{4,2}, I_{5,2}, I_{6,2}, I_{7,2}, I_{8,2}, I_{1,3}, I_{2,3}, I_{3,3}, I_{4,3}, I_{5,3}, I_{6,3}, I_{7,3}, I_{8,3})$ and the optimized $(I_{1,1}, I_{2,1}, I_{3,1}, I_{4,1}, I_{5,1}, I_{6,1}, I_{7,1}, I_{8,1}, I_{2,2}, I_{3,2}, I_{1,2}, I_{6,2}, I_{4,2}, I_{5,2}, I_{7,2}, I_{8,2}, I_{2,3}, I_{8,3}, I_{3,3}, I_{1,3}, I_{6,3}, I_{4,3}, I_{5,3}, I_{7,3})$ configurations. Note that as the detector order is not important, the detectors follow the same pattern, i.e., $(O_1, O_2, O_3, O_4, O_5, O_6, O_7, O_8)$, in both cases.
- 8-bit 3-input Majority gate based on phase detection. We again consider area unoptimized and optimized gate instances, but in this case the detector order is relevant, thus the source and detector order after optimization is $I_{1,1}, I_{2,1}, I_{3,1}, I_{4,1}, I_{5,1}, I_{6,1}, I_{7,1}, I_{8,1}, I_{2,2}, I_{3,2}, I_{1,2}, I_{6,2}, I_{4,2}, I_{5,2}, I_{7,2}, I_{8,2}, I_{2,3}, I_{8,3}, I_{3,3}, I_{1,3}, I_{6,3}, I_{4,3}, I_{5,3}, I_{7,3}, O_6, O_8, O_4, O_2, O_5, O_1, O_7, O_3$.



FIG. 7. Unoptimized 8-bit XOR Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

## V. SIMULATION RESULTS AND DISCUSSION

This section presents simulation results for the 8-bit 2-input XOR/XNOR and 3-input Majority gate instances, performance estimations, and a comparison with functionally equivalent state-of-the-art SW structures. Subsequently, it discusses fan-in and geometric scalability, the maximum achievable parallelism (the upper bound on the number of practically usable SW frequencies), and variability and thermal noise effects.

### A. Simulation Results

#### 8-bit 2-input threshold detection based XOR/XNOR gate

Figure 7 presents OOMMF simulation results for the area unoptimized byte-based 2-input XOR gate instance. The y-axis reflects the output SW  $M_x/M_s$  ratio, i.e., the magnetization in the x-direction over the saturation magnetization. To simplify the Figure we only consider all-0s and all-1s input sets, thus only four input combinations are possible, and as such the gate response to any input combination is the same at all frequencies. As expected, same-frequency SW pairs interfere without affecting the other SWs. This is clear from Figure 7, which indicates that 8 different frequency components exist without distorting each other in the Fast Fourier Transform (FFT) amplitude spectrum for all the considered input combinations. Moreover, as can be noticed from Figure 8, the output SWs are not distorted and can be properly detected at each frequency. Let us consider the first output detection cell, which is tuned for the 10 GHz SW. When reading the output at time 0.5 ns for  $\mathcal{I}_1 = \mathcal{I}_2 = 0$  and  $\mathcal{I}_1 = \mathcal{I}_2 = 1$ , the absolute SW magnetization value is greater than 0.0035  $M_s$  due to constructive interference, whereas it is less than 0.0035  $M_s$  when one input set is 0 and the other one is 1. Therefore, if the detection threshold is set to 0.0035  $M_s$ , an XOR function is obtained, as a SW magnetization greater (lower) than 0.0035  $M_s$  is read as a logic 0 (1). An XNOR can be realized by flipping the condition such that a SW magnetization lower (greater) than 0.0035  $M_s$  is read as a logic 0 (1). Similarly, for the second detection cell, which targets the 20 GHz SW, a threshold value of 0.0032  $M_s$  is in place, and by following a similar line of reasoning threshold values of 0.0028  $M_s$ , 0.0025  $M_s$ , 0.0022  $M_s$ , 0.0017  $M_s$ , 0.0015  $M_s$ , and 0.001  $M_s$  can be determined for the rest of the frequencies.

FIG. 8. Unoptimized 8-bit XOR Gate Outputs a)  $f_1=10$  GHz, b)  $f_2=20$  GHz, ..., h)  $f_8=80$  GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

FIG. 9. Optimized 8-bit XOR Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

FIG. 10. Optimized 8-bit XOR Gate Outputs: a)  $f_1=10 \text{ GHz}$ , b)  $f_2=20 \text{ GHz}$ , ..., h)  $f_8=80 \text{ GHz}$ . Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

Figures 9 and 10 present OOMMF simulation results for the optimized 8-bit 2-input XOR gate. As depicted in Figure 10, the simulation confirms the correct functionality of the XOR/XNOR gate. One can observe in the Figure that in this case the SW magnetization at all frequencies is higher, as the spin waves propagate over shorter distances when compared with the non-optimized case. In addition, the detection threshold values are higher, i.e., 0.007  $M_s$ , 0.005  $M_s$ , 0.0045  $M_s$ , 0.0038  $M_s$ , 0.0034  $M_s$ , 0.0027  $M_s$ , 0.0025  $M_s$ , and 0.002  $M_s$ ; therefore, less sensitive detectors are required for the XOR/XNOR gate implementation.
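The threshold rule described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, not the detector implementation: the threshold lists are the values quoted in the text (in units of $M_s$, ordered from 10 GHz to 80 GHz), whereas the function name `detect` and the amplitude readings are hypothetical.

```python
# Detection thresholds in units of M_s, as quoted in the text for
# f = 10, 20, ..., 80 GHz.
UNOPTIMIZED = [0.0035, 0.0032, 0.0028, 0.0025, 0.0022, 0.0017, 0.0015, 0.001]
OPTIMIZED = [0.007, 0.005, 0.0045, 0.0038, 0.0034, 0.0027, 0.0025, 0.002]


def detect(amplitudes, thresholds, xnor=False):
    """Map per-frequency |Mx/Ms| readings to output bits.

    XOR: a reading above the threshold (constructive interference) is a 0,
    a reading below it (destructive interference) is a 1. XNOR flips this.
    """
    if len(amplitudes) != len(thresholds):
        raise ValueError("one reading per frequency channel is required")
    bits = []
    for amp, thr in zip(amplitudes, thresholds):
        xor_bit = 0 if abs(amp) > thr else 1
        bits.append(1 - xor_bit if xnor else xor_bit)
    return bits


# Hypothetical readings (not simulation data): channels 0-3 see constructive
# interference, channels 4-7 see destructive interference.
readings = [2.0 * t for t in OPTIMIZED[:4]] + [0.5 * t for t in OPTIMIZED[4:]]
xor_bits = detect(readings, OPTIMIZED)
xnor_bits = detect(readings, OPTIMIZED, xnor=True)
```

Each frequency channel keeps its own threshold because the SW amplitude at the detector differs from one frequency to the next.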



FIG. 11. Unoptimized 8-bit Majority Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

#### 8-bit phase detection based 3-input Majority gate

The 8-bit 3-input unoptimized Majority gate OOMMF simulation results are presented in Figure 11. The same notations are in place and again, to simplify the Figure, we only consider all-0s and all-1s input sets, thus only 8 input combinations are presented. The Figure clearly demonstrates proper gate functionality, as 8 different frequency components exist without distorting each other in the Fast Fourier Transform (FFT) amplitude spectrum for all the possible input combinations  $(\mathcal{I}_1 = \mathcal{I}_2 = \mathcal{I}_3 = 0), (\mathcal{I}_1 = \mathcal{I}_2 = 0, \mathcal{I}_3 = 1), \ldots, (\mathcal{I}_1 = \mathcal{I}_2 = \mathcal{I}_3 = 1)$ . Figure 12 indicates that the output SWs are not distorted and can be properly detected at each frequency. Let us concentrate on Figure 12a, which captures the 10 GHz 3-input Majority gate response, and consider the output at time moment 0.75 ns. When the three inputs have the same phase of 0  $(I_1I_2I_3 = 000)$ , they constructively interfere in the waveguide, resulting in a SW with phase 0, which corresponds to a logic 0. Also, when exactly one of the inputs is logic 1  $(I_1I_2I_3 = 001, I_1I_2I_3 = 010, I_1I_2I_3 = 100)$ , i.e., has a phase of  $\pi$ , the SWs interfere both constructively and destructively, and the result is still a logic 0. In contrast, if exactly one of the inputs is logic 0  $(I_1I_2I_3 = 011, I_1I_2I_3 = 110, I_1I_2I_3 = 101)$ , then the output is logic 1 as a result of the interference. Further, when the three inputs have the same phase of  $\pi$  ( $I_1I_2I_3 = 111$ ), the spin waves interfere constructively in the waveguide, which results in a phase of  $\pi$ , corresponding to a logic 1. The same line of reasoning can be applied for all the other 7 cases, as clearly indicated by Figure 12.

FIG. 12. Unoptimized 8-bit Majority Gate Outputs a)  $f_1=10 \text{ GHz}$ , b)  $f_2=20 \text{ GHz}$ , ..., h)  $f_8=80$  GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

FIG. 13. Optimized 8-bit Majority Gate Time and Frequency Response. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

The optimized 8-bit 3-input Majority gate OOMMF simulation results are presented in Figures 13 and 14. As can be observed from Figure 14, the gate functions correctly while the SW amplitudes are higher because, due to the optimization, the SWs propagate over shorter distances, which enables the utilization of less sensitive detectors.
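The interference argument behind phase based Majority detection can be checked with a minimal, idealized sketch. It assumes equal-amplitude, same-frequency waves with no damping and no propagation phase shift, so each input reduces to a phasor with phase 0 (logic 0) or $\pi$ (logic 1); the function name `majority_by_phase` is hypothetical and the model is far simpler than the micromagnetic simulations.

```python
import cmath
import math


def majority_by_phase(bits):
    """Majority of an odd number of bits via idealized wave superposition.

    Each input is a unit-amplitude phasor with phase 0 (logic 0) or
    pi (logic 1); the output bit is read from the phase of the sum.
    """
    if len(bits) % 2 == 0:
        raise ValueError("an odd number of inputs is needed to avoid ties")
    # Superpose the inputs as complex phasors exp(i * phase).
    total = sum(cmath.exp(1j * math.pi * b) for b in bits)
    # The resultant lies on the real axis: phase near 0 -> 0, near pi -> 1.
    return 0 if abs(cmath.phase(total)) < math.pi / 2 else 1


# Truth table for the 3-input gate at a single frequency.
truth_table = {
    (a, b, c): majority_by_phase((a, b, c))
    for a in (0, 1) for b in (0, 1) for c in (0, 1)
}
```

In this idealized picture the resultant amplitude is 3 for unanimous inputs and 1 otherwise, while the phase alone carries the Majority value.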

### B. Performance Evaluation

FIG. 14. Optimized 8-bit Majority Gate Outputs a)  $f_1=10$  GHz, b)  $f_2=20$  GHz, ..., h)  $f_8=80$  GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

To gain insight into the practical potential of our proposal, we evaluate and compare the 8-bit gates with functionally equivalent state-of-the-art SW implementations obtained by instantiating 8 normal (scalar) Majority/XOR gates, in terms of area, delay, and power consumption. In our evaluations we make the following assumptions: (i) source/detector dimensions are  $10 \text{ nm} \times 50 \text{ nm}$  as suggested in<sup>51</sup>, (ii) SW propagation through the waveguide does not consume noticeable energy, and (iii) the transducer delay is  $0.42 \text{ ns}$ <sup>53</sup>.

Under these assumptions we first evaluate the impact of the optimization algorithm on the area of the 8-bit gates. Our calculations indicate that the unoptimized XOR and Majority gates have an area of  $0.02525 \,\mu\text{m}^2$  and  $0.04725 \,\mu\text{m}^2$ , respectively, which become  $0.01755 \,\mu\text{m}^2$  and  $0.0279 \,\mu\text{m}^2$ , respectively, after the optimization. This demonstrates the algorithm's efficiency, as it diminishes the area by 30% and 41%, respectively. As the standard functionally equivalent implementations require 8 2-input XOR and 8 3-input Majority gates, which occupy  $0.0784 \,\mu\text{m}^2$  and  $0.116 \,\mu\text{m}^2$  of real estate, respectively, our proposal enables a 4.47x and 4.16x area reduction, respectively.
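The area figures above can be reproduced with a short calculation. The sketch below only uses the areas quoted in the text; the dictionary layout and function names are illustrative choices.

```python
# Areas in square micrometres, as quoted in the text.
AREAS = {
    "XOR": {"unoptimized": 0.02525, "optimized": 0.01755, "scalar_x8": 0.0784},
    "Majority": {"unoptimized": 0.04725, "optimized": 0.0279, "scalar_x8": 0.116},
}


def area_saving_percent(unoptimized, optimized):
    """Relative area saved by the optimization, in percent."""
    return 100.0 * (unoptimized - optimized) / unoptimized


def area_reduction_factor(scalar_x8, optimized):
    """How many times smaller the optimized 8-bit gate is than 8 scalar gates."""
    return scalar_x8 / optimized


# (saving in percent, reduction factor versus 8 scalar gates) per gate type.
summary = {
    gate: (
        round(area_saving_percent(a["unoptimized"], a["optimized"])),
        round(area_reduction_factor(a["scalar_x8"], a["optimized"]), 2),
    )
    for gate, a in AREAS.items()
}
```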

Generally speaking, to calculate a SW gate delay one needs to sum up the times associated with SW generation, propagation, and detection. The delay due to SW propagation through the waveguide depends on the distance travelled from generation to detection and can be computed by dividing that distance by the SW group velocity, which is 3500 m/s for  $\text{CoFeB}$ <sup>48</sup>. Given that the longest propagation path for the 8-bit 2-input XOR and 3-input Majority gates is 351 nm and 558 nm, respectively, the propagation delay is 100 ps and 159 ps, respectively, which, by adding the transducer delays, sums up to 940 ps and 999 ps, respectively. For the scalar 2-input XOR and 3-input Majority gates the longest path is 196 nm and 290 nm, respectively, which translates into a transmission delay of 56 ps and 83 ps, respectively, and an overall gate delay of 896 ps and 923 ps, respectively. Thus, the 8-bit 2-input XOR and 3-input Majority gates are slower than their scalar counterparts by 5% and 7%, respectively.
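As a worked example of the delay model, the following sketch recomputes the quoted gate delays from the longest path lengths. It assumes that the 0.42 ns transducer delay is counted twice, once for excitation and once for detection, which is inferred from the quoted totals (940 ps minus 100 ps of propagation leaves 840 ps); the names used are illustrative.

```python
GROUP_VELOCITY_M_PER_S = 3500.0  # SW group velocity quoted for CoFeB
TRANSDUCER_DELAY_PS = 420.0      # 0.42 ns per transducer (assumption iii)

# Longest propagation paths in nanometres, as quoted in the text.
PATHS_NM = {
    "xor_8bit": 351.0,
    "maj_8bit": 558.0,
    "xor_scalar": 196.0,
    "maj_scalar": 290.0,
}


def propagation_delay_ps(path_nm):
    """Travel time over the path at the group velocity, in picoseconds."""
    return path_nm * 1e-9 / GROUP_VELOCITY_M_PER_S * 1e12


def gate_delay_ps(path_nm):
    """Generation + propagation + detection delay, in picoseconds.

    Assumes one transducer delay for excitation and one for detection.
    """
    return 2 * TRANSDUCER_DELAY_PS + propagation_delay_ps(path_nm)


delays = {name: round(gate_delay_ps(nm)) for name, nm in PATHS_NM.items()}
```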

As both the parallel and the scalar gate implementations make use of the same number of transducers and propagation through the waveguide consumes insignificant power, the two implementations are equivalent in terms of power consumption.

### C. Fan-in and Geometrical Scalability

FIG. 15. MAJ Gate Outputs at  $f_1=10 \text{GHz}$ . Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

The proposed structure is generic and the number of bits per frequency, i.e., the gate fan-in, should not affect its functionality. However, as the number of inputs increases, the damping effect might play a more significant role in diminishing SW amplitudes. Thus, if a large number of inputs is targeted, it might be necessary to excite the same-frequency SW inputs in Figure 6 at different energy levels  $E_n < E_{n-1} < \ldots < E_1$ , where  $E_i$  is the energy at which the  $i^{th}$  SW is excited. We note however that: (i) usual fan-in values are rather small (2 and 3 in the gates we designed), (ii) energy level differentiation is only required for large fan-in values in case the logic gate does not function correctly, and (iii) within certain limits the SW energy levels can be adjusted by properly biasing the source transducers.
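To illustrate how such an energy ordering could arise, the sketch below uses a deliberately simple model that is not taken from the simulations: it assumes the SW amplitude decays exponentially with the travelled distance, that the excitation energy scales with the square of the launch amplitude, and that source 1 is the farthest from the detector. The distances, the attenuation length, and the function name are all hypothetical.

```python
import math


def excitation_energies(distances_nm, attenuation_length_nm, target_amplitude=1.0):
    """Relative excitation energy per source for equal arrival amplitudes.

    Assumed model: amplitude ~ exp(-d / attenuation_length) and
    energy ~ launch_amplitude ** 2.
    """
    energies = []
    for d in distances_nm:
        # Pre-compensate the decay the wave will experience on its way.
        launch_amplitude = target_amplitude * math.exp(d / attenuation_length_nm)
        energies.append(launch_amplitude ** 2)
    return energies


# Hypothetical layout: source 1 is farthest from the detector, source 5 nearest.
distances = [500.0, 400.0, 300.0, 200.0, 100.0]
energies = excitation_energies(distances, attenuation_length_nm=1000.0)
```

Under these assumptions the farthest source needs the largest energy, which yields the ordering $E_n < E_{n-1} < \ldots < E_1$.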

To gain insight into the effect of the waveguide width on gate functionality we scaled it from 50 nm up to 500 nm. We noticed that scaling neither affects the gate functionality nor generates any crosstalk effects. We note that, as the waveguide width increases, the ferromagnetic resonance frequency decreases and thus lower SW frequencies can be utilized. Although this is advantageous from a signal loss perspective, such structures require stronger static magnetic fields, which results in area and energy consumption overheads.

### D. Practically Achievable Parallelism

To gain some insight into the practical upper bound of data parallelism, we examined the consequences of increasing the number of bits per set, i.e., the number of utilized frequencies. To this end, we simulated 8-bit and 9-bit 3-input Majority gate instances with OOMMF and display in Figure 15 the 10 GHz output component for the input combinations  $\mathcal{I}_1 \mathcal{I}_2 \mathcal{I}_3 = 000$  and  $\mathcal{I}_1\mathcal{I}_2\mathcal{I}_3 = 100$ . One can observe in the Figure that at time 0.5 ns the 8-bit Majority gate output has the same phase for both considered input combinations, which reflects the correct functionality of the Majority gate, as in both cases 0 is the majority. However, the 9-bit Majority gate output at time 0.5 ns has different phases, 0 for  $\mathcal{I}_1\mathcal{I}_2\mathcal{I}_3 = 000$  and approximately  $\pi/4$  for  $\mathcal{I}_1\mathcal{I}_2\mathcal{I}_3 = 100$ , which indicates that the gate starts to malfunction. Based on this we can conclude that, for the proposed topology and utilized material, 8 is the maximum number of frequencies one can use to construct robust parallel SW gates.

FIG. 16. XOR Gate Outputs at  $f_2=20\text{GHz}$ . Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

However, one can go beyond this limit if threshold based detection is utilized. To examine the effect of embedding more than 8 frequencies we evaluate by means of OOMMF simulations 2-input XOR gates with 8, 9, 10, and 16 frequencies. For illustration purposes we display in Figure 16 the 20 GHz output component for the input combinations  $\mathcal{I}_1\mathcal{I}_2 = 00$  and  $\mathcal{I}_1\mathcal{I}_2 = 01$ , which should give a 0 and a 1 output value, respectively, for all the considered input widths. The Figure clearly indicates that, while the spin wave magnetization difference between the two input combinations decreases as the number of frequencies increases, which makes output detection more challenging, two different levels can still be distinguished and a threshold defined such that if the spin wave magnetization is greater than that threshold, the output is 0, and 1 otherwise. To clarify this, let us inspect the output value at time moment 0.4 ns for the 8, 9, 10, and 16-bit XOR gates. For the input combination  $\mathcal{I}_1\mathcal{I}_2 = 00$  the output SW has a higher amplitude than the one corresponding to  $\mathcal{I}_1\mathcal{I}_2 = 01$ , which means that a threshold can be set and, based on threshold detection, the X(N)OR output can be detected. This suggests that threshold detection based gates are more robust and can operate with up to 16-bit inputs. Note that more than 16-bit inputs might be realizable, but investigating this is part of planned future work.

FIG. 17. Optimized 16-bit Majority Gate Response in Time and Frequency. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

Figure 17 presents OOMMF simulation results for the 16-bit 2-input XOR gate. As can be observed from the FFT magnitude spectrum in Figure 17, the information is encoded in SWs with 16 different frequencies, 10, 20, ..., 160 GHz, and the output for all the possible input combinations  $(\mathcal{I}_1 = \mathcal{I}_2 = 0), \ldots, (\mathcal{I}_1 = \mathcal{I}_2 = 1)$  can be detected at each frequency. To further examine the results, we filter each frequency component for the different input combinations separately in Figure 18, and one can observe that the output SWs are not distorted and can be properly detected at each frequency, which means that the 16-bit XOR/XNOR gate operates correctly. Let us consider the 20 GHz output at time moment 0.75 ns and a detection threshold value of 0.04  $M_s$ . For  $\mathcal{I}_1 = \mathcal{I}_2 = 0$  or  $\mathcal{I}_1 = \mathcal{I}_2 = 1$ , the absolute SW magnetization value is greater than 0.04  $M_s$  due to constructive interference, which means a logic 0 output, as it should. For  $\mathcal{I}_1 = 0, \mathcal{I}_2 = 1$  or  $\mathcal{I}_1 = 1, \mathcal{I}_2 = 0$ , the absolute SW magnetization value is lower than 0.04  $M_s$ , which means a logic 1 output, as it should. An XNOR can be realized by flipping the condition such that a SW magnetization lower (greater) than 0.04  $M_s$  is read as a logic 0 (1). The same line of reasoning can be utilized to determine all threshold values as 0.045  $M_s$ , 0.04  $M_s$ , 0.038  $M_s$ , 0.033  $M_s$ , 0.032  $M_s$ , 0.03  $M_s$ , 0.028  $M_s$ , 0.025  $M_s$ , 0.02  $M_s$ , 0.015  $M_s$ , 0.01  $M_s$ , 0.007  $M_s$ , 0.0068  $M_s$ , 0.005  $M_s$ , 0.0045  $M_s$ , 0.004  $M_s$ , 0.0035  $M_s$ , and 0.002  $M_s$ , for frequencies in increasing order.
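The per-frequency filtering step can be illustrated with a minimal sketch that separates 16 superposed channels by projecting the output signal onto each carrier (a single-bin discrete Fourier projection). It assumes ideal, lossless, unit-amplitude waves and an arbitrary threshold of 1.0, so it only mirrors the principle and not the OOMMF data; all names and input patterns are hypothetical.

```python
import cmath
import math


def channel_amplitude(samples, dt, freq_hz):
    """Amplitude of the freq_hz component via projection onto exp(-i*2*pi*f*t)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * k * dt)
              for k, s in enumerate(samples))
    return abs(2.0 * acc / len(samples))


# Carriers at 10, 20, ..., 160 GHz; phase 0 encodes logic 0, pi encodes logic 1.
freqs = [10e9 * (i + 1) for i in range(16)]
a_bits = [0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical input set 1
b_bits = [0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1, 1, 1, 1, 0]  # hypothetical input set 2

window = 1e-10    # 0.1 ns holds a whole number of periods of every carrier
n_samples = 1024
dt = window / n_samples

# Output signal: every channel is the sum of its two same-frequency inputs.
signal = [
    sum(math.cos(2 * math.pi * f * k * dt + math.pi * a)
        + math.cos(2 * math.pi * f * k * dt + math.pi * b)
        for f, a, b in zip(freqs, a_bits, b_bits))
    for k in range(n_samples)
]

THRESHOLD = 1.0   # arbitrary units: between destructive (0) and constructive (2)
xor_out = [0 if channel_amplitude(signal, dt, f) > THRESHOLD else 1 for f in freqs]
```

Because the window spans an integer number of periods of every carrier, the projections are mutually orthogonal and each channel is read without contributions from the other 15.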



FIG. 18. Optimized XOR Gate Outputs: a)  $f_1=10$  GHz, b)  $f_2=20$  GHz, ..., p)  $f_{16}=160$  GHz. Note that logic 0 represents SW with phase 0 and logic 1 represents SW with phase  $\pi$ 

### E. Discussion

The SW community's theoretical and practical contributions clearly demonstrate the potential of the SW computing paradigm to support the implementation of energy effective computation platforms able to outperform their traditional Boolean algebra CMOS based counterparts<sup>23</sup>. However, a number of roadblocks need to be removed in order to transform this potential into reality<sup>23</sup>:

- Immature technology: ME cells seem to be the most promising transducers to excite and detect SWs because they have ultra-low energy consumption and acceptable delay, and they are highly scalable. However, it is not yet possible to realize them experimentally.
- Cost and Complexity: Conceptually speaking, SW devices can be scaled down to the nm range, as a SW device must be larger than the wavelength  $\lambda$ , which can be in the nm range. However, a number of open issues still need to be addressed before nano-scale SW devices can be realized, such as: i) Excitation and detection: in nm-range SW device measurements it has so far not been possible to distinguish nm SWs from noise, ii) Variability: SW gate behaviour is sensitive to gate geometry and dimensions, and phase changes beyond a certain range may make the gates malfunction. Also, the frequency cannot be set exactly to the nominal value. These issues should be addressed in the design steps such that there is sufficient margin for the correct detection of the output.

We are confident, however, that the industry will find, as always, the way towards nm-range multifrequency SW circuits and systems and take practical advantage of the SW computing paradigm potential.

### F. Variability and Thermal Noise Effects

In this paper, our main purpose is to propose and validate an intrinsically data parallel spin wave technology under ideal conditions as a proof of concept, while disregarding factors, e.g., edge roughness, waveguide dimension variations, spin wave strength variation, and thermal noise, which might negatively affect the performance of the proposed concept. However, in<sup>54,55</sup>, the effects of a trapezoidal waveguide cross section and edge roughness were investigated, and it was demonstrated that they have a rather limited impact on gate behavior, with functionality preserved in their presence. Moreover, an investigation of SW gate behaviour at different temperatures was presented in<sup>54</sup>. It was noticed that the gate functions correctly at different temperatures and that the effect of temperature variation is rather limited. In addition, as our proposed structure is in-line, waveguide width variations do not affect gate functionality, thus we expect it to be rather robust to dimension variations. Despite the fact that we expect variability and thermal noise not to fundamentally affect the proposed gate behaviour, a thorough investigation of such effects is part of the planned future work.

## VI. CONCLUSIONS

A novel n-bit data parallel spin wave logic gate was proposed in this paper. In order to explain the proposed concept, we implemented and validated, by means of OOMMF, 8-bit 2-input XOR and 3-input Majority gates. Further, we proposed an optimization algorithm to minimize the area overhead of the proposed multi-frequency gates and demonstrated that the algorithm diminishes the area by  $30\%$  and  $41\%$  for the XOR and MAJ gate implementations, respectively. Moreover, to assess the potential of our proposal, we evaluated and compared the proposed multifrequency gates with functionally equivalent scalar SW gate based implementations in terms of area, delay, and power consumption. The results indicated that the byte-based XOR and Majority gates require 4.47x and 4.16x less area than the conventional (scalar) implementations, respectively, at the expense of a 5% and 7% delay overhead, respectively, and without inducing any power consumption overhead. Finally, we demonstrated that, for the current gate topology and materials, the maximum number of frequencies (gate parallelism) is 8 and 16 for phase and threshold based output detection, respectively.

#### ACKNOWLEDGEMENT

This work has received funding from the European Union's Horizon 2020 research and innovation program within the FET-OPEN project CHIRON under grant agreement No. 801055. It has also been partially supported by imec's industrial affiliate program on beyond-CMOS logic. F.V. acknowledges financial support from Flanders Research Foundation (FWO) through grant No. 1S05719N.

#### REFERENCES

- <sup>1</sup>N. D. Shah, E. W. Steyerberg, and D. M. Kent, "Big data and predictive analytics: Recalibrating expectations," JAMA, 2018.

- <sup>2</sup>R. L. Villars, C. W. Olofson, and M. Eastwood, "Big data: What it is and why you should care," IDC, 2011.
- <sup>3</sup>S. Agarwal et al., "International roadmap of devices and systems 2017 edition: Beyond cmos chapter." Sandia National Lab.(SNL-NM), Albuquerque, NM (United States), Tech. Rep., 2018.
- <sup>4</sup>D. Mamaluy and X. Gao, "The fundamental downscaling limit of field effect transistors," Applied Physics Letters, vol. 106, no. 19, p. 193503, 2015.
- <sup>5</sup>B. Hoefflinger, Chips 2020: a guide to the future of nanoelectronics. Springer Science & Business Media, 2012.
- <sup>6</sup>N. Z. Haron and S. Hamdioui, "Why is cmos scaling coming to an end?" in Design and Test Workshop, 2008. IDT 2008. 3rd International. IEEE, 2008, pp. 98–103.
- <sup>7</sup>Y. Jiang, N. C. Laurenciu, H. Wang, and S. D. Cotofana, "Graphene nanoribbon based complementary logic gates and circuits," IEEE Transactions on Nanotechnology, vol. 18, pp. 287–298, 2019.
- <sup>8</sup>Y. Jiang, N. Cucu Laurenciu, and S. D. Cotofana, "On basic boolean function graphene nanoribbon conductance mapping," IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 66, no. 5, pp. 1948–1959, 2019.
- <sup>9</sup>S. Choudhary and S. Khandate, "Implication of hydrogenation in tuning the magnetoresistance of graphene-based magnetic junction," IEEE Transactions on Nanotechnology, vol. 18, pp. 670–675, 2019.
- <sup>10</sup>Y. Jiang, N. C. Laurenciu, H. Wang, and S. D. Cotofana, "Graphene nanoribbon based complementary logic gates and circuits," IEEE Transactions on Nanotechnology, vol. 18, pp. 287–298, 2019.
- <sup>11</sup>S. Bansal et al., "Enhanced optoelectronic properties of bilayer graphene/hgcdte-based single- and dual-junction photodetectors in long infrared regime," IEEE Transactions on Nanotechnology, vol. 18, pp. 781–789, 2019.
- <sup>12</sup>H. Nili et al., "Comprehensive compact phenomenological modeling of integrated metal-oxide memristors," IEEE Transactions on Nanotechnology, vol. 19, pp. 344–349, 2020.
- <sup>13</sup>S. N. Truong, K. Van Pham, and K. Min, "Spatial-pooling memristor crossbar converting sensory information to sparse distributed representation of cortical neurons," IEEE Transactions on Nanotechnology, vol. 17, no. 3, pp. 482–491, 2018.
- <sup>14</sup>C. E. Graves, C. Li, X. Sheng, W. Ma, S. R. Chalamalasetti, D. Miller, J. S. Ignowski, B. Buchanan, L. Zheng, S. Lam, X. Li, L. Kiyama, M. Foltin, M. P. Hardy, and J. P. Strachan, "Memristor tcams accelerate regular expression matching for network intrusion detection," IEEE Transactions on Nanotechnology, vol. 18, pp. 963–970, 2019.
- <sup>15</sup>N. Zheng and P. Mazumder, "Learning in memristor crossbar-based spiking neural networks through modulation of weight-dependent spike-timing-dependent plasticity," IEEE Transactions on Nanotechnology, vol. 17, no. 3, pp. 520–532, 2018.
- <sup>16</sup>M. R. Mahmoodi, A. F. Vincent, H. Nili, and D. B. Strukov, "Intrinsic bounds for computing precision in memristor-based vector-by-matrix multipliers," IEEE Transactions on Nanotechnology, vol. 19, pp. 429–435, 2020.
- <sup>17</sup>K. P. Gnawali, S. N. Mozaffari, and S. Tragoudas, "Low power spintronic ternary content addressable memory," IEEE Transactions on Nanotechnology, vol. 17, no. 6, pp. 1206–1216, 2018.
- <sup>18</sup>H. Zhang et al., "Spintronic processing unit within voltage-gated spin hall effect mrams," IEEE Transactions on Nanotechnology, vol. 18, pp. 473–483, 2019.
- <sup>19</sup>D. Zhang, Y. Hou, L. Zeng, and W. Zhao, "Hardware acceleration implementation of sparse coding algorithm with spintronic devices," IEEE Transactions on Nanotechnology, vol. 18, pp. 518–531, 2019.
- <sup>20</sup>S. K. Thirumala, Y. Hung, S. Jain, A. Raha, N. Thakuria, V. Raghunathan, A. Raghunathan, Z. Chen, and S. K. Gupta, "Valley-coupled-spintronic non-volatile memories with compute-in-memory support," IEEE Transactions on Nanotechnology, pp. 1–1, 2020.
- <sup>21</sup>A. Roohi and R. F. DeMara, "Parc: A novel design methodology for power analysis resilient circuits using spintronics," IEEE Transactions on Nanotechnology, vol. 18, pp. 885–889, 2019.
- <sup>22</sup>D. E. Nikonov and I. A. Young, "Overview of beyond-cmos devices and a uniform methodology for their benchmarking," Proceedings of the IEEE, vol. 101, no. 12, pp. 2498–2533, Dec 2013.
- <sup>23</sup>A. Mahmoud, F. Ciubotaru, F. Vanderveken, A. V. Chumak, S. Hamdioui, C. Adelmann, and S. Cotofana, "Introduction to spin wave computing," Journal of Applied Physics, vol. 128, no. 16, p. 161101, 2020. [Online]. Available: https://doi.org/10.1063/5.0019328
- <sup>24</sup>A. N. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana, and S. Hamdioui, "Spin wave normalization toward all magnonic circuits," IEEE Transactions on Circuits and Systems I: Regular Papers, pp. 1–14, 2020.

- <sup>25</sup>A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Cotofana, and S. Hamdioui, "2-output spin wave programmable logic gate," in 2020 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2020, pp. 60–65.
- <sup>26</sup>A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "4-output programmable spin wave logic gate," in 2020 IEEE 38th International Conference on Computer Design (ICCD), 2020, pp. 332–335.
- <sup>27</sup>A. Mahmoud, F. Vanderveken, C. Adelmann, F. Ciubotaru, S. Hamdioui, and S. Cotofana, "Fan-out enabled spin wave majority gate," AIP Advances, vol. 10, no. 3, p. 035119, 2020. [Online]. Available: https://doi.org/10.1063/1.5134690
- <sup>28</sup>M. P. Kostylev, A. A. Serga, T. Schneider, B. Leven, and B. Hillebrands, "Spin-wave logical gates," Applied Physics Letters, vol. 87, no. 15, p. 153501, 2005. [Online]. Available: https://doi.org/10.1063/1.2089147
- <sup>29</sup>T. Schneider, A. A. Serga, B. Leven, B. Hillebrands, R. L. Stamps, and M. P. Kostylev, "Realization of spin-wave logic gates," Applied Physics Letters, vol. 92, no. 2, p. 022505, 2008. [Online]. Available: https://doi.org/10.1063/1.2834714
- <sup>30</sup>K.-S. Lee and S.-K. Kim, "Conceptual design of spin wave logic gates based on a mach–zehnder-type spin wave interferometer for universal logic functions," Journal of Applied Physics, vol. 104, no. 5, p. 053909, 2008. [Online]. Available: https://doi.org/10.1063/1.2975235
- <sup>31</sup>I. A. Ustinova et al., "Logic gates based on multiferroic microwave interferometers," in 2017 11th International Workshop on the Electromagnetic Compatibility of Integrated Circuits (EMCCompo), July 2017, pp. 104–107.
- <sup>32</sup>A. Khitun and K. L. Wang, "Nano scale computational architectures with spin wave bus," Superlattices and Microstructures, vol. 38, no. 3, pp. 184 – 200, 2005. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0749603605000716
- <sup>33</sup>Y. Wu et al., "A three-terminal spin-wave device for logic applications," Journal of Nanoelectronics and Optoelectronics, vol. 4, no. 3, pp. 394–397, December 2009.
- <sup>34</sup>A. Khitun et al., "Feasibility study of logic circuits with a spin wave bus," Nanotechnology, vol. 18, no. 46, p. 465202, 2007. [Online]. Available: http://stacks.iop.org/0957-4484/18/i=46/a=465202
- <sup>35</sup>A. Khitun et al., "Spin wave logic circuit on silicon platform," in Fifth International Conference on Information Technology: New Generations (itng 2008), April 2008, pp. 1107–1110.
- <sup>36</sup>B. Rana and Y. Otani, "Voltage-controlled reconfigurable spin-wave nanochannels and logic devices," Phys. Rev. Applied, vol. 9, p. 014033, Jan 2018. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevApplied.9.014033
- <sup>37</sup>A. Chumak, A. Serga, and B. Hillebrands, "Magnon transistor for all-magnon data processing", Nat Commun 5, 4700 (2014). https://doi.org/10.1038/ncomms5700
- <sup>38</sup>A. Khitun and K. L. Wang, "Non-volatile magnonic logic circuits engineering," Journal of Applied Physics, vol. 110, no. 3, p. 034306, 2011. [Online]. Available: https://doi.org/10.1063/1.3609062
- <sup>39</sup>S. Klingler, P. Pirro, T. Brächer, B. Leven, B. Hillebrands, and A. V. Chumak, "Design of a spin-wave majority gate employing mode selection," Applied Physics Letters, vol. 105, no. 15, p. 152410, 2014. [Online]. Available: https://doi.org/10.1063/1.4898042
- <sup>40</sup>S. Klingler, P. Pirro, T. Brächer, B. Leven, B. Hillebrands, and A. V. Chumak, "Spin-wave logic devices based on isotropic forward volume magnetostatic waves," Applied Physics Letters, vol. 106, no. 21, p. 212406, 2015.
- <sup>41</sup>O. Zografos, S. Dutta, M. Manfrini, A. Vaysset, B. Sorée, A. Naeemi, P. Raghavan, R. Lauwereins, and I. P. Radu, "Non-volatile spin wave majority gate at the nanoscale," AIP Advances, vol. 7, no. 5, p. 056020, 2017. [Online]. Available: https://doi.org/10.1063/1.4975693
- <sup>42</sup>K. Nanayakkara, A. Anferov, A. P. Jacob, S. J. Allen, and A. Kozhanov, "Cross junction spin wave logic architecture," IEEE Transactions on Magnetics, vol. 50, no. 11, pp. 1–4, Nov 2014.
- <sup>43</sup>T. Fischer, M. Kewenig, D. A. Bozhko, A. A. Serga, I. I. Syvorotka, F. Ciubotaru, C. Adelmann, B. Hillebrands, and A. V. Chumak, "Experimental prototype of a spin-wave majority gate," Applied Physics Letters, vol. 110, no. 15, p. 152401, 2017. [Online]. Available: https://doi.org/10.1063/1.4979840
- <sup>44</sup>P. Shabadi, A. Khitun, P. Narayanan, M. Bao, I. Koren, K. L. Wang, and C. A. Moritz, "Towards logic functions as the device," in 2010 IEEE/ACM International Symposium on Nanoscale Architectures, June 2010, pp. 11–16.
- <sup>45</sup>T. Fischer et al., "Experimental prototype of a spin-wave majority gate," Applied Physics Letters, vol. 110, no. 15, p. 152401, 2017. [Online]. Available: https://doi.org/10.1063/1.4979840

- <sup>46</sup>F. Ciubotaru et al., "First experimental demonstration of a scalable linear majority gate based on spin waves," in 2018 IEEE International Electron Devices Meeting (IEDM), Dec 2018, pp. 36.1.1–36.1.4.
- <sup>47</sup>A. Khitun, "Multi-frequency magnonic logic circuits for parallel data processing," Journal of Applied Physics, vol. 111, no. 5, p. 054307, 2012. [Online]. Available: https://doi.org/10.1063/1.3689011
- <sup>48</sup>A. V. Chumak, A. A. Serga, and B. Hillebrands, "Magnonic crystals for data processing," Journal of Physics D: Applied Physics, vol. 50, no. 24, p. 244001, 2017. [Online]. Available: http://stacks.iop.org/0022-3727/50/i=24/a=244001
- <sup>49</sup>L. Landau and E. Lifshitz, "On the theory of the dispersion of magnetic permeability in ferromagnetic bodies," Phys. Z. Sowjetunion, pp. 101–114, 1935.
- <sup>50</sup>T. L. Gilbert, "A phenomenological theory of damping in ferromagnetic materials," IEEE Transactions on Magnetics, vol. 40, no. 6, pp. 3443–3449, Nov 2004.
- <sup>51</sup>A. Mahmoud, F. Vanderveken, F. Ciubotaru, C. Adelmann, S. Cotofana, and S. Hamdioui, "n-bit data parallel spin wave logic gate," in 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2020, pp. 642–645.
- <sup>52</sup>T. Devolder et al., "Time-resolved spin-torque switching in mgo-based perpendicularly magnetized tunnel junctions," Phys. Rev. B, vol. 93, p. 024420, Jan 2016. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevB.93.024420
- <sup>53</sup>O. Zografos et al., "Design and benchmarking of hybrid cmos-spin wave device circuits compared to 10nm cmos," in 2015 IEEE 15th International Conference on Nanotechnology (IEEE-NANO), July 2015, pp. 686–689.
- <sup>54</sup>Q. Wang et al., "Reconfigurable nanoscale spin-wave directional coupler," Science Advances, vol. 4, no. 1, 2018. [Online]. Available: https://advances.sciencemag.org/content/4/1/e1701517
- <sup>55</sup>Q. Wang, B. Heinz, R. Verba, M. Kewenig, P. Pirro, M. Schneider, T. Meyer, B. Lägel, C. Dubs, T. Brächer, and A. V. Chumak, "Spin pinning and spin-wave dispersion in nanoscopic ferromagnetic waveguides," Phys. Rev. Lett., vol. 122, p. 247202, Jun 2019. [Online]. Available: https://link.aps.org/doi/10.1103/PhysRevLett.122.247202