# Defect and Fault Modeling Framework for STT-MRAM Testing

Wu, Lizhou; Rao, Siddharth; Taouil, Mottaqiallah; Cardoso Medeiros, Guilherme; Fieback, Moritz; Marinissen, Erik Jan; Kar, Gouri Sankar; Hamdioui, Said

**DOI** 10.1109/TETC.2019.2960375

**Publication date** 2019

**Document Version** Final published version

**Published in** IEEE Transactions on Emerging Topics in Computing

**Citation (APA)** Wu, L., Rao, S., Taouil, M., Cardoso Medeiros, G., Fieback, M., Marinissen, E. J., Kar, G. S., & Hamdioui, S. (2019). Defect and Fault Modeling Framework for STT-MRAM Testing. *IEEE Transactions on Emerging Topics in Computing*, *9*(2), 707-723. Article 8935208. Advance online publication. https://doi.org/10.1109/TETC.2019.2960375

Received 27 June 2019; revised 6 December 2019; accepted 13 December 2019. Date of publication 17 December 2019; date of current version 4 June 2021.

LIZHOU WU (Student Member, IEEE), SIDDHARTH RAO, MOTTAQIALLAH TAOUIL (Member, IEEE), GUILHERME CARDOSO MEDEIROS (Student Member, IEEE), MORITZ FIEBACK (Student Member, IEEE), ERIK JAN MARINISSEN (Fellow, IEEE), GOURI SANKAR KAR, AND SAID HAMDIOUI (Senior Member, IEEE)

L. Wu, M. Taouil, G.C. Medeiros, M. Fieback, and S. Hamdioui are with the Delft University of Technology, CognitiveIC, Delft 2628CD, Netherlands

S. Rao, E.J. Marinissen, and G.S. Kar are with IMEC, Leuven 3001, Belgium

CORRESPONDING AUTHOR: L. WU (Lizhou.Wu@tudelft.nl).

**ABSTRACT** STT-MRAM mass production is around the corner as major foundries worldwide invest heavily in its commercialization. To ensure high-quality STT-MRAM products, effective yet cost-efficient test solutions are of great importance. This article presents a systematic device-aware defect and fault modeling framework for STT-MRAM to derive accurate fault models which reflect the physical defects appropriately, and thereafter optimal and high-quality test solutions. An overview and classification of manufacturing defects in STT-MRAMs are provided with an emphasis on those related to the fabrication of magnetic tunnel junction (MTJ) devices, i.e., the data-storing elements. Defects in MTJ devices need to be modeled by adjusting the affected technology parameters and subsequent electrical parameters to fully capture the defect impact on both the device's electrical and magnetic properties, whereas defects in interconnects can be modeled as linear resistors. In addition, a complete single-cell fault space and nomenclature are defined, and a systematic fault analysis methodology is proposed. To demonstrate the use of the proposed framework, resistive defects in interconnect and pinhole defects in MTJ devices are analyzed for a single 1T-1MTJ memory cell. Test solutions for detecting these defects are also discussed.

**INDEX TERMS** STT-MRAM, manufacturing defects, fault models, test development

#### I. INTRODUCTION

Technology downscaling has driven a great success of the semiconductor industry in delivering faster, cheaper, and denser charge-based memories such as SRAM, DRAM, and Flash.
However, as these existing memory technologies approach their scaling limits, they become increasingly power hungry and less reliable while the fabrication is more expensive due to the increased manufacturing complexity [1]. As alternative solutions, several promising non-volatile memory (NVM) technologies have emerged and attracted extensive R&D attention for various levels in the memory hierarchy [2]. Among them, spin-transfer torque magnetic random access memory (STT-MRAM) features high density, nearly unlimited endurance, negligible leakage power, and CMOS compatibility [3]. The tunability of write performance, endurance, and data retention makes STT-MRAM customizable for a variety of applications such as last-level cache, Internet-of-Things, and automotive. According to a report from Coughlin Associates after the 2018 MRAM Developer Day, the market for MRAM solutions was projected to grow from \$36 million in 2017 to about \$3.3 billion in 2028, and the annual shipped capacity to rise to 84 PB by 2028 [4].

Due to the promise of STT-MRAM and the growing market, many companies worldwide have been heavily investing in the commercialization of STT-MRAMs. For example, Everspin Technologies announced the first 64 Mb STT-MRAM chip in 2012 [5]. Intel and Samsung also demonstrated their embedded STT-MRAMs in 2018 [6], [7]. To ensure that high-quality STT-MRAM products are shipped to customers, effective yet cost-efficient test solutions are imperative.

Testing STT-MRAMs is still an emerging research topic. Azevedo *et al.* [8], [9] injected resistive shorts and opens into a SPICE model of an MRAM cell and subsequently performed simulations to derive fault models. Su *et al.* [10] analyzed the excessive magnetic field during write operations in depth and observed write disturbance faults; they validated those using chip measurements. Chintaluri *et al.* [11], [12] took fault modeling one step further by studying the impact of resistive defects while considering extreme process variations; they proposed a test algorithm and its built-in self-test (BIST) implementation. Recently, Nair *et al.* [13] reported detailed STT-MRAM fault analyses based on injecting resistors into a layout-aware netlist.

FIGURE 1. Systematic defect and fault modeling framework.

Nevertheless, prior work has three major limitations. First, linear resistors are used to model all STT-MRAM manufacturing defects, including those in magnetic tunnel junction (MTJ) devices, which are the data-storing elements in STT-MRAMs. However, linear resistors (with only electrical properties) cannot reflect defect-induced changes in the MTJ's magnetic properties, which are as important as the electrical ones. Second, there is a lack of characterization data of defective STT-MRAM cells; such data is needed to understand the mechanisms, causes, locations, and impact of STT-MRAM defects. Finally, existing fault modeling approaches are unsystematic, and the fault model terminology is ambiguous. For instance, Chintaluri *et al.* [11] refer to a failed transition write fault as transition fault (TF), while Vatajelu *et al.* [14] use the term slow write fault (SWF) to describe the same faulty behavior. In addition, the term read disturb fault (RDF) is used to describe different faulty behaviors with different failure mechanisms in [11] and [15].

In this paper, we present a systematic defect and fault modeling framework, as shown in Figure 1, to derive realistic fault models for STT-MRAM testing. We classify STT-MRAM defects into two categories: interconnect defects and MTJ defects. The former can be modeled as linear resistors with the conventional defect modeling method, while the latter cannot, as the defect-induced changes in the magnetic properties of MTJ devices cannot be captured by electrical resistors.
For MTJ defects, we incorporate their impact on the technology parameters of the MTJ and thereafter on the device's electrical parameters. Furthermore, silicon measurement data of defective MTJ devices can be used to calibrate the defective MTJ model if applicable. By defining the complete fault space and using our fault analysis methodology, accurate fault models which reflect the physical defects can be validated within the fault space. Note that accurate fault modeling is a key enabler for high-quality and efficient test solutions, while inaccurate fault modeling may result in providing solutions for non-existing problems!

In summary, the contributions of this paper are as follows.

- An overview and classification of STT-MRAM manufacturing defects.
- A device-aware defect modeling approach.
- A complete STT-MRAM fault space and nomenclature; it provides all possible faults.
- Fault analysis for a) pinhole defects in MTJ devices using the device-aware fault modeling approach, b) resistive defects in interconnects.
- Fault models and test solutions for detecting the abovementioned defects.

The rest of this paper is organized as follows. Section II provides a background on STT-MRAM technology. Section III presents an overview of the STT-MRAM manufacturing process and defects. Section IV introduces the device-aware defect modeling approach. Section V presents the device-aware fault modeling methodology. Section VI demonstrates our approach on interconnect and pinhole defects in STT-MRAMs. Section VIII provides a brief discussion. Finally, Section IX concludes this paper.

# II. BACKGROUND

In this section, we introduce the organization of the MTJ device and its working principles, followed by the most commonly used 1T-1MTJ cell design for building STT-MRAM arrays.

# A. MTJ DEVICE ORGANIZATION

The *magnetic tunnel junction* is the core of STT-MRAM, as it is the data-storing element which contains one bit of data in the form of binary magnetic configurations. The MTJ device is fundamentally composed of three layers [16], as shown with the schematic in Figure 2(a) and a cross-sectional transmission electron microscopy (TEM) image of a $\phi$ 55 nm MTJ device fabricated at IMEC in Figure 2(b).

FIGURE 2. pMTJ device and its binary states. (c) Energy barrier between P and AP states (axis: angle $\theta$ between $m_{\rm FL}$ and $m_{\rm PL}$).

TABLE 1. STT-MRAM key parameters.

| Technology parameter | Description | Electrical parameter | Description |
|---|---|---|---|
| $M_{\rm s}$ | Saturation magnetization of the FL | $R_{\rm P}$ | Resistance in P state |
| $H_{\rm k}$ | Magnetic anisotropy field of the FL | $R_{\rm AP}$ | Resistance in AP state |
| $\bar{\varphi}$ | Potential barrier height of the TB | $I_{\rm c}({\rm P}{\rightarrow}{\rm AP})$ | P→AP critical switching current |
| $RA$ | Resistance-area product | $I_{\rm c}({\rm AP}{\rightarrow}{\rm P})$ | AP→P critical switching current |
| $TMR$ | Tunneling magneto-resistance ratio | $t_{\rm w}({\rm P}{\rightarrow}{\rm AP})$ | P→AP switching time |
| | | $t_{\rm w}({\rm AP}{\rightarrow}{\rm P})$ | AP→P switching time |

1) *Free Layer (FL)*. The top layer is called the free layer, which is typically made of CoFeB material ($t_{\rm FL} \approx 1.5$ nm [17]). The magnetization ($m_{\rm FL}$) in the FL is engineered towards the easy axis (an energetically favorable direction), and it can be switched to the opposite direction by applying a spin-polarized current flowing through the device. The saturation magnetization $M_{\rm s}$ and magnetic anisotropy field $H_{\rm k}$ are two key technology parameters determining the thermal stability $\Delta$ of the FL [16], as shown in Table 1.
The easy axis lies in the thin-film plane if the FL has in-plane magnetic anisotropy, whereas it points perpendicular to the free layer for perpendicular magnetic anisotropy (pMTJ). Since pMTJ devices offer higher scalability and lower switching current, they are favored in the industry [18]. Accordingly, we will limit our focus to pMTJ devices in the remainder of this paper.

2) *Tunnel Barrier (TB)*. The MgO dielectric layer in the middle is called the tunnel barrier. As the TB layer is ultra-thin, typically $\sim 1$ nm [17], electrons have a chance to tunnel through it, overcoming its potential barrier height $\bar{\varphi}$ [19]. This makes the device behave as a tunneling-like resistor. To compare the sheet resistivity of different MTJ designs, the resistance-area (RA) product [16] is used. This is a figure of merit which is commonly used in the MRAM community, and it is independent of device size.

3) *Pinned Layer (PL)*. The bottom ferromagnetic layer is referred to as the pinned layer; typically its thickness is $t_{\rm PL} = 2.5$ nm [17]. The magnetization ($m_{\rm PL}$) of the PL is strongly pinned to a certain direction by an inner synthetic anti-ferromagnet (iSAF) [17]. With the fixed magnetization in the PL as a reference, the magnetization in the FL is either *parallel* (P state) or *anti-parallel* (AP state) to that of the PL.

# **B. WORKING PRINCIPLES**

To work properly as memory elements, MTJ devices need to provide read and write mechanisms, which are realized by the tunneling magneto-resistance (TMR) effect and the spin-transfer-torque (STT) effect, respectively.

1) *TMR effect*. Apart from the thickness of the MgO barrier, the resistance of the MTJ device also depends on the relative direction of magnetization in the FL and PL, i.e., P or AP state, shown in Figure 2(c). When the device is in P state, the resistance is relatively low. By contrast, the device's resistance is high in AP state.
This phenomenon is well known as the tunneling magneto-resistance effect [16], [20], which is characterized by the TMR ratio. It is defined by $TMR = (R_{\rm AP} - R_{\rm P})/R_{\rm P}$, where $R_{\rm AP}$ and $R_{\rm P}$ are the resistances in AP and P states, respectively. Physically, the TMR ratio is determined by the spin polarization of the FL and PL [16], [21], i.e., $TMR = 2P_{\rm FL}P_{\rm PL}/(1-P_{\rm FL}P_{\rm PL})$, where $P_{\rm FL}$ and $P_{\rm PL}$ are the spin polarizations of the FL and PL, respectively. The higher the TMR ratio, the easier it is to distinguish between P and AP states during read operations. For commercially feasible STT-MRAM products, a minimum TMR ratio of 150 percent is required [18].

2) *STT effect*. To switch between AP and P states, a spin-polarized current is required to pass through the MTJ device, providing energy larger than the energy barrier ($E_{\rm B}$) between the two states. When the current reaches the FL, it exerts a torque on the magnetization. If the current is larger than the *critical switching current* ($I_{\rm c}$), the magnetization in the FL may switch, depending on the pulse width, to the other direction. By definition, $I_{\rm c}$ is the current required to switch the device's state within an infinitely long time at zero temperature [16]. It is a key electrical parameter to characterize the switching capability by current. Due to the bias dependence of STT efficiency and stray fields [16], $I_{\rm c}({\rm P} \rightarrow {\rm AP})$ can be significantly different from $I_{\rm c}({\rm AP} \rightarrow {\rm P})$ in practice. In addition, the switching time ($t_{\rm w}$) [19] is another critical parameter, which is inversely correlated with the actual write current. In other words, the higher the write current over $I_{\rm c}$, the less time is required for the magnetization in the FL to flip. In practice, $t_{\rm w}({\rm P} \rightarrow {\rm AP})$ can also differ from $t_{\rm w}({\rm AP} \rightarrow {\rm P})$ depending on the write current magnitude and duration.

# C. 1T-1MTJ BIT-CELL DESIGN

The 1T-1MTJ bit-cell design is the most widely adopted cell design, comprising an MTJ device connected serially with an access transistor [22], [23], as shown in Figure 3(a). The MTJ in this structure serves as a resistive storage element, while the access transistor, typically NMOS, is responsible for selective access. The NMOS gate is connected to a word line (WL), which determines whether a row is accessed or not. The other two terminals are connected to a bit line (BL) and a source line (SL), respectively. They control write and read operations on the internal MTJ device depending on the magnitude and polarity of the voltage applied across them.

FIGURE 3. Write and read operations of 1T-1MTJ cell.

Figures 3(b), 3(c), and 3(d) show the three basic operations: write '0', write '1', and read. During a write '0' operation, WL and BL are pulled up to $V_{\rm DD}$ and SL is grounded, thus leading to a current $I_{\rm w0}$ flowing from BL to SL. In contrast, a write '1' operation requires an opposite current going through the MTJ device with WL and SL at $V_{\rm DD}$, and BL grounded. In order to avoid write failures, write currents in both directions should be greater than the critical switching current $I_{\rm c}$. However, the current during a write '1' operation $I_{\rm w1}$ is slightly smaller than that of a write '0' operation $I_{\rm w0}$, due to the source degeneration of the NMOS in write '1' operations [24], [25].

FIGURE 4. General manufacturing process of STT-MRAM: (a) bottom-up processing flow of STT-MRAM cells, (b) vertical cross-section structure of STT-MRAM cells [27].

For read operations, a read voltage $V_{\rm read}$ is applied; it leads to a read current $I_{\rm rd}$ with the same direction as $I_{\rm w0}$ to sense the resistive state (AP or P) of the MTJ. To avoid an inadvertent state change during read operations, known as read destructive fault [15], $I_{\rm rd}$ should be as small as possible; typically $I_{\rm rd} < 0.5I_{\rm c}$ for MTJs with a thermal stability $\Delta = 65$ [26].
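
The TMR definition and the read-current rule of thumb above can be illustrated with a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the resistance values, the read voltage, and the critical current are hypothetical and chosen for illustration, the access transistor is lumped into an optional series resistance, and the bias dependence of the MTJ resistance is ignored.

```python
def tmr_ratio(r_p: float, r_ap: float) -> float:
    """TMR = (R_AP - R_P) / R_P, as defined in the text."""
    if r_p <= 0:
        raise ValueError("R_P must be positive")
    return (r_ap - r_p) / r_p


def read_current(v_read: float, r_mtj: float, r_series: float = 0.0) -> float:
    """Ohm's-law estimate of I_rd.

    r_series lumps the access transistor and wiring (an assumption of
    this sketch, not a model from the paper).
    """
    return v_read / (r_mtj + r_series)


def read_is_nondestructive(i_rd: float, i_c: float, fraction: float = 0.5) -> bool:
    """Rule of thumb quoted in the text: I_rd < 0.5 * I_c."""
    return i_rd < fraction * i_c


# Hypothetical device values, for illustration only.
R_P, R_AP = 1600.0, 4000.0  # ohms
V_READ = 0.02               # volts
I_C = 50e-6                 # amperes (a single value; the text notes that
                            # I_c differs per switching direction)

tmr = tmr_ratio(R_P, R_AP)
i_rd_p = read_current(V_READ, R_P)    # P state: lower resistance, larger current
i_rd_ap = read_current(V_READ, R_AP)  # AP state: higher resistance, smaller current
safe = read_is_nondestructive(i_rd_p, I_C)
```

With these hypothetical numbers the TMR ratio is 150 percent and the larger of the two read currents stays below half of the assumed critical current. The check bounds the read current only from above.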
However, too low an $I_{\rm rd}$ may lead to incorrect read faults [11]. In general, the current magnitudes must satisfy $I_{\rm rd} < I_{\rm c} < I_{\rm w1} < I_{\rm w0}$. This is indicated by the widths of the red arrows in Figures 3(b), 3(c), and 3(d).

A read operation requires a sense amplifier to determine the resistive state. The sense amplifier may be implemented using a current sensing scheme, where the read-out value is determined by comparing the current of the accessed cell ($I_{\rm cell} = I_{\rm rd}$) with the current of a reference cell $I_{\rm ref}$. The sensing result is logic '0' if $I_{\rm cell} < I_{\rm ref}$; otherwise, it outputs logic '1'.

# **III. DEFECT SPACE AND CLASSIFICATION**

A defect is a physical imperfection in manufactured chips (i.e., an unintended difference from the intended design) [28]. To guarantee a high-quality test solution and improve the manufacturing process itself so as to improve yield, understanding all potential defects is of great importance. The STT-MRAM manufacturing process mainly consists of the standard CMOS fabrication steps and the integration of MTJ devices into metal layers (e.g., between M4 and M5 layers [29], [30]). Figure 4(a) shows the bottom-up manufacturing flow and Figure 4(b) the vertical structure of STT-MRAM cells [27]. Based on the manufacturing phase, STT-MRAM defects can be classified into front-end-of-line (FEOL) and back-end-of-line (BEOL) defects. As MTJs are integrated into metal layers during BEOL processing, BEOL defects can be further categorized into interconnect defects and MTJ-related defects. All potential defects are listed in Table 2. Next, we will examine them in detail along with their corresponding processing steps, with a particular emphasis on those introduced during MTJ fabrication.

TABLE 2. STT-MRAM defect classification.

| FEOL: Transistor | BEOL: Interconnect | BEOL: MTJ Device |
|---|---|---|
| Material impurity | Open vias/contacts | Pinholes in TB |
| Crystal imperfection | Irregular shapes | Extreme thickness variation of TB |
| Pinholes in gate oxides | Big bubbles | MgO/CoFeB interface roughness |
| Shifting of dopants | Small particles | Atom inter-diffusion |
| Patterning proximity | etc. | Redepositions on MTJ sidewalls |
| etc. | | Magnetic layer corrosion |
| | | Magnetic coupling |
| | | etc. |

#### A. FEOL DEFECTS

The first step of the STT-MRAM manufacturing process is the FEOL process where transistors are fabricated on the wafer. In this phase, typical defects may occur such as semiconductor impurities, crystal imperfections, pinholes in gate oxides, and shifting of dopants [31]. These are the conventional defects which have been sufficiently studied and are generally modeled by resistive opens, shorts and bridges [32], [33], [34].

# B. BEOL DEFECTS

After FEOL, M1-M4 metal layers are stacked on top of the transistors followed by a bottom electrode contact (BEC), as illustrated in the zoomed-in part of Figure 4(b). M1-M4 metallization does not differ from traditional CMOS BEOL steps. The BEC step is used to connect bottom Cu lines with MTJ stacks [17], [27].
During this phase, typical interconnect defects may take place, such as open vias/contacts, irregular shapes, big bubbles, etc. [32]. For instance, Figure 5(a) shows a TEM image of an open contact defect between the BEC and the underlying Cu line due to polymer leftovers [27].

To obtain a super-smooth interface between the BEC and the MTJ stack, a chemical mechanical polishing (CMP) step is required. The smoothness of the interface between layers is key to obtaining a good *TMR* value. CMP processing minimizes the surface roughness with a root-mean-square average of 2 Å [29]. At this stage, both under-polishing and over-polishing of the surface can introduce defects. Specifically, under-polishing causes issues such as orange peel coupling or offset fields which affect the hysteresis curve, while over-polishing may result in dishing or residual slurry particles that are left behind [14].

After the CMP step, the next critical step is the fabrication of the MTJ stack. The latest published MTJ design includes more than 10 layers for performance reasons [35]. However, the increasingly sophisticated design of the MTJ also makes it more vulnerable to manufacturing defects. For example, pinholes in the tunneling barrier (e.g., MgO) could be introduced in this phase [36]. Figure 5(b) shows a TEM image of a deposited MTJ stack with a small pinhole in its MgO barrier. A pinhole filled with CoFeB material forms a defective high-conductance path across the two ferromagnetic layers. It severely degrades the resistance and TMR values, and may even lead to breakdown due to the ohmic heating when an electric current passes through the barrier [37], [38].

FIGURE 5. TEM images of manufacturing defects: (a) an open contact defect between the BEC and the underlying Cu layer (reprinted from [27]), and (b) a pinhole defect in the MgO tunnel barrier of the MTJ device (reprinted from [36]).

Furthermore, the MgO barrier thickness variation and interface roughness result in degradation of resistance and TMR values as well. TEM images in [36] show that the MgO barrier thickness varies from 0.86 nm to 1.07 nm, leading to a huge difference in resistance. In [17], a TMR degradation was observed due to increased surface roughness caused by a complicated inner synthetic anti-ferromagnetic pinned layer design.

Following the MTJ stack deposition, annealing is applied to obtain crystallization in the MgO tunneling barrier as well as in the CoFeB PL and FL layers [39], [40]. At this stage, the perpendicular magnetic anisotropy (PMA) originating from the MgO/CoFeB interface and the *TMR* value are strongly determined by the annealing conditions such as temperature, magnetic field, and annealing time [39]. With appropriate annealing conditions, the PMA can be considerably enhanced, leading to higher thermal stability [40]. Under-annealing can lead to lattice mismatch between the body-centered cubic CoFeB lattice and the face-centered cubic MgO lattice, whereas over-annealing introduces atom inter-diffusion between layers. For example, oxygen atoms can diffuse out of the MgO layer to the spacer layers, leaving behind oxygen vacancies, thus severely degrading the *TMR* value [41].

After MTJ multi-layer deposition and annealing, the next crucial step is to pattern individual MTJ nanopillars [42]. Typically, ion beam etching (IBE) is used to pattern MTJ nanopillars [43], [44]. During the MTJ etching process, it is extremely difficult to obtain MTJ nanopillars with steep sidewall edges while avoiding sidewall redeposition and magnetic layer corrosion [36]. The redeposition phenomenon on sidewalls may significantly deteriorate the electrical properties of the MTJ device and even cause a barrier-short defect.
In order to mitigate the redeposition effect, a side-etching step combined with the halogen-based reactive ion etching (RIE) and inductively-coupled plasma (ICP) techniques [45], [46] is needed and done by rotating and tilting the wafer. Nevertheless, other concerns arise. For instance, the shadowing effect (limited etching coverage at the lower corner of the MTJ profile due to insufficient spacing between MTJs) [36], [43] limits high-density array patterning, and magnetic layer corrosion degrades the reliability of MTJ devices due to the non-volatile chemicals attached to the CoFeB layers.

Another critical issue is the magnetic coupling effect [47] between different ferromagnetic layers after the MTJ nanopillars are patterned. Many prior works [6], [47], [48], [49] show that stray fields at the FL from underlying ferromagnets have a significant impact on the switching characteristics and retention time of MTJ devices.

After the MTJ etching process, encapsulation and CMP are required to separate individual MTJ pillars. In this step, an oxygen showering post-treatment (OSP) can be applied to recover patterning damage so as to improve the electrical and magnetic properties of MTJ devices [50]. The oxygen showering process selectively oxidizes the perimeter (damaged by previous ion beam etching) of the MTJ pillar with non-reactive oxygen ions. However, over-oxidization into the MTJ device also causes degradation in key device parameters such as *TMR*. Thus, the OSP condition needs to be carefully tuned to maximize the damage suppression while protecting the inner undamaged parts.

Next, MTJ pillars are connected to the top electrode contact (TEC), followed by M5 metallization. The rest of the manufacturing process is the same as the BEOL steps of CMOS technology. Typical defects such as open contacts/vias, small particles, etc. can occur in this phase as well.
It is worth noting that a package-level magnetic shield can be added to enhance the stand-by magnetic immunity of STT-MRAMs, as proposed in [51]. The magnetic shield was reported to be effective in protecting STT-MRAMs against external magnetic fields.

# IV. DEVICE-AWARE DEFECT MODELING

Defect modeling is the first critical step in the test development process. Having an accurate defect model that is able to mimic the way the physical defect manifests itself at the electrical level is the best way to close the gap between the reality and the abstraction (fault models). Next, we will discuss the defect models for interconnects/contacts and thereafter for MTJ devices.

# A. MODELING OF DEFECTS IN INTERCONNECTS AND CONTACTS

Traditionally, a spot defect in an electronic circuit is modeled as a linear resistor, and the defect strength is represented by its resistance value [12], [13], [52]. For instance, missing material is modeled as a disconnection, while extra material is modeled as an undesired connection. These undesired connections and disconnections can typically be classified into three groups as follows [52], [53].

- Open: An undesired extra resistor ($R_{\rm op}$) within a connection; $0\,\Omega < R_{\rm op} \leq \infty\,\Omega$.
- Short: An undesired resistive path ($R_{\rm sh}$) between a node and power supply (either $V_{\rm DD}$ or GND); $0\,\Omega \le R_{\rm sh} < \infty\,\Omega$.
- Bridge: A parallel resistor ($R_{\rm br}$) between two connections; $0\,\Omega \le R_{\rm br} < \infty\,\Omega$.

FIGURE 6. Resistive defects in a single 1T-1MTJ memory cell.

Figure 6 illustrates how the above models are used to model some defects in interconnects and contacts of a single-cell STT-MRAM. For instance, OC<sub>m</sub> denotes an open between the NMOS selector and the MTJ device; it can be used to model the missing material defect on the contact shown in Figure 5(a).
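
The three resistive defect classes and their resistance ranges can be captured in a small data structure. The following Python sketch is illustrative only: the class name `ResistiveDefect`, the helper functions, and all numeric values are hypothetical and not taken from the paper, and a real fault analysis would inject the resistor into a circuit netlist rather than combine lumped resistances by hand.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ResistiveDefect:
    """Linear-resistor defect model with the ranges listed above."""

    kind: str          # "open", "short", or "bridge"
    resistance: float  # ohms; math.inf is allowed for a full open

    def __post_init__(self) -> None:
        r = self.resistance
        if self.kind == "open":
            valid = 0.0 < r <= math.inf   # 0 < R_op <= inf
        elif self.kind in ("short", "bridge"):
            valid = 0.0 <= r < math.inf   # 0 <= R_sh, R_br < inf
        else:
            raise ValueError(f"unknown defect kind: {self.kind!r}")
        if not valid:  # also rejects NaN, since comparisons with NaN are False
            raise ValueError(f"{self.kind} resistance out of range: {r}")


def with_open(r_path: float, defect: ResistiveDefect) -> float:
    """An open adds its resistance in series with the affected path."""
    return r_path + defect.resistance


def with_bridge(r_device: float, defect: ResistiveDefect) -> float:
    """A bridge sits in parallel with the element it bypasses."""
    r_br = defect.resistance
    if r_br == 0.0:
        return 0.0
    return r_device * r_br / (r_device + r_br)


# Hypothetical example values.
r_series = with_open(5000.0, ResistiveDefect("open", 1000.0))
r_parallel = with_bridge(1000.0, ResistiveDefect("bridge", 1000.0))
```

Sweeping `resistance` over its allowed range then corresponds to sweeping the defect strength.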
$BC_{BL-IN}$ denotes a bridge bypassing the MTJ device; it can be used to model the extra material redeposited on the MTJ sidewalls. Theoretically, there are four opens, six bridges, and eight shorts within a single STT-MRAM cell. Outside the memory cells, resistive defects can also occur in/between the WL, BL, and SL. For instance, OB<sub>w</sub> denotes an open in the bit line disconnecting the memory cell from the write driver, while OB<sub>r</sub> denotes an open in the bit line disconnecting the memory cell from the sense amplifier.

It is worth noting that some resistive defects are not realistic when considering the physical layout of the design, as also emphasized in [13]. For example, shorts connecting the inner node (between the MTJ and NMOS) to $V_{\rm DD}$ or GND and bridges between the BL and WL are not possible, since they reside in different metal layers which are far away from each other [13].

# B. MODELING OF DEFECTS IN MTJ DEVICES

Whether linear resistors are adequate for modeling defects in MTJ devices is doubtful, since linear resistors *cannot* reflect the defect-induced changes in magnetic properties, which are as important as electrical ones for MTJ devices. In [54] we demonstrated that using linear resistors to model manufacturing defects in MTJ devices is inaccurate; this is justified by measurement data of defective MTJ devices. Inappropriate defect modeling may result in poor fault models which do not capture the defect behavior, leading to poor-quality test solutions. Furthermore, tests targeting faults that do not exist in reality waste test time and resources.

FIGURE 7. Generic defect modeling flow.

# 1) DEVICE-AWARE DEFECT MODELING METHODOLOGY

To accurately model the defects in MTJ devices, we propose a three-step *device-aware defect modeling* methodology as shown in Figure 7.
The philosophy of this approach is to incorporate the impact of physical defects on the technology parameters of the MTJ device and thereafter on its electrical parameters. The modeling flow starts with two inputs. The first one is the defect-free MTJ compact model (which can be calibrated by silicon data if available) of good MTJ devices [54]. The second one is the defective device under investigation (e.g., a device with a pinhole defect shown in Figure 5(b)). The aim is to obtain an optimized defective MTJ compact model corresponding to the defective device by going through three steps as follows.

1) *Physical defect analysis and modeling*. Given a set of physical defects $\mathbf{D} = \{d_1, d_2, \dots, d_n\}$ that may occur during MTJ fabrication, each defect $d_i$ has to be physically analyzed and modeled. The effect of defect $d_i$ can be reflected by a change of the key MTJ-related technology parameters: $M_{\rm s}$, $H_{\rm k}$, $\bar{\varphi}$, $RA$, and $TMR$ (see Table 1). This results in *effective technology parameters* that can be denoted as

$$M_{\text{s\_eff},i}(S_i) = f_i(M_{\text{s\_df}}, S_i)$$ (1)

$$H_{\text{k\_eff},i}(S_i) = g_i(H_{\text{k\_df}}, S_i)$$ (2)

$$\bar{\varphi}_{\text{eff},i}(S_i) = r_i(\bar{\varphi}_{\text{df}}, S_i)$$ (3)

$$RA_{\text{eff},i}(S_i) = k_i(RA_{\text{df}}, S_i)$$ (4)

$$TMR_{\text{eff},i}(S_i) = h_i(TMR_{\text{df}}, S_i),$$ (5)

where $f_i$, $g_i$, $r_i$, $k_i$, and $h_i$ are mapping functions corresponding to defect $d_i$ ($i \in [1, n]$). $M_{\text{s\_df}}$, $H_{\text{k\_df}}$, $\bar{\varphi}_{\text{df}}$, $RA_{\text{df}}$, and $TMR_{\text{df}}$ are the defect-free technology parameters. $S_i = \{x_1, x_2, \ldots, x_t\}$ is a set of parameters representing the size or strength of defect $d_i$. It is worth noting that each defect may impact one or more technology parameters.

2) *Electrical modeling of the defective MTJ device*. In this step, the impact of the updated technology parameters from Step 1 on the electrical parameters is identified; it reflects the way defect $d_i$ influences the electrical parameters of the MTJ device. This can be done, for example, by updating the electrical parameters (see Table 1) of the defect-free MTJ model (e.g., the Verilog-A MTJ compact model calibrated with measurement data in [54]). Note that the electrical parameters are the ones needed for accurate circuit simulation for fault modeling. This step enables us to obtain a raw defective MTJ model.

3) *Fitting and model optimization*. To validate the effectiveness of the defective MTJ model, it should be fitted to measurement data of real defective MTJ devices. If the behavior of the defective model (either its physical or electrical parameters) does not match the characterization data, the fitting parameters are adjusted until an acceptable accuracy is obtained. Finally, we derive an optimized defect-parameterized compact model for defective MTJ devices.

# 2) CASE STUDY ON PINHOLE DEFECTS

We will illustrate the device-aware defect modeling methodology by applying it to a specific MTJ defect. We select the pinhole defect (introduced in Section III-B) for our case study, as this type of MTJ defect is considered one of the most important manufacturing defects in STT-MRAMs [36], [37], [55]. The pinhole defect has some unique signatures observed in electrical and magnetic characterization as follows [37], [54].

- The switching field in the R-H loop does not decrease compared to defect-free devices. This indicates that the defect resides in the MTJ's tunnel barrier while the FL remains intact.
- The switching voltage in the R-V loop decreases significantly compared to defect-free devices.
- The resistance of MTJ devices with pinhole defects drops very fast under pulse stress, caused by the growth of pinholes in the MgO barrier due to localized Joule heating by current flowing through the pinholes [38].

Next, the three steps of device-aware defect modeling are applied to pinhole defects as follows.

- 1) Physical defect analysis and modeling. RA and TMR are the two key technology parameters that are significantly impacted in the presence of a pinhole defect [36], [54]. Thus, we model the effect of a pinhole on these two technology parameters as follows [19].

$$RA_{\text{eff\_ph}}(A_{\text{ph}}) = \frac{A}{\frac{A(1-A_{\text{ph}})}{RA_{\text{df}}} + \frac{A \cdot A_{\text{ph}}}{RA_{\text{bd}}}} \tag{6}$$

$$TMR_{\text{eff\_ph}}(A_{\text{ph}}) = TMR_{\text{df}} \cdot \frac{RA_{\text{eff\_ph}}(A_{\text{ph}}) - RA_{\text{bd}}}{RA_{\text{df}} - RA_{\text{bd}}}, \tag{7}$$

where $A_{\rm ph} \in [0,1]$ is the pinhole area normalized to the cross-sectional area $A$ of the MTJ device. $RA_{\rm df}$ and $TMR_{\rm df}$ are the RA and TMR parameters of a defect-free MTJ (i.e., when $A_{\rm ph}=0$), respectively. $RA_{\rm bd}$ is the resultant RA after breakdown. For our case study, we take $A = 2827.4~\text{nm}^2$, $RA_{\rm df} = 4.52~\Omega \cdot \mu\text{m}^2$, and $TMR_{\rm df} = 139\%$; these values were reported based on measurements of defect-free MTJ devices in [54]. Note that the location of the pinhole defect has a negligible effect on electron transport in the two-terminal MTJ device, as electrons tunnel through either the pinhole area or the undamaged parts [37], [56]. Like its location, the shape of the pinhole also plays little role, as the MgO layer is ultra-thin: typically $\sim 1~\text{nm}$, which is equivalent to a few atoms in thickness.

FIGURE 8. Spectre simulation results vs. measurement data.

- 2) Electrical modeling of the defective MTJ device.
Next, we integrate Equations (6) and (7) into our calibrated defect-free MTJ compact model (presented in [54]). In this way, we convert the defect-free MTJ model into a defective MTJ model that is able to mimic the electrical impact of a pinhole defect on the MTJ device. Furthermore, the pinhole size is tunable by changing the input argument $A_{\rm ph}$.
- 3) Fitting and model optimization. In this step, we use the measurement data of MTJ devices with pinhole defects to better calibrate our model. By fitting to the measured silicon data, we can further optimize our pinhole-parameterized MTJ compact model. To this end, we performed comprehensive electrical and magnetic characterizations of MTJs with pinhole defects at both $t=0$ and $t>0$ (i.e., stress test). By constantly stressing devices with a small pinhole while tracking their RA and TMR values, we obtained $RA_{\rm bd} = 0.41~\Omega \cdot \mu\text{m}^2$ after extrapolating the fitting curve to the point where TMR = 0 [54]. Figure 8 shows the Spectre simulation results (solid curves) of R-V hysteresis loops with various $A_{\rm ph}$ values. It can be seen that the simulation results with our proposed defective MTJ model match the measured silicon data in terms of resistance and switching voltage. Note that our simulation results represent the green R-V loop with an injection of pinhole defects. However, the other three measured R-V hysteresis loops belong to three distinct defective devices, which may have different $RA_{\rm df}$ and $TMR_{\rm df}$ due to process variation.

Based on the proposed defective MTJ model, accurate fault modeling of pinhole defects and subsequent test development can be performed.

#### V. DEVICE-AWARE FAULT MODELING

In order to obtain appropriate fault models, the defect models generated with the approach discussed in the previous section should be used to analyze the behavior of a memory in the presence of defects.
The results from this analysis are used to develop a high-quality test. The fault modeling process consists of two steps: 1) definition of the *fault space*, which describes *all possible* faults and their classification; 2) a fault analysis methodology that determines which faults from the fault space are *realistic* for the defect under consideration, i.e., which faults are sensitized in the presence of such a defect. These steps are explained next.

# A. FAULT SPACE AND CLASSIFICATION

In this work, we limit the analysis to single-cell faults [57]. If only one cell is involved, the fault is called a single-cell fault. If multiple cells are involved, the fault is a multi-cell fault, which is outside the scope of this paper. Memory faults can be systematically described by *fault primitives* (FPs) [57]. An FP describes the deviation of the observed memory behavior from the expected behavior. An FP is denoted as a three-tuple $\langle S/F/R \rangle$, which is explained as follows.

- 1) S (sensitizing sequence) denotes an operation sequence that sensitizes a fault. It takes the form $S = x_0 O_1 x_1 \dots O_n x_n$, where $x_i \in \{0,1\}$ ($i \in \{0,1,\dots,n\}$) and $O_i \in \{r,w\}$. Here, '0' and '1' denote the logic values of memory cells, while 'r' and 'w' denote a read and a write operation, respectively. $n$ is the number of operations involved in the sensitizing sequence. For example, S = 0 means the addressed cell is initialized to the logic '0' state and no write/read operations are applied, while S = 1w0r0 means that the addressed cell is initialized to the '1' state followed by write '0' and read '0' operations.
- 2) F (faulty effect) describes the value that is stored in the cell after S is performed. For traditional charge-based memories, e.g., SRAM, there exist only two digital states, i.e., $F \in \{0, 1\}$. However, data in STT-MRAM cells is stored in MTJ devices whose pre-defined resistance ranges determine the logic states '0' and '1'.
Due to defects or extreme process variations, the MTJ resistance can be outside these ranges. Hence, it is necessary to define other (faulty) resistance states to cover defective MTJ devices. Figure 9 presents the measured resistance distribution of a large number of $\phi$ 60 nm MTJ devices; it shows that $F \in \{0, 1, U, L, H\}$, as will be explained next. Each point in the figure represents a device whose $R_{\rm P}$ is shown on the x-axis and $R_{\rm AP}$ on the y-axis. From a design perspective, the nominal $R_{\rm P}$ is 2 k$\Omega$ and the nominal $R_{\rm AP}$ is 5 k$\Omega$; this assures good read reliability with TMR = 150%. A $3\sigma$ variation of the nominal values is used to define the resistance ranges of the two states '0' and '1'. As shown in the figure, the points inside the shaded box represent good devices in accordance with the above design specifications. However, there are also a large number of devices outside the specification due to defects or extreme process variations. These fall into three states: (1) the extreme low resistance state 'L', (2) the extreme high resistance state 'H', and (3) the undefined state 'U'.

FIGURE 9. Measured resistance distribution of $R_{\rm P}$ and $R_{\rm AP}$ for $\phi$ 60 nm MTJ devices, suggesting the existence of states 'L', '0', 'U', '1', and 'H'.

- 3) R (readout value) describes the output of a read operation if the last operation in S is a read operation. Here, $R \in \{0,1,?,-\}$. '?' denotes a random readout value in case the sensing current is very close to the sense amplifier's reference current (e.g., the cell under read is in a 'U' state). '-' denotes that R is not applicable, i.e., when the last operation in S is not a read operation. Note that a read operation on a cell in the 'L' state returns a logic '0', while the 'H' state returns a logic '1'.

Depending on the number of operations involved in the sensitizing sequence S, FPs can be classified into *static* and *dynamic faults* [58].
A static fault is a fault that can be sensitized by at most one operation (i.e., $n \le 1$), while a dynamic fault requires more than one operation (i.e., $n > 1$) to be sensitized. The FP names comply with the following format:

$$FP = \begin{cases} S\{ini\}F\{fin\}, & n = 0\\ [out]\{opn\}\{opd\}\{eff\}F\{fin\}, & n = 1\\ \{nd\text{-}\}[out]\{opn\}\{opd\}\{eff\}F\{fin\}, & n > 1 \end{cases}$$

If no read/write operation is involved in S (i.e., $n = 0$), the FP name complies with the format $S\{ini\}F\{fin\}$, where

- *ini* describes the initial state of the faulty cell; $ini \in \{0, 1\}$.
- *fin* describes the final state of the faulty cell; $fin \in \{L, 0, U, 1, H\}$.

For example, fault primitive S1FU = $\langle 1/U/- \rangle$ denotes a *state fault* in which the cell is initialized to state 1 but ends up in state U due to the existence of a defect.

TABLE 3. Complete single-cell static fault primitives.

| # | S | F | R | Notation | Name | # | S | F | R | Notation | Name |
|----|-----|---|---|---------------------------|-------|----|-----|---|---|---------------------------|--------|
| 1 | 0 | 1 | - | $\langle 0/1/- \rangle$ | S0F1 | 27 | 0r0 | 1 | 0 | $\langle 0r0/1/0 \rangle$ | dR0DF1 |
| 2 | 0 | L | - | $\langle 0/L/- \rangle$ | S0FL | 28 | 0r0 | 1 | ? | $\langle 0r0/1/? \rangle$ | rR0DF1 |
| 3 | 0 | U | - | $\langle 0/U/- \rangle$ | S0FU | 29 | 0r0 | 1 | 1 | $\langle 0r0/1/1 \rangle$ | iR0DF1 |
| 4 | 0 | H | - | $\langle 0/H/- \rangle$ | S0FH | 30 | 0r0 | L | 0 | $\langle 0r0/L/0 \rangle$ | dR0DFL |
| 5 | 1 | 0 | - | $\langle 1/0/- \rangle$ | S1F0 | 31 | 0r0 | L | ? | $\langle 0r0/L/? \rangle$ | rR0DFL |
| 6 | 1 | L | - | $\langle 1/L/- \rangle$ | S1FL | 32 | 0r0 | L | 1 | $\langle 0r0/L/1 \rangle$ | iR0DFL |
| 7 | 1 | U | - | $\langle 1/U/- \rangle$ | S1FU | 33 | 0r0 | U | 0 | $\langle 0r0/U/0 \rangle$ | dR0DFU |
| 8 | 1 | H | - | $\langle 1/H/- \rangle$ | S1FH | 34 | 0r0 | U | ? | $\langle 0r0/U/? \rangle$ | rR0DFU |
| 9 | 0w1 | 0 | - | $\langle 0w1/0/- \rangle$ | W1TF0 | 35 | 0r0 | U | 1 | $\langle 0r0/U/1 \rangle$ | iR0DFU |
| 10 | 0w1 | L | - | $\langle 0w1/L/- \rangle$ | W1TFL | 36 | 0r0 | H | 0 | $\langle 0r0/H/0 \rangle$ | dR0DFH |
| 11 | 0w1 | U | - | $\langle 0w1/U/- \rangle$ | W1TFU | 37 | 0r0 | H | ? | $\langle 0r0/H/? \rangle$ | rR0DFH |
| 12 | 0w1 | H | - | $\langle 0w1/H/- \rangle$ | W1TFH | 38 | 0r0 | H | 1 | $\langle 0r0/H/1 \rangle$ | iR0DFH |
| 13 | 1w0 | 1 | - | $\langle 1w0/1/- \rangle$ | W0TF1 | 39 | 1r1 | 0 | 0 | $\langle 1r1/0/0 \rangle$ | iR1DF0 |
| 14 | 1w0 | L | - | $\langle 1w0/L/- \rangle$ | W0TFL | 40 | 1r1 | 0 | ? | $\langle 1r1/0/? \rangle$ | rR1DF0 |
| 15 | 1w0 | U | - | $\langle 1w0/U/- \rangle$ | W0TFU | 41 | 1r1 | 0 | 1 | $\langle 1r1/0/1 \rangle$ | dR1DF0 |
| 16 | 1w0 | H | - | $\langle 1w0/H/- \rangle$ | W0TFH | 42 | 1r1 | 1 | 0 | $\langle 1r1/1/0 \rangle$ | iR1NF1 |
| 17 | 0w0 | 1 | - | $\langle 0w0/1/- \rangle$ | W0DF1 | 43 | 1r1 | 1 | ? | $\langle 1r1/1/? \rangle$ | rR1NF1 |
| 18 | 0w0 | L | - | $\langle 0w0/L/- \rangle$ | W0DFL | 44 | 1r1 | L | 0 | $\langle 1r1/L/0 \rangle$ | iR1DFL |
| 19 | 0w0 | U | - | $\langle 0w0/U/- \rangle$ | W0DFU | 45 | 1r1 | L | ? | $\langle 1r1/L/? \rangle$ | rR1DFL |
| 20 | 0w0 | H | - | $\langle 0w0/H/- \rangle$ | W0DFH | 46 | 1r1 | L | 1 | $\langle 1r1/L/1 \rangle$ | dR1DFL |
| 21 | 1w1 | 0 | - | $\langle 1w1/0/- \rangle$ | W1DF0 | 47 | 1r1 | U | 0 | $\langle 1r1/U/0 \rangle$ | iR1DFU |
| 22 | 1w1 | L | - | $\langle 1w1/L/- \rangle$ | W1DFL | 48 | 1r1 | U | ? | $\langle 1r1/U/? \rangle$ | rR1DFU |
| 23 | 1w1 | U | - | $\langle 1w1/U/- \rangle$ | W1DFU | 49 | 1r1 | U | 1 | $\langle 1r1/U/1 \rangle$ | dR1DFU |
| 24 | 1w1 | H | - | $\langle 1w1/H/- \rangle$ | W1DFH | 50 | 1r1 | H | 0 | $\langle 1r1/H/0 \rangle$ | iR1DFH |
| 25 | 0r0 | 0 | ? | $\langle 0r0/0/? \rangle$ | rR0NF0 | 51 | 1r1 | H | ? | $\langle 1r1/H/? \rangle$ | rR1DFH |
| 26 | 0r0 | 0 | 1 | $\langle 0r0/0/1 \rangle$ | iR0NF0 | 52 | 1r1 | H | 1 | $\langle 1r1/H/1 \rangle$ | dR1DFH |

If an FP involves only one sensitizing operation in S (i.e., $n = 1$), then its name complies with the format $[out]\{opn\}\{opd\}\{eff\}F\{fin\}$, where the fields in curly braces are required while the fields in square brackets are optional. Apart from the $\{fin\}$ field introduced previously, the remaining fields are explained as follows.

- *out* describes the readout effect of the read operation in S if applicable; $out \in \{i, r, d\}$, where 'i' means an incorrect readout, 'r' a random readout, and 'd' a deceptive readout. Note that a deceptive readout implies that the read operation returns a correct value while making the final state *fin* different from the one before reading. The *out* field is omitted when there is no read operation in S.
- *opn* describes the operation in S; $opn \in \{w, r\}$, where 'w' means a write operation while 'r' means a read operation.
- *opd* describes the operand of the operation *opn*; $opd \in \{0, 1\}$.
- *eff* describes the operational effect on the faulty cell; $eff \in \{T, D, N\}$, where 'T' means a transition operation, 'D' a destructive operation, and 'N' a non-destructive operation. This field is omitted for read operations which do not change the resistive state of the cell.

Table 3 lists all single-cell static FPs with their notations and names. For instance, W0TFH = $\langle 1w0/H/- \rangle$ represents a Write Transition Fault where a write '0' operation forces the addressed cell with the initial state '1' to state 'H'. rR1DFU = $\langle 1r1/U/? \rangle$ represents a random Read Destructive Fault where a read '1' operation forces the cell with initial state '1' to state 'U' and returns a random readout value. Similarly, other FPs in the table can be interpreted according to the above FP nomenclature.

FIGURE 10. Fault classification.
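The naming rules above can be sketched in a few lines of Python. This is an illustrative sketch only and not part of the original framework; the helper names `fp_name` and `static_fault_primitives` are hypothetical. It enumerates every static single-cell FP by excluding fault-free behavior and derives each name from its $\langle S/F/R \rangle$ tuple; the enumeration yields the 52 entries of Table 3, although not in the table's row order.

```python
from itertools import product

STATES = ["L", "0", "U", "1", "H"]   # possible final states F
READOUTS = ["0", "1", "?"]           # possible readout values R of a read


def fp_name(s, f, r):
    """Name a static FP <S/F/R>; s is '0', '1', or a string such as '0w1' or '1r1'."""
    if len(s) == 1:                       # n = 0: state fault S{ini}F{fin}
        return f"S{s}F{f}"
    ini, op, opd = s[0], s[1], s[2]
    if op == "w":                         # write: transition if the operand differs from the initial state
        eff = "T" if ini != opd else "D"
        return f"W{opd}{eff}F{f}"
    # read: 'r' random, 'd' deceptive (correct readout, wrong final state), 'i' incorrect
    out = "r" if r == "?" else ("d" if r == opd else "i")
    eff = "N" if f == opd else "D"        # non-destructive if the state is preserved
    return f"{out}R{opd}{eff}F{f}"


def static_fault_primitives():
    """Enumerate all static single-cell FPs (n <= 1) as (S, F, R) tuples."""
    fps = []
    for ini in "01":                      # n = 0: state faults
        fps += [(ini, f, "-") for f in STATES if f != ini]
    for ini, opd in product("01", repeat=2):   # n = 1: write operations
        fps += [(f"{ini}w{opd}", f, "-") for f in STATES if f != opd]
    for x in "01":                        # n = 1: read operations
        fps += [(f"{x}r{x}", f, r)
                for f, r in product(STATES, READOUTS)
                if not (f == x and r == x)]    # skip the fault-free case
    return fps


if __name__ == "__main__":
    for s, f, r in static_fault_primitives():
        print(f"<{s}/{f}/{r}>  {fp_name(s, f, r)}")
```

The split into 8 state faults, 16 write faults, and 28 read faults follows directly from removing the fault-free outcome of each sensitizing sequence.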
It is worth noting that a *fault model* is a non-empty set of fault primitives with similar or complementary properties. For example, *State Fault* (SF) is the set of FPs from #1 to #8 in Table 3, whereas *Write Transition Fault* (WTF) includes FPs from #9 to #16. Similarly, one can also find the FPs belonging to *Write Destructive Fault* (WDF), *Read Non-destructive Fault* (RNF), and *Read Destructive Fault* (RDF) in the table.

For dynamic faults, which are sensitized by more than one operation (i.e., $n > 1$), the names get the prefix $n$d-, where $n$ denotes the number of operations in S. Note that the naming scheme follows the same rules as for static FPs, using the last operation in S and its preceding state; e.g., $\langle 1r1w0/L/- \rangle$ is named 2d-W0TFL.

As shown in Figure 10, memory faults can be classified into *strong faults* and *weak faults* depending on whether or not the fault can be described by fault primitives. Strong faults are faults that can always be sensitized by applying a sequence of operations and therefore can be described by fault primitives. Table 3 lists all static strong faults that may occur in a single memory cell. In contrast, weak faults *cannot* be described by fault primitives. However, they cause parametric changes in the circuits, e.g., a small reduction in the read current flowing through the cell under read. Although weak faults do not lead to any functional errors right after manufacture, they may cause severe reliability issues (e.g., shorter lifetime, higher in-field failure rate). Therefore, weak faults need to be detected as well when the target market has a strict quality requirement.

Depending on whether or not the fault is detectable by normal write or read operations, strong faults can be further divided into *easy-to-detect* (EtD) and *hard-to-detect* (HtD) faults. Although all strong faults can be sensitized by a sequence of operations S, their detection conditions may not necessarily be equal to S.
EtD faults refer to those faults that can be easily detected by applying write and read operations (i.e., a March test [52]). *Write Destructive Fault* W1DFL = $\langle 1w1/L/- \rangle$ and *incorrect Read Non-destructive Fault* iR1NF1 = $\langle 1r1/1/0 \rangle$ are two examples of EtD faults. The detection condition for the former is $\updownarrow (\dots 1, \text{w1}, \text{r1}, \dots)$. Here, $\updownarrow$ denotes that the detection condition is independent of the addressing direction; $(\dots 1, \text{w1}, \text{r1}, \dots)$ denotes that the cell under test is initialized to logic '1', followed by consecutive w1 and r1 operations, applied to each address before moving to the next address. Any March test meeting the above detection condition can guarantee the detection of the corresponding fault.

FIGURE 11. Fault analysis methodology.

In contrast, the detection of HtD faults *cannot* be guaranteed by March tests alone; they require additional effort such as a special *Design-for-Testability* (DfT) circuit or a stress test in order to be detected. Note that strong faults consist of EtD and HtD faults, while weak faults are all HtD faults. Examples of strong HtD faults are *Write Transition Fault* W0TFU = $\langle 1w0/U/- \rangle$ and *random Read Non-destructive Fault* rR1NF1 = $\langle 1r1/1/? \rangle$. For these two faults, March tests cannot guarantee detection since a read operation on the faulty cell returns a random value.

# B. FAULT ANALYSIS METHODOLOGY

Once STT-MRAM defects are modeled and the fault space is defined, the validation of the faults can be performed using a systematic circuit simulation approach. In this paper we restrict ourselves to single-cell fault analysis as only defects in a single 1T-1MTJ cell are considered in our simulations.
Our fault analysis consists of seven steps: 1) circuit generation, 2) defect injection, 3) stimuli generation, 4) circuit simulation, 5) fault analysis, 6) fault primitives identification, and 7) defect strength sweeping and repetition of steps 2 to 6 until all defects and their sizes are covered. Note that in our simulations, defect injection means adding a specific resistor to the defect-free memory cell for interconnect defects (see Figure 6), but it means replacement of the defect-free MTJ model with the defective MTJ model for MTJ defects (see Figure 7). In addition, defect size sweeping means changing resistance for the resistor model while it means changing the pinhole area $A_{\rm ph}$ for a pinhole defect in MTJ devices. Each time only one specific defect (e.g., an open OC<sub>m</sub> or a pinhole PH) with certain size is analyzed in our simulations. Figure 11 shows the fault analysis methodology that illustrates how we validate faults in the defined fault space due to the injection of defects. Given a set of defects and their size ranges, the seven steps of the fault analysis should be first performed for the validation of static single-cell FPs in Table 3 (i.e., $n \le 1$ ). The simulation results are a list of {size range : EtD faults} pairs and a list of {size range : HtD faults} pairs, as shown in the figure. In case that no FP is sensitized in the presence of a defect with certain size range, the fault is considered as a weak fault belonging to HtD faults. Next, all defect size ranges resulting in HtD faults will be further analyzed using dynamic fault analysis with two sensitizing operations (i.e., n=2). In this way, some defect size ranges which lead to HtD faults from the previous static analysis may trigger EtD dynamic faults now; e.g., S=0w0 sensitizes a weak fault for a cell with a small defect, while S=0w0w0 may sensitize an EtD fault for this defective cell with the same defect size. 
Once the two-operation single-cell dynamic fault analysis is done, we can repeat the fault analysis with $n=3$ for the remaining defect size ranges that result in HtD faults with two sensitizing operations. This simulation process can be iterated by extending S with one more operation each time until the pre-defined maximum number of operations ($n_{\rm max}$) is reached. The aim of increasing the number of sensitizing operations is to reduce the defect size ranges which cause HtD faults while enlarging the ranges which lead to EtD faults. This is because EtD faults can simply be detected by March tests, while HtD faults require DfT designs or stress tests to detect them. This fault analysis methodology is useful to optimize the ultimate test solution with a trade-off between test quality and test overhead.

#### VI. SIMULATION SETUP AND RESULTS

In this section, we first introduce our simulation setup, including the simulation circuits and the defects we analyze. Thereafter, we present the fault analysis results.

#### A. SIMULATION SETUP

Figure 12 shows the defect-free simulation circuits consisting of a $2 \times 2$ 1T-1MTJ memory array, address decoders, write drivers, and precharge-based sense amplifiers. In our simulations, we used our Verilog-A MTJ compact model proposed in [54]. It has been calibrated with silicon measurement data of $\phi$ 60 nm MTJ devices. Compared to other MTJ models based on micromagnetic simulations [59], TCAD tools [60], and SPICE built-in circuit elements [61], our behavioral Verilog-A MTJ model is faster and more efficient in circuit simulations. The reason for this is that our model does not solve differential equations such as the LLG equation at run-time to capture the spin dynamics. More detailed comparisons between the different MTJ models can be found in [62]. The predictive technology model (PTM) [63] for 45 nm transistors was adopted to build peripheral circuits along with the NMOS selectors in memory cells.
The address decoders decode the input address to select a specific memory cell. The write drivers [64] are responsible for generating an appropriate switching current with a certain direction (as illustrated in Figure 3) on the addressed cell. To ensure a high switching current, the write drivers operate at a higher supply voltage than the $V_{\rm dd}$ used for the rest of the circuits. The precharge-based sense amplifiers [64] perform read operations where a small read current flows through the *cell under read* and a *reference cell*. The resistance of the reference cell is set to $R_{\rm ref} = \frac{1}{2} \left( R_{\rm P} + R_{\rm AP} \right)$ so that the read current going through the reference cell is smaller than that going through the cell with $R_{\rm P}$ and larger than that going through the cell with $R_{\rm AP}$. The comparison between the read currents going through the cell under read and the reference cell determines the readout value of the sense amplifier.

FIGURE 12. Simulation circuits consisting of 1T-1MTJ array and peripheral circuits.

In terms of defect injection, we considered resistive opens and resistive bridges, as shown in Figure 6, and pinhole defects in a 1T-1MTJ cell, as shown in Figure 5(b). Each time, one specific defect was injected into the simulation circuit and the faulty behavior of the memory cell was analyzed with the fault analysis methodology introduced in the previous section. For resistive bridges and opens, we swept the resistance from 1 $\Omega$ to 100 M$\Omega$ to represent the defect strength in our simulations. For the injection of pinhole defects, we replaced the defect-free MTJ model with the calibrated defective MTJ model proposed in [54]. The pinhole size is represented by an input parameter $A_{\rm ph}$ (the pinhole area normalized to the cross-sectional area of the MTJ device) of the defective MTJ model. In our simulations, we swept $A_{\rm ph}$ from 0 to 100 percent.

# **B. SIMULATION RESULTS**

In this paper, we limit the fault analysis to single-cell static faults, since all defects (including the pinhole defects in MTJ devices) we take into account are within a memory cell and static faults are the most prominent faults.

# 1) RESISTIVE DEFECTS IN INTERCONNECTS AND CONTACTS

Table 4 lists the fault modeling results of all resistive opens (see Figure 6) in a single 1T-1MTJ cell. For each defect in the table, the sensitized FPs depend on the defect strength (i.e., the resistance value in this case). For a given resistance range, a group of FPs can be sensitized; each *fault group* requires a specific detection condition to detect at least one of the FPs in the group. This guarantees the detection of the corresponding defect range. For example, the fault analysis of OC<sub>t</sub> (representing an open defect between the BL and the MTJ device) distinguishes four resistance ranges. (1) If the resistance of OC<sub>t</sub> is below 466 $\Omega$, no FPs are sensitized; thus, it results in a weak fault. (2) If the resistance is between 466 $\Omega$ and 870 $\Omega$, a single FP iR0NF0 = $\langle 0r0/0/1 \rangle$ is sensitized; it belongs to a fault group named LR1 (indicating the linear-resistor defect model). The detection condition for LR1 is simply a read operation on the cell which is in logic '0', irrespective of the addressing direction. We denote the detection condition as $\updownarrow (\dots 0, \text{r0}, \dots)$. (3) If the resistance is between 870 $\Omega$ and 1.6 k$\Omega$, two FPs are sensitized, namely W0TF1 = $\langle 1w0/1/- \rangle$ and the previous iR0NF0. Since iR0NF0 also occurs in the second defect range, these two FPs are also grouped into LR1, leading to the same detection condition $\updownarrow (\dots 0, \text{r0}, \dots)$. (4) If the resistance is above 1.6 k$\Omega$, three FPs are sensitized as shown in the table.

TABLE 4. Single-cell static fault modeling results of resistive opens.
| Defect | Resistance ($\Omega$) | Sensitized Fault Primitives | Fault Group | Detection Condition |
|--------|-----------------------|-----------------------------|-------------|---------------------|
| OC<sub>t</sub> & OC<sub>m</sub> & OC<sub>b</sub> | (466, 870] | iR0NF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| OC<sub>t</sub> & OC<sub>m</sub> & OC<sub>b</sub> | (870, 1.6k] | iR0NF0, W0TF1 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| OC<sub>t</sub> & OC<sub>m</sub> & OC<sub>b</sub> | (1.6k, +∞) | iR0NF0, W0TF1, W1TF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| OS<sub>w</sub> | (870, 2k] | W0TF1 | LR2 | $\updownarrow (\dots 1, \text{w0}, \text{r0}, \dots)$ |
| OS<sub>w</sub> | (2k, +∞) | W0TF1, W1TF0 | LR2 | $\updownarrow (\dots 1, \text{w0}, \text{r0}, \dots)$ |
| OS<sub>r</sub> | (180, +∞) | iR0NF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| OB<sub>w</sub> | (870, 1.6k] | W0TF1 | LR2 | $\updownarrow (\dots 1, \text{w0}, \text{r0}, \dots)$ |
| OB<sub>w</sub> | (1.6k, +∞) | W0TF1, W1TF0 | LR2 | $\updownarrow (\dots 1, \text{w0}, \text{r0}, \dots)$ |
| OB<sub>r</sub> | (570, +∞) | iR0NF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| OC<sub>w</sub> & OW<sub>i</sub> | (870, 14M] | iR0NF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| OC<sub>w</sub> & OW<sub>i</sub> | (14M, +∞) | iR0NF0, W0TF1, W1TF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |

TABLE 5. Single-cell static fault modeling results of resistive bridges.

| Defect | Resistance ($\Omega$) | Sensitized Fault Primitives | Fault Group | Detection Condition |
|--------|-----------------------|-----------------------------|-------------|---------------------|
| BC<sub>SL-IN</sub> | [0, 13k) | iR1NF1 | LR3 | $\updownarrow (\dots 1, \text{r1}, \dots)$ |
| BC<sub>BL-IN</sub> | [0, 1.1k) | **iR1NF1**, W1TF0, W0TF1 | LR3 | $\updownarrow (\dots 1, \text{r1}, \dots)$ |
| BC<sub>BL-IN</sub> | [1.1k, 3.1k) | **iR1NF1**, W0TF1 | LR3 | $\updownarrow (\dots 1, \text{r1}, \dots)$ |
| BC<sub>WL-SL</sub> | [0, 5.6k) | iR0NF0, W0TF1 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| BC<sub>WL-SL</sub> | [5.6k, 56.1k) | iR0NF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| BC<sub>WL-IN</sub> | [0, 7.7k) | iR0NF0, W0TF1 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |
| BC<sub>WL-IN</sub> | [7.7k, 13.1k) | iR0NF0 | LR1 | $\updownarrow (\dots 0, \text{r0}, \dots)$ |

Again, the occurrence of iR0NF0 makes $\updownarrow (\dots 0, \text{r0}, \dots)$
the simplest detection condition for this defect range. Note that the FPs given in bold font are the easiest ones to detect from a test point of view; detecting a single FP per fault group is enough to detect a defect with the corresponding size ranges. Similarly, Table 5 presents the fault modeling results for all resistive bridges in a single 1T-1MTJ cell. For instance, the resistive bridge $BC_{SL-IN}$ (which connects the SL to the internal cell node, as shown in Figure 6) results in iR1NF1= $\langle 1r1/1/0 \rangle$ when the resistance is below 13 k $\Omega$ ; it belongs to a new fault group LR3. The detection condition of LR3 is $\mathop{\updownarrow} (\dots 1, r1, \dots)$ . If the resistance is larger than 13 k $\Omega$ , it leads to a weak fault. #### 2) PINHOLE DEFECTS IN MTJ DEVICES Table 6 shows the fault modeling results of pinhole defects in MTJ devices; the fault group (denoted as DAx indicating device-aware defect model) and detection condition for each pinhole size range are also listed in the table. It can be seen that sufficiently large pinholes ( $A_{\rm ph} > 0.61\%$ ) make the MTJ device fall into the resistance range of '0' state or even of 'L' state, sensitizing easy-to-detect faults of DA5 and DA6; the corresponding fault primitives are listed in the table. Among those FPs, S1F0= $\langle 1/0/ - \rangle$ and S1FL= $\langle 1/L/ - \rangle$ (marked with bold font) are easy to detect with a read '1' (r1) operation. As the pinhole gets smaller ( $A_{\rm ph} \in (0.07\%, 0.61\%]$ ), it makes $R_{\rm P}$ fall into 'L' state and $R_{\rm AP}$ into 'U' state. Depending on the exact MTJ resistance in the AP state, the readout value can be one of the following three cases: (a) '0', (b) random ('?'), and (c) '1'. In Case (a) where $R_{\rm AP}$ is significantly smaller than the resistance of the reference cell (i.e., $A_{\rm ph} \in (0.35\%, 0.61\%]$ ), the readout value of the device in AP state is '0', resulting in faults of DA4. 
In this case, an r1 operation can detect the sensitized FP iR1DFU = $\langle 1r1/U/0 \rangle$ (marked with bold font). In Case (b), where $R_{\rm AP}$ is close to the resistance of the reference cell (i.e., $A_{\rm ph} \in (0.32\%, 0.35\%]$), the readout value is random, leading to strong hard-to-detect faults of DA3. In other words, the read operation is unstable, and therefore both '0' and '1' are possible readout values. Thus, an r1 operation cannot guarantee the detection. In Case (c), where $R_{\rm AP}$ is much larger than the resistance of the reference cell while it is still outside the specification of logic '1' (i.e., $A_{\rm ph} \in (0.07\%, 0.32\%]$), the readout is '1'. In this case, strong hard-to-detect faults of DA2 are sensitized, which cannot be detected by March tests.

TABLE 6. Single-cell static fault modeling results of pinhole defects.

| Defect | $A_{\rm ph}$ (%) | Sensitized Fault Primitives | Fault Group | Detection Condition |
|--------|------------------|-----------------------------|-------------|---------------------|
| PH | (0.04, 0.07] | S1FU, W1DFU, W1TFU, dR1DFU | DA1 | Stress tests / DfT designs |
| PH | (0.07, 0.32] | S0FL, S1FU, W0DFL, W1DFU, W1TFU, W0TFL, dR0DFL, dR1DFU | DA2 | Stress tests / DfT designs |
| PH | (0.32, 0.35] | S0FL, S1FU, W0DFL, W1DFU, W1TFU, W0TFL, dR0DFL, rR1DFU | DA3 | Stress tests / DfT designs |
| PH | (0.35, 0.61] | S0FL, S1FU, W0DFL, W1DFU, W1TFU, W0TFL, dR0DFL, **iR1DFU** | DA4 | $\updownarrow (\dots 1, \text{r1}, \dots)$ |
| PH | (0.61, 0.78] | S0FL, **S1F0**, W0DFL, W1DF0, W1TF0, W0TFL, dR0DFL, iR1DF0 | DA5 | $\updownarrow (\dots 1, \text{r1}, \dots)$ |
| PH | (0.78, 100] | S0FL, **S1FL**, W0DFL, W1DFL, W1TFL, W0TFL, dR0DFL, iR1DFL | DA6 | $\updownarrow (\dots 1, \text{r1}, \dots)$ |

When the pinhole area shrinks further to between 0.04 and 0.07 percent, $R_{\rm AP}$ falls into a 'U' state, while $R_{\rm P}$ remains in the correct range.
Similarly, the sensitized strong hard-to-detect faults of DA1 cannot be detected by March tests. If the pinhole size is smaller than 0.04 percent, it leads to a weak fault, while the cell still behaves logically correctly.

Conventionally, MTJ-related defects, irrespective of their physical nature, are modeled as linear resistors either in series with (i.e., $OC_t$ in Table 4) or in parallel to (i.e., $BC_{BL-IN}$ in Table 5) an *ideal defect-free* MTJ device, as can be found in [8], [9], [10], [11], [12], [13]. Comparing the fault modeling results of our proposed pinhole defect model (PH) with the series resistor model $OC_t$ and the parallel resistor model $BC_{BL-IN}$ reveals the following.

- The faulty behavior of the memory due to a pinhole defect *cannot* be covered by the conventional resistor-based defect models. Figure 13 shows that there are five fault groups in Table 6 which are not observed with resistor models $OC_t$ and $BC_{BL-IN}$, while only a single FP (W1TF0 = $\langle 0w1/0/- \rangle$) is in overlap; it occurs in both fault groups DA5 and LR2. With the resistor-based defect models, only '0' and '1' states were observed in the simulations. This is because the MTJ device is considered as a *black box* and *ideal*. However, our simulations and measurement data clearly show that pinhole defects can lead the device to the 'U' or even the 'L' state.
- Conventional resistor-based defect models may result in wrong fault models. Figure 13 shows that $OC_t$ and $BC_{BL-IN}$ result in two fault groups, LR1 and LR3, which are not applicable to pinhole defects (i.e., not observed with our device-aware pinhole defect model).

FIGURE 13. Our device-aware (DA) model vs. conventional linear-resistor (LR) model for pinhole defects in MTJ devices.

FIGURE 14. The $R_{\rm AP}$ of devices with pinhole defects degrades under pulse stress with elevated voltage and prolonged pulse width.
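The pinhole model of Equations (6) and (7) and the size ranges of Table 6 can be explored numerically. The following Python sketch is illustrative only and is not the Verilog-A compact model of [54]; all function names are hypothetical. It assumes the textbook relations $R_{\rm P} = RA/A$ and $R_{\rm AP} = R_{\rm P}(1 + TMR)$ and ignores bias and temperature dependence. The parameter values are those quoted in the case study, and the fault-group boundaries are copied from Table 6, with sizes at or below 0.04% treated as weak faults.

```python
import math
from bisect import bisect_left

# Parameter values quoted in the pinhole case study
A_UM2 = 2827.4e-6   # MTJ cross-sectional area A: 2827.4 nm^2 expressed in um^2
RA_DF = 4.52        # defect-free RA product in Ohm*um^2
TMR_DF = 1.39       # defect-free TMR (139 %)
RA_BD = 0.41        # RA product after breakdown in Ohm*um^2


def ra_eff(a_ph):
    """Equation (6): effective RA for a normalized pinhole area a_ph in [0, 1]."""
    return 1.0 / ((1.0 - a_ph) / RA_DF + a_ph / RA_BD)


def tmr_eff(a_ph):
    """Equation (7): effective TMR for a normalized pinhole area a_ph in [0, 1]."""
    return TMR_DF * (ra_eff(a_ph) - RA_BD) / (RA_DF - RA_BD)


def resistances(a_ph):
    """Assumed low-bias estimate of (R_P, R_AP) in Ohm; not the calibrated model."""
    r_p = ra_eff(a_ph) / A_UM2
    return r_p, r_p * (1.0 + tmr_eff(a_ph))


# Upper bounds (inclusive, in percent) of the pinhole size ranges in Table 6
_BOUNDS = [0.04, 0.07, 0.32, 0.35, 0.61, 0.78]
_GROUPS = ["weak", "DA1", "DA2", "DA3", "DA4", "DA5", "DA6"]


def fault_group(a_ph_percent):
    """Map a pinhole area given in percent to its fault group from Table 6."""
    return _GROUPS[bisect_left(_BOUNDS, a_ph_percent)]


def march_detectable(a_ph_percent):
    """True if Table 6 lists a March detection condition for this size."""
    return fault_group(a_ph_percent) in {"DA4", "DA5", "DA6"}


if __name__ == "__main__":
    for pct in (0.0, 0.05, 0.2, 0.34, 0.5, 0.7, 5.0):
        r_p, r_ap = resistances(pct / 100.0)
        print(f"A_ph={pct:5.2f}%  R_P={r_p:7.1f}  R_AP={r_ap:7.1f}  {fault_group(pct)}")
```

Because both equations interpolate between the defect-free and breakdown values, the sketch returns $RA_{\rm df}$ and $TMR_{\rm df}$ at $A_{\rm ph}=0$ and $RA_{\rm bd}$ and zero TMR at $A_{\rm ph}=1$, which serves as a quick consistency check.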
The above observations clearly indicate that test algorithms developed with the conventional resistor-based defect modeling approach not only fail to guarantee the detection of pinhole defects, leading to test escapes, but may also waste test time and resources as they target non-existing faults. Hence, more attention needs to be paid to the analysis and modeling of defects in MTJ devices, since those defects cannot be simply modeled as linear resistors, yet they have significant impacts on the data-storing MTJ devices in STT-MRAMs.

#### VII. TEST DEVELOPMENT

Based on the previous fault analysis results, appropriate test solutions can be developed. All easy-to-detect faults can be detected by March tests. To minimize the test cost, the minimal detection condition for each fault group is first identified. Thereafter, the detection conditions for all fault groups are merged to obtain an optimal test algorithm. For example, Tables 4 and 5 list all sensitized fault primitives, their fault groups, and detection conditions for the considered resistive defects in interconnects. By combining all the detection conditions in the two tables, March algorithms can be derived. For instance, the March element $\mathop{\updownarrow}(w1, r1, w0, r0)$ or March C- [65], [66] can be used to detect all these easy-to-detect faults.

For pinhole defects in MTJ devices, our simulation results with the calibrated pinhole defect model show that the larger the pinhole, the larger its fault effect, and hence the easier it is to detect. Combining the last three rows in Table 6, it is clear that any March algorithm including the element $\mathop{\updownarrow}(w1,r1)$ can guarantee the detection of a pinhole defect with $A_{\rm ph}>0.35\%$, as it sensitizes only easy-to-detect faults. However, for smaller pinhole defects ($A_{\rm ph} \leq 0.35\%$), HtD faults are sensitized. They are typically related to the cell being in a forbidden state (i.e., H, L, or U) or to random readout values.
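As a rough illustration of why the element $\mathop{\updownarrow}(w1,r1)$ guarantees detection only when the faulty readout is deterministic, the sketch below applies it to a toy single-cell model. This is an assumption-laden simplification rather than the paper's simulation setup: the `Cell` class, the `detects` helper, and the seed are hypothetical, writes are assumed to succeed, and a fault is mimicked only by overriding what a read of a stored '1' returns (always '0' for an incorrect read, a fair coin for a random read).

```python
import random
from typing import Callable, List, Tuple

Op = Tuple[str, int]  # ("w", value to write) or ("r", expected read value)


class Cell:
    """Toy single memory cell; only the readout of a stored '1' can be faulty."""

    def __init__(self, read_one: Callable[[], int] = lambda: 1) -> None:
        self.state = 0
        self._read_one = read_one  # what a read returns when the cell stores 1

    def write(self, value: int) -> None:
        self.state = value

    def read(self) -> int:
        return self._read_one() if self.state == 1 else 0


def detects(cell: Cell, element: List[Op]) -> bool:
    """Apply one March element; report True if any read mismatches its expectation."""
    for kind, value in element:
        if kind == "w":
            cell.write(value)
        elif cell.read() != value:
            return True
    return False


MARCH_W1_R1: List[Op] = [("w", 1), ("r", 1)]

rng = random.Random(2019)  # fixed seed so the sketch is repeatable
good = Cell()
incorrect_read = Cell(read_one=lambda: 0)                # deterministic wrong readout
random_read = Cell(read_one=lambda: rng.randint(0, 1))   # unstable readout

RUNS = 200
hits = sum(detects(random_read, MARCH_W1_R1) for _ in range(RUNS))

if __name__ == "__main__":
    print("fault-free cell flagged:", detects(good, MARCH_W1_R1))
    print("incorrect-read cell flagged:", detects(incorrect_read, MARCH_W1_R1))
    print("random-read cell flagged in", hits, "of", RUNS, "runs")
```

In this toy model the deterministic incorrect read is flagged on every run, whereas the random read is flagged only on some runs.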
Obviously, March tests cannot guarantee the detection of such faults, although they may detect some of them. For example, iR1DFU = $\langle 1r1/U/0 \rangle$ of DA3 may be detected by a March test $\{\mathop{\updownarrow}(w1), \mathop{\updownarrow}(r1)\}$. Applying March tests multiple times with different data backgrounds and address sequences [52], [66] will increase the detection probability of such faults. As small pinhole defects grow in area over time due to the accumulated Joule heating, they would cause an early breakdown in the field if not detected during manufacturing tests [54]. Hence, guaranteeing their detection is a must. Using DfT or stress tests is a common practice to further increase the chance of detecting HtD faults. One possible solution is to subject the STT-MRAM to a hammering write '1' operation sequence with elevated voltage or prolonged pulse width to deliberately speed up the growth of pinhole defects, so as to transform hard-to-detect faults into easy-to-detect faults.

Figure 14 shows the measurement data of four selected MTJ devices under a stress test. In this test, we constantly applied hammering write '1' operations (P $\rightarrow$ AP switching) to hundreds of $\phi$ 60 nm MTJ devices for 400k cycles; the pulse amplitude and width are -0.8 V and 50 ns, respectively. As can be seen in the figure, device A (green wide line on the top), which represents the majority of devices under test, survived this stress test. In contrast, three devices broke down within the first 40 cycles (denoted as B, C, D). The resistance ($R_{\rm AP}$) of device C (blue) in AP state was already below the nominal $R_{\rm P}$ value ($\sim 2\ \text{k}\Omega$) of good devices before this stress test. Thus, this pinhole defect can be easily detected by March tests.
However, detecting pinhole defects in devices B and D cannot be guaranteed by March tests at t = 0, since these two devices have small pinholes and their initial $R_{\rm AP}$ values are close to the nominal $R_{\rm AP}$ of defect-free devices (e.g., device A). Under pulse stress, the pinhole defects quickly grow into larger ones, leading to a reduction in the resistance of the MTJ devices. Hence, a stress test is an effective way to detect devices with small pinhole defects. It is worth noting that this approach is prohibitively expensive for high-volume testing. In addition, the amplitude and duration of the hammering write pulse need to be carefully tuned to avoid any inadvertent destruction of good devices while maintaining an acceptable test effectiveness and efficiency.

# VIII. DISCUSSION

Conventionally, all manufacturing defects are modeled as linear resistors for STT-MRAM testing. Although this resistor-based defect modeling approach is valid for defects in interconnects and contacts, it is not adequate for modeling defects in MTJ devices, which are the data-storing elements in STT-MRAMs. To develop an effective yet efficient test solution for STT-MRAM, it is of great importance to understand and accurately model STT-MRAM-specific defects. Thereafter, a systematic fault analysis is needed to extract realistic fault models which reflect the physical defects. The proposed fault modeling framework has the following advantages.

- Accurate and realistic fault modeling: With our proposed three-step defect modeling approach, defects such as pinhole defects in MTJ devices are accurately modeled and presented at the electrical level. The defective MTJ model can then be used to perform fault analysis in a comprehensive and systematic manner based on our proposed fault modeling framework. In this way, accurate and realistic fault models which reflect the physical defects can be extracted from the predefined fault space.
- Optimal, efficient, and high-quality test solutions: Since fault models are the targets of manufacturing tests, accurate and realistic fault models result in more efficient and optimal test solutions. For example, in this paper we analyzed the fault behavior of memory cells due to pinhole defects and derived corresponding fault primitives, the majority of which were not observed with resistive defect models. This means that tests developed based on linear-resistor injection cannot catch MTJ devices with small pinhole defects, leading to test escapes. In contrast, our proposed defect and fault modeling methodology sheds more light on the test development to detect physical defects.
- Fast diagnosis and yield learning: With our proposed approach, each manufacturing defect can be modeled and analyzed separately, instead of using linear resistors to represent all possible defects, so that unique fault signatures can be created for each defect. The clear mapping relations between physical defects and fault models are useful for fast defect diagnosis and yield learning.

Challenges of our proposed defect and fault modeling methodology remain, despite the above-mentioned superiority over the conventional approach.

- Interdisciplinary collaboration: Understanding and modeling the physical STT-MRAM-specific defects requires significantly more effort than simply modeling them as linear resistors. It is necessary to have interdisciplinary collaboration between the device, processing technology, and test communities. Researchers at technology level are good at understanding and modeling the effects of defects on physical and technology parameters of the device and thereafter the electrical parameters, whereas test researchers are skilled in fault analysis and test development. Clearly, the fault modeling paradigm is changing for emerging technologies such as STT-MRAM.
- Defect measurement data: To obtain a good defect model, measurement data of real defective devices is crucial to calibrate the model. In addition, collecting and analyzing silicon data is also helpful to understand the defect mechanism, occurrence rate, location, etc.

# IX. CONCLUSION

This paper demonstrates a paradigm shift in defect and fault modeling for STT-MRAMs. It has been shown, based on device measurements and circuit simulations using calibrated MTJ models, that the conventional linear-resistor-based defect modeling approach is not adequate for modeling the defects in MTJ devices, which are the data-storing elements in STT-MRAMs. These MTJ-related defects need to be modeled by adjusting the affected technology parameters and subsequent electrical parameters to fully capture the defect impact on both the device's electrical and magnetic properties. Apart from realistic and accurate defect injection, accurate fault modeling is also crucial for high-quality test development. To this end, we proposed a systematic fault analysis methodology, which was applied to derive accurate fault models corresponding to resistive defects in interconnects and pinhole defects in MTJ devices. The derived easy-to-detect faults can be detected by March tests meeting all detection conditions, whereas all hard-to-detect faults require DfT designs or stress tests to guarantee the detection. Other manufacturing defects, especially those in MTJ devices, should also be analyzed and modeled in the same manner in order to ensure accurate fault modeling and development of high-quality manufacturing tests for STT-MRAMs.

# **REFERENCES**

- [1] Y. Chen, H. H. Li, I. Bayram, and E. Eken, "Recent technology advances of emerging memories," *IEEE Des. Test*, vol. 34, no. 3, pp. 8–22, Jun. 2017, doi: 10.1109/MDAT.2017.2685381. - [2] S. Yu and P.-Y. Chen, "Emerging memory technologies: Recent trends and prospects," *IEEE Solid-State Circuits Mag.*, vol. 8, no. 2, pp. 43–56, Jun.
2016, doi: 10.1109/MSSC.2016.2546199. - [3] X. Fong, Y. Kim, R. Venkatesan, S. H. Choday, A. Raghunathan, and K. Roy, "Spin-transfer torque memories: Devices, circuits, and systems," *Proc. IEEE*, vol. 104, no. 7, pp. 1449–1488, Jul. 2016, doi: 10.1109/ JPROC.2016.2521712. - [4] T. Coughlin, "The growing market for MRAMs," Aug. 2018. [Online]. Available: https://www.forbes.com/sites/tomcoughlin/2018/08/10/mram-developer-day/#6d59021a6c6d - [5] J. M. Slaughter et al., "High density ST-MRAM technology," in Proc. Int. Electron Devices Meeting, 2012, pp. 29.3.1–29.3.4, doi: 10.1109/ IEDM.2012.6479128. - [6] O. Golonzka et al., "MRAM as embedded non-volatile memory solution for 22FFL FinFET technology," in Proc. IEEE Int. Electron Devices Meeting, 2018, pp. 18.1.1–18.1.4, doi: 10.1109/IEDM.2018.8614620. - [7] Y. J. Song et al., "Demonstration of highly manufacturable STT-MRAM embedded in 28nm logic," in *Proc. IEEE Int. Electron Devices Meeting*, 2018, pp. 18.2.1–18.2.4, doi: 10.1109/IEDM.2018.8614635. - [8] C. L. Su et al., "MRAM defect analysis and fault modeling," in Proc. IEEE Int. Test Conf., 2004, pp. 124–133, doi: 10.1109/TEST.2004.1386944. - [9] J. Azevedo et al., "A complete resistive-open defect analysis for thermally assisted switching MRAMs," *IEEE Trans. Very Large Scale Integr.* (VLSI) Syst., vol. 22, no. 11, pp. 2326–2335, Nov. 2014, doi: 10.1109/ TVLSI.2013.2294080. - [10] C.-L. Su, C.-W. Tsai, C.-W. Wu, C.-C. Hung, Y.-S. Chen, and M.-J. Kao, "Testing MRAM for write disturbance fault," in *Proc. IEEE Int. Test Conf.*, 2006, pp. 1–9, doi: 10.1109/TEST.2006.297702. - [11] A. Chintaluri, H. Naeimi, S. Natarajan, and A. Raychowdhury, "Analysis of defects and variations in embedded spin transfer torque (STT) MRAM arrays," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 6, no. 3, pp. 319–329, Sep. 2016, doi: 10.1109/JETCAS.2016.2547779. - [12] I. Yoon, A. Chintaluri, and A. 
Raychowdhury, "EMACS: Efficient MBIST architecture for test and characterization of STT-MRAM arrays," in *Proc. IEEE Int. Test Conf.*, 2016, pp. 1–10, doi: 10.1109/ TEST.2016.7805834. - [13] S. M. Nair et al., "Defect injection, fault modeling and test algorithm generation methodology for STT-MRAM," in Proc. IEEE Int. Test Conf., 2018, pp. 1–10, doi: 10.1109/TEST.2018.8624725. - [14] E. I. Vatajelu, P. Prinetto, M. Taouil, and S. Hamdioui, "Challenges and solutions in emerging memory testing," *IEEE Trans. Emerg. Topics Comput.*, vol. 7, no. 3, pp. 493–506, Jul.–Sep. 2019, doi: 10.1109/ TETC.2017.2691263. - [15] R. Bishnoi, M. Ebrahimi, F. Oboril, and M. B. Tahoori, "Read disturb fault detection in STT-MRAM," in *Proc. Int. Test Conf.*, 2014, pp. 1–7, doi: 10.1109/TEST.2014.7035342. - [16] A. V. Khvalkovskiy et al., "Basic principles of STT-MRAM cell operation in memory arrays," J. Phys. D: Appl. Phys., vol. 46, 2013, Art. no. 074001, doi: 10.1088/0022-3727/46/7/074001. - [17] G. S. Kar et al., "Co/Ni based p-MTJ stack for sub-20nm high density stand alone and high performance embedded memory application," in Proc. IEEE Int. Electron Devices Meeting, 2014, pp. 19.1.1–19.1.4, doi: 10.1109/IEDM.2014.7047080. - [18] D. Apalkov, B. Dieny, and J. M. Slaughter, "Magnetoresistive random access memory," in *Proc. IEEE*, vol. 104, no. 10, pp. 1796–1830, Oct. 2016, doi: 10.1109/JPROC.2016.2590142. - [19] L. Wu, M. Taouil, S. Rao, E. J. Marinissen, and S. Hamdioui, "Electrical modeling of STT-MRAM defects," in *Proc. IEEE Int. Test Conf.*, 2018, pp. 1–10, doi: 10.1109/TEST.2018.8624749. - [20] H. Jin et al., "Tunnel magnetoresistance effect," in The Physics of Ferromagnetism, vol. 158. Berlin, Germany: Springer Series in Materials Science, 2012, pp. 403–432, doi: 10.1007/978-3-642-25583-0 12. - [21] Y. Wang, "Reliability analysis of spintronic device based logic and memory circuits," Ph.D. dissertation, Dept. Commun. and Electron., Telecom ParisTech, Palaiseau, France, 2017. 
- [22] C. J. Lin et al., "45nm low power CMOS logic compatible embedded STT MRAM utilizing a reverse-connection 1T/1MTJ cell," in Proc. IEEE Int. Electron Devices Meeting, 2009, pp. 1–4, doi: 10.1109/ IEDM.2009.5424368. - [23] Y. M. Lee, C. Yoshida, K. Tsunoda, S. Umehara, M. Aoki, and T. Sugii, "Highly scalable STT-MRAM with MTJs of top-pinned structure in 1T/ 1MTJ Cell," in *Proc. Symp. VLSI Technol.*, 2010, pp. 49–50, doi: 10.1109/VLSIT.2010.5556123. - [24] D. Lee et al., "High-performance low-energy STT MRAM based on balanced write scheme," in Proc. ACM/IEEE Int. Symp. Low Power Electronics Des., 2012, pp. 9–14, doi: 10.1145/2333660.2333665. - [25] A. K. Jones, X. Wang, Y. Li, A. K. Jones, and Y. Chen, "Asymmetry of MTJ switching and its implication to STT-RAM designs," in *Proc. Des. Autom. Test Europe Conf. Exhib.*, 2012, pp. 1313–1318, doi: 10.1109/DATE.2012.6176695. - [26] W. Zhao et al., "Design considerations and strategies for high-reliable STT-MRAM," Microelectron. Reliab., vol. 51, pp. 1454–1458, 2011, doi: 10.1016/j.microrel.2011.07.001. - [27] Y. J. Song et al., "Highly functional and reliable 8Mb STT-MRAM embedded in 28nm logic," in Proc. IEEE Int. Electron Devices Meeting, 2016, pp. 27.2.1–27.2.4, doi: 10.1109/IEDM.2016.7838491. - [28] M. Bushnell et al., Essentials of Electronic Testing for Digital, Memory and Mixed-Signal VLSI Circuits. Berlin, Germany: Springer Science & Business Media, 2004. - [29] L. Tillie et al., "Data retention extraction methodology for perpendicular STT-MRAM," in Proc. IEEE Int. Electron Devices Meeting, 2016, pp. 27.3.1–27.3.4, doi: 10.1109/IEDM.2016.7838492. - [30] D. Shum et al., "CMOS-embedded STT-MRAM arrays in 2x nm nodes for GP-MCU applications," in Proc. Symp. VLSI Technol., 2017, pp. T208– T209, doi: 10.23919/VLSIT.2017.7998174. - [31] Y. G. Fedorenko, "Ion-beam-induced defects in CMOS technology: Methods of study," in *Ion Implantation: Research and Application*, I. Ahmad, Ed. IntechOpen, Jun. 2017, ch. 4, pp. 
67–98, doi: 10.5772/67760. - [32] M. Sachdev et al., Defect-Oriented Testing for Nano-Metric CMOS VLSI Circuits. Berlin, Germany: Springer Science & Business Media, 2007 - [33] J. C.-M. Li and E. J. McCluskey, "Diagnosis of resistive-open and stuck-open defects in digital CMOS ICs," *IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.*, vol. 24, no. 11, pp. 1748–1759, Nov. 2005, doi: 10.1109/TCAD.2005.852457. - [34] N. Z. Haron and S. Hamdioui, "On defect oriented testing for hybrid CMOS/ Memristor memory," in *Proc. Asian Test Symp.*, 2011, pp. 353–358, doi: 10.1109/ATS.2011.66. - [35] M. Komalan et al., "Cross-layer design and analysis of a low power, high density STT-MRAM for embedded systems," in Proc. IEEE Int. Symp. Circuits Syst., May 2017, pp. 1–4, doi: 10.1109/ISCAS.2017.8050923. - [36] W. Zhao et al., "Failure analysis in magnetic tunnel junction nanopillar with interfacial perpendicular magnetic anisotropy," *Materials*, vol. 9, pp. 1–17, 2016, doi: 10.3390/ma9010041. - [37] B. Oliver et al., "Two breakdown mechanisms in ultrathin alumina barrier magnetic tunnel junctions," J. Appl. Phys., vol. 95, pp. 1315–1322, 2004, doi: 10.1063/1.1636255. - [38] Y. Wang et al., "Compact model of dielectric breakdown in spin-transfer torque magnetic tunnel junction," *IEEE Trans. Electron Devices*, vol. 63, no. 4, pp. 1762–1767, Apr. 2016, doi: 10.1109/TED.2016.2533438. - [39] H. Meng et al., "Annealing effects on CoFeB-MgO magnetic tunnel junctions with perpendicular anisotropy," J. Appl. Phys., vol. 110, 2011, Art. no. 033904, doi: 10.1063/1.3611426. - [40] H. Maehara *et al.*, "Tunnel magnetoresistance above 170% and resistance–area product of 1 $\Omega$ ( $\mu$ m)<sup>2</sup> attained by in situ annealing of ultra-thin MgO tunnel barrier," *Appl. Phys. Express*, vol. 4, Mar. 2011, Art. no. 033002, doi: 10.1143/apex.4.033002. - [41] S. Van Beek et al., "Impact of processing and stack optimization on the reliability of perpendicular STT-MRAM," in Proc. IEEE Int. 
Rel. Phys. Symp., 2017, pp. 5A-1.1–5A-1.5, doi: 10.1109/IRPS.2017.7936318. - [42] W. Boullart et al., "STT MRAM patterning challenges," Adv. Etch Tech. Nanopatterning II, Y. Zhang, G. S. Oehrlein, and Q. Lin, Eds., International Society for Optics and Photonics, pp. 94–102, Mar. 2013, doi: 10.1117/12.2013602. - [43] K. Sugiura et al., "Ion beam etching technology for high-density spin transfer torque magnetic random access memory," *Japanese J. Appl. Phys.*, vol. 48, 2009, Art. no. 08HD02, doi: 10.1143/JJAP.48.08HD02. - [44] K. Nagahara et al., "Ion-beam-etched profile control of MTJ cells for improving the switching characteristics of high-density MRAM," IEEE Trans. Magn., vol. 42, no. 10, pp. 2745–2747, Oct. 2006, doi: 10.1109/ TMAG.2006.878862. - [45] E. H. Kim et al., "Evolution of etch profile of magnetic tunnel junction stacks etched in a CH3OH/Ar plasma," J. Electroch. Soc., vol. 159, pp. H230–H234, 2012, doi: 10.1149/2.012203jes. - [46] A. A. Garay et al., "Inductively coupled plasma reactive ion etching of magnetic tunnel junction stacks in a CH3COOH/Ar gas," ECS Solid State Lett., vol. 4, pp. P77–P79, 2015, doi: 10.1149/2.0071510ssl. - [47] L. Wu et al., "Impact of magnetic coupling and density on STT-MRAM performance," in Proc. Des. Autom. Test Europe Conf. Exhib., 2020. - [48] Y. Wang et al., "Impact of stray field on the switching properties of perpendicular MTJ for scaled MRAM," in Proc. Int. Electron Devices Meeting, Dec. 2012, pp. 29.2.1–29.2.4, doi: 10.1109/IEDM.2012.6479127. - [49] H. Jiancheng et al., "Effect of the stray field profile on the switching characteristics of the free layer in a perpendicular magnetic tunnel junction," J. Appl. Phys., vol. 117, 2015, Art. no. 17B721, doi: 10.1063/1.4916037. - [50] J. H. Jeong and T. Endoh, "Novel oxygen showering process (OSP) for extreme damage suppression of sub-20nm high density p-MTJ array without IBE treatment," in *Proc. Symp. VLSI Technol.*, 2015, pp. T158–T159, doi: 10.1109/VLSIT.2015.7223660. 
- [51] K. Lee et al., "22-nm FD-SOI embedded MRAM with full solder reflow compatibility and enhanced magnetic immunity," in Proc. IEEE Symp. VLSI Technol., 2018, pp. 183–184, doi: 10.1109/VLSIT.2018.8510655. - [52] A. J. Van de Goor, Testing Semiconductor Memories: Theory and Practice, vol. 225. Hoboken, NJ, USA: Wiley, 1991. - [53] S. Hamdioui, "Testing multi-port memories: theory and practice," Ph.D. dissertation, Dept. Quantum Comput. Eng., Delft Univ. Technol., Delft, the Netherlands, 2001. - [54] L. Wu et al., "Pinhole defect characterization and fault modeling for STT-MRAM testing," in Proc. IEEE Eur. Test Symp., 2019, pp. 1–6, doi: 10.1109/ETS.2019.8791518. - [55] S. Mukherjee et al., "Role of boron diffusion in CoFeB/MgO magnetic tunnel junctions," Phys. Rev. B, vol. 91, Feb. 2015, Art. no. 085311, doi: 10.1103/PhysRevB.91.085311. - [56] J. H. Lim et al., "Investigating the statistical-physical nature of MgO dielectric breakdown in STT-MRAM at different operating conditions," in Proc. IEEE Int. Electron Devices Meeting, 2018, pp. 25.3.1–25.3.4, doi: 10.1109/IEDM.2018.8614515. - [57] S. Hamdioui and A. J. Van De Goor, "An experimental analysis of spot defects in SRAMs: Realistic fault models and tests," in *Proc. 9th Asian Test Symp.*, 2000, pp. 131–138, doi: 10.1109/ATS.2000.893615. - [58] S. Hamdioui et al., "Memory fault modeling trends: A case study," J. Electron. Testing, vol. 20, pp. 245–255, Jun. 2004, doi: 10.1023/B: JETT.0000029458.57095.bb. - [59] M. Frankowski et al., "Micromagnetic model for studies on magnetic tunnel junction switching dynamics, including local current density," *Phys. B: Con*dens. Matter, vol. 435, pp. 105–108, 2014, doi: 10.1016/j.physb.2013.08.051. - [60] F. O. Heinz and L. Smith, "Fast simulation of spin transfer torque devices in a general purpose TCAD device simulator," in *Proc. Int. Conf. Simul. Semicond. Processes Devices*, Sep. 2013, pp. 127–130, doi: 10.1109/ SISPAD.2013.6650591. - [61] G. D. Panagopoulos, C. 
Augustine, and K. Roy, "Physics-based SPICE-compatible compact model for simulating hybrid MTJ/CMOS circuits," *IEEE Trans. Electron Devices*, vol. 60, no. 9, pp. 2808–2814, Sep. 2013, doi: 10.1109/TED.2013.2275082. - [62] H. Lim et al., "A survey on the modeling of magnetic tunnel junctions for circuit simulation," Active Passive Electron. Components, vol. 2016, pp. 1–32, 2016, doi: 10.1155/2016/3858621. - [63] Nanoscale Integration and Modeling (NIMO) Group at ASU, "Predictive technology model," 2008, Retrieved in 2018. [Online]. Available: http://ptm.asu.edu/ - [64] W. Kang et al., "A low-cost built-in error correction circuit design for STT-MRAM reliability improvement," *Microelectron. Reliab.*, vol. 53, pp. 1224–1229, 2013, doi: 10.1016/j.microrel.2013.07.036. - [65] A. J. Van De Goor, "Using march tests to test SRAMs," *IEEE Design Test Comput.*, vol. 10, no. 1, pp. 8–14, Mar. 1993, doi: 10.1109/54.199799. - [66] S. Hamdioui, A. J. van de Goor, J. D. Reyes, and M. Rodgers, "Memory test experiment: Industrial results and data," *IEE Proc. - Comput. Digital Techn.*, vol. 153, no. 1, pp. 1–8, Jan. 2006, doi: 10.1049/ip-cdt:20050104.

**LIZHOU WU** received the BSc degree in electronic science and engineering from Nanjing University, China, in 2013, and the MSc degree in computer science and engineering from the National University of Defense Technology, China, in 2016. Currently, he is working toward the PhD degree in the Computer Engineering Laboratory, Delft University of Technology, the Netherlands. His research focuses on MTJ characterization and modeling, and STT-MRAM test and reliability. He is a student member of the IEEE.

**SIDDHARTH RAO** received the PhD degree in electrical engineering from the National University of Singapore (NUS), Singapore, for his work in the field of microwave-assisted magnetic recording. He is currently a senior researcher in the field of memory technologies at IMEC, in Leuven, Belgium.
His current research interests include non-volatile memory technology and design, exploratory spintronics, and magnetism design in real-world applications. He has authored and coauthored more than 30 papers in several prestigious and peer-reviewed journals and conferences. He is a reviewer for the American Physical Society (APS) and the *IEEE Electron Device Letters*.

**MOTTAQIALLAH TAOUIL** received the MSc and PhD degrees (both with honors) in computer engineering from the Delft University of Technology, the Netherlands. He is currently an assistant professor with the Computer Engineering Laboratory, the Delft University of Technology. His current research interests include hardware security, embedded systems, 3D stacked integrated circuits, VLSI design and test, built-in-self-test, design for testability, yield analysis, and memory test structures. He is a member of the IEEE.

**GUILHERME CARDOSO MEDEIROS** received the BSc and MSc degrees from the Pontifical Catholic University of Rio Grande do Sul (PUCRS), Porto Alegre, Brazil, in 2015 and 2017, respectively. He is currently working toward the PhD degree at the Delft University of Technology, the Netherlands. His main areas of interest are test strategies for SRAMs, defect and fault modelling for FinFET devices, and emerging memory technologies. He is a student member of the IEEE.

**MORITZ FIEBACK** received the BSc and MSc degrees from the Delft University of Technology, the Netherlands, in 2015 and 2017, respectively. He is currently working toward the PhD degree at the Delft University of Technology, the Netherlands. His research interests include device modeling, test, and reliability of emerging memories. He is a student member of the IEEE.

**ERIK JAN MARINISSEN** received the MSc degree in computing science, in 1990 and the PDEng degree in software technology, in 1992, both from the Eindhoven University of Technology.
He is a scientific director at IMEC in Leuven, Belgium, the world-leading independent R&D center in nanoelectronics technology. His research on IC test and design-for-test covers topics as diverse as 3D-stacked ICs, 3 nm CMOS, silicon photonics, and STT-MRAMs. Marinissen is also a visiting researcher at the Eindhoven University of Technology in the Netherlands, as well as a lecturer at the High Tech Institute in Eindhoven. In his thirty-year career in industrial research, Marinissen worked previously at NXP Semiconductors and Philips Research in Eindhoven, Nijmegen (the Netherlands), and Sunnyvale (California). He is a member of the IEEE's Test Technology Standardization Committee and served as editor-in-chief of IEEE Std 1500 and as founder/chair (currently vice-chair) of the IEEE Std P1838 Working Group on 3D-SIC test access. He served as general/program chair of several conferences (including DDECS'02, ETW'03, ETS'06, DATE'13) and founded and chaired three workshops (DSNOC, 3D Integration, 3D-TEST). He serves on numerous conference committees (including ATS, DFTS, ETS, ITC, LATS), and on the editorial boards of the IEEE Design & Test and Springer's Journal of Electronic Testing: Theory and Applications. He has authored a book, contributed chapters to six other books, and he is (co-)author of more than 270 journal and conference papers (h-index: 42). He is also (co-)inventor of 18 granted international patent families.
He is a recipient of the most significant paper awards at ITC 2008 and 2010, best paper awards at the Chrysler-Delco-Ford Automotive Electronics Reliability Workshop 1995 and the IEEE International Board Test Workshop 2002, the Most Inspirational Presentation Award at the IEEE Semiconductor Wafer Test Workshop 2013, the HiPEAC Technology Transfer Award 2015, the SEMI Best ATE Paper Award 2016, the National Instruments' Engineering Impact Award 2017, the IEEE Standards Association's Emerging Technology Award 2017, and best paper awards at IWLPC 2018 and LATS 2019. He is a fellow of the IEEE.

**GOURI SANKAR KAR** received the PhD degree in semiconductor device physics from the Indian Institute of Technology, Kharagpur, India, in 2002. From 2002 to 2005, he was a visiting scientist at the Max Planck Institute for Solid State Research, Stuttgart, Germany, where he worked with Nobel Laureate (1985, Quantum Hall Effect) Prof. Klaus von Klitzing on quantum dot FET. In 2006, he joined Infineon/Qimonda in Dresden, Germany, as a lead integration engineer. There he worked on the vertical transistor for DRAM application. In 2009, he joined IMEC in Leuven, Belgium, where he is currently a distinguished member of technical staff (DMTS). In this role, he defines the strategy and vision for RRAM, DRAM-MIMCAP, and STT-MRAM programs both for stand-alone and embedded applications.

**SAID HAMDIOUI** received the MSEE and PhD degrees (both with honors) from the Delft University of Technology, Delft, the Netherlands. He is currently chair professor on Dependable and Emerging Computer Technologies, head of the Quantum and Computer Engineering Department, and also serving as head of the Computer Engineering Laboratory (CE-Lab) of the Delft University of Technology, the Netherlands. He is also co-founder and CEO of Cognitive-IC, a start-up focusing on hardware dependability solutions.
Prior to joining TUDelft as a professor, he spent about seven years within industry including Intel Corporation (California), Philips Semiconductors R&D (Crolles, France) and Philips/NXP Semiconductors (Nijmegen, the Netherlands). His research focuses on two domains: dependable CMOS nano-computing (including testability, reliability, and hardware security), and emerging technologies and computing paradigms (including memristors for logic and storage, and in-memory-computing for big-data applications). He owns two patents, has published one book and contributed to two others, and coauthored more than 200 conference and journal papers. He has consulted for many companies (such as Intel, ST, Altera, Atmel, and Renesas) in the area of memory testing and has collaborated with many industry/research partners (such as Intel, IMEC, NXP, Intrinsic ID, DS2, ST Microelectronics, Cadence, and Politecnico di Torino) in the field of dependable nano-computing and emerging technologies. He delivered dozens of keynote speeches, distinguished lectures, and invited presentations and tutorials at major international forums/conferences/schools and at leading semiconductor companies. He is a senior member of the IEEE, associate editor of the IEEE Transactions on VLSI Systems (TVLSI), and he serves on the editorial board of the IEEE Design & Test, the Elsevier Microelectronic Reliability Journal, and of the Journal of Electronic Testing: Theory and Applications (JETTA). He is also a member of AENEAS/ENIAC Scientific Committee Council (AENEAS = Association for European NanoElectronics Activities).