Design of a Superconducting ALU With a 3-Input XOR Gate
Kazuhiro Takahashi, Shuichi Nagasawa, Haruhiro Hasegawa, Kazunori Miyahara, Hiroshi Takai, and Youichi Enomoto

Abstract—In order to develop superconducting Digital Signal Processors (DSP’s), we have been studying a superconducting 1-bit Arithmetic Logic Unit (ALU). This ALU provides only the simplest functions: AND, OR, ADD (addition), and SUB (subtraction). The ALU operates in a 3-stage pipeline. All logic functions, such as AND, OR, and SUM (summation), can be executed within a single stage of the pipeline. To achieve high-speed operation of the ALU, we proposed and designed a novel 3-input XOR gate, which operates in only one logic stage. Our simulation study showed that all components of the ALU can operate up to 50 GHz. These ALU components were fabricated and tested at low speed. Large bias margins of more than 37% were achieved. The designed ALU’s were laid out and fabricated with an Nb process. The ALU occupies an area of 1200 µm × 2600 µm and contains 560 Josephson junctions (JJ’s).

Index Terms—Arithmetic logic unit, full adder, Josephson junction, single flux quantum device.

I. INTRODUCTION
Ultra high-speed processors will be needed in various fields, such as new-generation computers and data communication systems, in the near future. Semiconductor processors have been developed with Very Large Scale Integrated circuit (VLSI) technology, in which a huge number of transistors can be fabricated on a single chip. However, they have a serious problem: their power consumption increases as their operation speed improves. This problem will be solved by developing superconducting processors, which combine high-speed operation with low power consumption. Superconducting Single Flux Quantum (SFQ) circuits have been developed by many researchers in superconducting electronics [1]. In SFQ circuits, an SFQ is used as an information bit. Most SFQ logic gates act as voltage-latching gates holding their logic states. Therefore, SFQ circuits are suitable for a deep pipeline architecture.
Manuscript received August 6, 2002. This work was supported by the New Energy and Industrial Technology Development Organization (NEDO) as Collaborative Research and Development of Fundamental Technologies for Superconductivity Applications.
K. Takahashi and K. Miyahara are with the Superconductivity Research Laboratory, International Superconductivity Technology Center, Tokyo 135-0062, Japan, and also with Tokyo Denki University, Tokyo 101-8457, Japan.
S. Nagasawa, H. Hasegawa, and Y. Enomoto are with the Superconductivity Research Laboratory, International Superconductivity Technology Center, Tokyo 135-0062, Japan.
H. Takai is with Tokyo Denki University, Tokyo 101-8457, Japan.
Digital Object Identifier 10.1109/TASC.2003.813944

Fig. 1.


The purpose of this research is to realize a superconducting Arithmetic Logic Unit (ALU) that can operate at high speed for application in Digital Signal Processors (DSP). We have investigated a full-adder (FA), which is the most important component circuit of an ALU [1], [2]. An FA with serial inputs has been widely used [3]–[8]. The operation time of these types of circuits becomes long because of their large setup time. To overcome this problem, we proposed and developed a so-called 3-input exclusive OR (XOR) gate. Our FA is composed of the 3-input XOR gate and a majority gate [9]. In this paper, we describe the design and experimental operation results of the component circuits of the ALU, such as 3-input XOR gates and majority gates. We also show the design of the whole 1-bit ALU and its preliminary operation results.

II. DESIGN OF ARITHMETIC LOGIC UNIT
Fig. 1 shows a block diagram of a 1-bit ALU. Our ALU has a very simple function set and is composed of AND, OR, NOT, and FA circuits. A multiplexer (MUX) selects the output of the ALU from the AND, OR, or FA output. The 3-input XOR gate is used in the FA to reduce the operation time of the FA, which minimizes the cycle time of the pipeline.

A. 3-Input XOR Gate
Fig. 2 shows the block diagram of the 3-input XOR gate and its equivalent circuit. It has three data inputs, one clock input, and one data output. The storage loop of the gate is composed of inductors L1A, L1B, L1C, and Josephson junctions J3 and

Fig. 3.

Fig. 2.

J4. This gate operates as follows: when an SFQ is injected from data-A or data-B, the SFQ is stored in the storage loop. In this case, the gate outputs the SFQ from the storage loop at the timing of a clock pulse applied to the clock input. When two or more SFQ’s are injected simultaneously, all SFQ’s are excluded from the loop through J3. We added a Josephson Transmission Line (JTL), acting as an input delay circuit, only on the data-C input to prevent three SFQ’s from being injected simultaneously. When three SFQ’s are injected from data-A, data-B, and data-C at the same time, the two SFQ’s from data-A and data-B are excluded through J3, and the SFQ from data-C is stored in the storage loop. This SFQ is produced from the storage loop as a data output at the clock pulse timing. This function is the 3-input XOR. The advantage of this gate is that it can be operated within one clock period. The SUM function in an FA is expressed as follows:

$$\mathrm{SUM} = A \oplus B \oplus C \tag{1}$$

where $A$, $B$, $C$, and $\oplus$ denote data-A, data-B, the Carry from the lower bits, and XOR, respectively. The 3-input XOR gate can therefore be used for the SUM operation in the FA.

B. Full-Adder
Fig. 3 shows a block diagram of the FA composed of the 3-input XOR gate and a majority gate. In both circuits, the timing of the input signals must be adjusted. D-FF gates are added on the inputs of the two gates to adjust the data input timing. A 1-to-3 splitter supplies the clock pulse to the three D-FF gates at the same time. Fig. 4 shows the block diagram of the majority gate and its equivalent circuit. The gate has three data inputs and one data output. When two or more SFQ’s are injected at the same time, an SFQ is produced from the junction J3. When one SFQ is injected, no SFQ is produced from J3, because the

Fig. 4.

SFQ is excluded through the escape junction (J2A, J2B, or J2C). The majority gate is used for the Carry operation in the FA, which can be expressed as:

$$\mathrm{Carry} = (A \land B) \lor (B \land C) \lor (C \land A) \tag{2}$$

where the symbols $\lor$ and $\land$ denote OR and AND, respectively. The majority gate can operate within one clock period.

C. Arithmetic Logic Unit
Fig. 5 shows the block diagram of a 1-bit ALU. This ALU operates in a three-stage pipeline. Each pipeline stage operates with one clock cycle. In the first stage, complementary signals



Fig. 6.



of inputs are generated. In the second stage, the D-FF gates, the AND gate, the OR gate, and the majority gate perform their logic functions. For the 3-input XOR gate, only the logic operation is performed in this stage; its output is generated in the next stage. In the third stage, the 3-input XOR gate outputs the SUM of the FA, and the MUX selects one output signal from the three outputs of the AND gate, the OR gate, and the 3-input XOR gate. The MUX is composed of AND gates and D-FF gates.

Table I shows various operation times of the component circuits of the ALU. These values were obtained from a simulation in which the parameters were optimized for 10 GHz clock operation. An optimization tool in WinS was used in this simulation. Setup time, delay time, and hold time are defined as shown in Fig. 6. The setup time and the delay time indicate time intervals between the arrival of the clock pulse and that of the data pulse at the gate. The hold time indicates the minimum time interval required between the data pulse and the clock pulse for proper operation. The maximum operation frequency of a gate can be calculated from the sum of its setup time and delay time. The maximum operation frequency of the gates limits the maximum clock frequency of the ALU. Our simulation results showed that all components of the ALU can operate up to 50 GHz.

The circuit pattern of the designed ALU was laid out using the design rules of NEC’s standard Nb process [10], in which the critical current density of the Josephson junctions is 2500 A/cm². The designed ALU occupies an area of 1200 µm × 2600 µm and contains 560 JJ’s.
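The timing relation described above can be written out explicitly. The following is a minimal sketch, assuming that the minimum clock period of a gate equals the sum of its setup time and delay time; the text does not state the formula, and the symbols $t_{\mathrm{setup}}$, $t_{\mathrm{delay}}$, $T_{\min}$, and $f_{\max}$ are introduced here only for illustration.

$$T_{\min} = t_{\mathrm{setup}} + t_{\mathrm{delay}}, \qquad f_{\max} = \frac{1}{T_{\min}}$$

Under this assumption, operation at 50 GHz corresponds to $T_{\min} = 1/(50\ \mathrm{GHz}) = 20\ \mathrm{ps}$, while the 10 GHz clock used for the parameter optimization corresponds to a period of 100 ps.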

Fig. 7.
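The SUM and Carry functions of Section II can be checked against binary addition with a short behavioral model. The Python sketch below is purely logical: it ignores SFQ pulse timing, clocking, and the pipeline stages. The function names (`xor3`, `majority`, `full_adder`, `alu_1bit`, `truth_table`) and the operation labels are hypothetical, chosen for illustration, and the model assumes that the AND and OR gates act on the two data inputs A and B.

```python
from itertools import product


def xor3(a: int, b: int, c: int) -> int:
    """3-input XOR (SUM of the FA): 1 when an odd number of inputs is 1."""
    return a ^ b ^ c


def majority(a: int, b: int, c: int) -> int:
    """Majority function (Carry of the FA): 1 when two or more inputs are 1."""
    return (a & b) | (b & c) | (c & a)


def full_adder(a: int, b: int, c: int) -> tuple:
    """Full adder built from the two gates; returns (SUM, Carry)."""
    return xor3(a, b, c), majority(a, b, c)


def alu_1bit(op: str, a: int, b: int, c: int = 0) -> tuple:
    """Hypothetical 1-bit ALU: a MUX picks the AND, OR, or SUM output.

    Returns (selected output, Carry). The labels "AND", "OR", and "ADD"
    are illustrative names, not taken from the paper.
    """
    s, carry = full_adder(a, b, c)
    outputs = {"AND": a & b, "OR": a | b, "ADD": s}
    return outputs[op], carry


def truth_table() -> list:
    """All eight input patterns with the resulting SUM and Carry."""
    return [(a, b, c, *full_adder(a, b, c))
            for a, b, c in product((0, 1), repeat=3)]


for a, b, c, s, carry in truth_table():
    # SUM and Carry together must encode the arithmetic sum of the inputs.
    assert 2 * carry + s == a + b + c
    print(a, b, c, "->", "SUM", s, "Carry", carry)
```

The eight rows printed by the loop are the SUM and Carry patterns that a functional test of the full adder is expected to reproduce.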

III. EXPERIMENTAL RESULTS OF COMPONENT CIRCUITS
Table II shows the measured bias margins of the component circuits of the ALU obtained in the low-speed measurement. In our measurement, the output signals of the circuits were monitored using the SFQ/latch converter [11]. Fig. 7 shows the input and output waveforms of the full-adder obtained in the low-speed functional test. Correct output patterns of Carry and SUM were observed in the figure. As Table II shows, all cells had large bias margins of more than 37%. We designed the ALU by using these component gates, and experimentally confirmed that the AND, the OR, and the Carry operations were performed properly. The measurement to confirm the full operation of the ALU is ongoing.

IV. SUMMARY
We designed a new type of superconducting 1-bit ALU, which operates in a 3-stage pipeline, using a 3-input XOR gate. A simulation study showed that all components of the ALU can operate at 50 GHz. The designed ALU occupies an area of 1200 µm × 2600 µm and contains 560 JJ’s. The components



of the ALU were fabricated and tested at low speed. Large bias margins of more than 37% were obtained.

ACKNOWLEDGMENT
The chips were fabricated using NEC’s standard Nb process.

