Torrent Architecture Manual

Krste Asanović
David Johnson

TR-96-056

December 1996

Abstract

This manual contains the specification of the Torrent Instruction Set Architecture (ISA). Torrent is a vector ISA designed for digital signal processing applications. Torrent is based on the 32-bit MIPS-II ISA, and this manual is intended to be read as a supplement to the book “MIPS RISC Architecture” by Kane and Heinrich. Torrent is the ISA of the T0 vector microprocessor which is described in the separate “T0 Engineering Data” technical report.

This work was supported by ONR URI Grant N00014-92-J-1617, ARPA contract number N0001493-C0249, NSF Grant No. MIP-9311980, and NSF PYI Award No. MIP-8958568NSF. Additional support was provided by ICSI.
1 Introduction

Torrent is a vector processor instruction set architecture (ISA) designed for digital signal processing applications. The Torrent ISA allows for a range of implementations performing differing numbers of operations per clock cycle, depending on available technology.

Torrent is based on the 32-bit MIPS-II ISA. The scalar unit is MIPS-compliant, with coprocessor instruction extensions for the vector unit. This manual is intended to be read as a supplement to the book “MIPS RISC Architecture” by Kane and Heinrich.

To date, the only implementation of Torrent is T0. T0 is described in detail in the “T0 Engineering Data” document.
2 CPU

The Torrent CPU runs the MIPS-II standard ISA. Future Torrent implementations may adopt further extensions to the MIPS ISA standard.

The MIPS-II SYNC instruction forces all previous memory operations to complete before allowing further memory operations. SYNC instructions may be required on some Torrent implementations to synchronize memory references between the vector and scalar processors, or between separate vector memory pipelines.
3 System Control Coprocessor (CP0)

The system control coprocessor is the standard location for registers dealing with memory management and exception handling. The contents of CP0 are implementation dependent.
4 Floating Point Coprocessor (CP1)

Torrent employs the standard MIPS floating point coprocessor instruction set. If an implementation does not provide hardware floating point, these instructions cause a trap to software emulation.
5 Vector Unit (CP2)

The Torrent vector unit (VU) is implemented as coprocessor 2 for the MIPS CPU.

Vector registers

Figure 1 shows the registers contained within the vector coprocessor. The VU ISA allows up to 32 vector registers, $v0$–$v31$.\(^1\)

The first implementation, T0, provides only 16 vector registers, $v0$–$v15$. All elements of vector register $v0$ are defined to hold the constant zero. Writes to $v0$ are ignored, and reads of $v0$ return 0.

Each vector element is 32b wide. The first implementation, T0, provides 32 elements per vector. Future implementations may provide longer vectors.

Control registers

Six VU control registers, listed in Table 1, are defined in the coprocessor control register space. These are accessed using the standard ctc2/cfc2 instructions.

An implementation will typically include additional VU control registers to handle exceptions. These are implementation dependent and will normally only be accessible from kernel mode.

VU Instructions

The VU extends the MIPS-II instruction set by adding coprocessor instructions that perform vector operations. The VU is a load/store vector-register architecture. Vector instructions are divided into 2 major groups: vector load/store operations, and vector arithmetic operations. Vector load/store operations move vectors between vector registers and memory, while vector arithmetic instructions operate on vectors in registers.

Each vector register holds a vector of 32b values. A single vector instruction specifies a sequence of operations, and may run for many cycles. The maximum length of a vector is limited by the implementation, but shorter vectors can be specified using the vector length register.

Scalar operands for vector-scalar operations are obtained from the CPU general purpose registers.

There are three varieties of vector load/stores: unit stride, arbitrary stride, and indexed. Unit stride load/stores can specify a post-increment for the scalar address register. Arbitrary stride load/stores transfer elements stored at addresses that form an arithmetic progression. Indexed vector load/stores (gather/scatter) allow indirect memory accesses to be vectorized.

Vector memory operations can transfer bytes, halfwords, and words. Bytes and halfwords are optionally sign-extended to 32b when loaded, and the least significant byte or halfword of a vector element is used for a byte or halfword store.

Vector 32b integer and 32b logical operations are defined. In addition, fixed point instructions are defined to support scaled fixed-point arithmetic. These instructions allow the multiple steps required for a scaled, rounded, fixed-point addition or multiplication to be performed within a single vector instruction.

Conditional vector operations are supported with vector conditional move instructions. A set of vector flag instructions allow a vector condition to be represented as a bit vector that can be read into a scalar register for further processing.

\(^1\)The Torrent Architecture extends the MIPS register naming scheme. The vector registers are defined by the assembler as $v0$–$v31$, but are usually referred to by the aliases $v0$–$v31$ in user code.
**Figure 1:** Vector unit registers: vector registers vv0–vv31; vector control registers vcr0 (implementation/revision), vcr2 (vector length, vlr), vcr4 (VU condition), vcr8 (VU overflow), and vcr12 (VU saturate).
5.1 Vector Unit Control Registers

The vector unit control registers are listed in Table 1.

<table>
<thead>
<tr>
<th>Number</th>
<th>Register</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>vcr0</td>
<td>vrev</td>
<td>Implementation/revision</td>
</tr>
<tr>
<td>vcr1</td>
<td>vcount</td>
<td>Free running counter</td>
</tr>
<tr>
<td>vcr2</td>
<td>vlr</td>
<td>Vector length</td>
</tr>
<tr>
<td>vcr4</td>
<td>vcond</td>
<td>Vector condition flags</td>
</tr>
<tr>
<td>vcr8</td>
<td>vovf</td>
<td>Vector overflow flags</td>
</tr>
<tr>
<td>vcr12</td>
<td>vsat</td>
<td>Vector saturation flags</td>
</tr>
</tbody>
</table>

Table 1: Vector unit control registers.

5.1.1 VU Implementation and Revision Number (VCR0)

![Figure 2: VU Implementation and Revision Register Format](image)

The vrev register is a 32b read-only register that contains the implementation and revision number of the VU. These values can be used by configuration and diagnostic software.

The vrev register format is shown in Figure 2. Bits 15–8 define the implementation number, and bits 7–0 define the revision number. The implementation number can be used by user software to detect changes in instruction set or performance. The revision number identifies mask revisions of a given implementation.

Implementation field values are given in Table 2.

<table>
<thead>
<tr>
<th>Imp. Number</th>
<th>Vector Unit</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>T0</td>
</tr>
<tr>
<td>1–255</td>
<td>reserved</td>
</tr>
</tbody>
</table>

Table 2: VU Implementation types.
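As an illustration, software could split a vrev value (read with cfvu) into the two fields described above. The helper names below are hypothetical, and bits 31–16, which this section does not describe, are simply ignored.

```c
#include <stdint.h>

/* Hypothetical helpers: extract the documented fields of a vrev value. */
static inline unsigned vrev_implementation(uint32_t vrev)
{
    return (vrev >> 8) & 0xffu;   /* bits 15-8: implementation number (0 = T0) */
}

static inline unsigned vrev_revision(uint32_t vrev)
{
    return vrev & 0xffu;          /* bits 7-0: mask revision */
}
```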
5.1.2 Vector Count Register (VCR1)

The vector count register, `vcount`, is a 32b read-only register formatted as shown in Figure 3.

The value in the `vcount` register is guaranteed to increase linearly with time, although the rate of increase is unspecified. When it reaches a maximum value of 0xffffffff, the count register will reset to 0 and continue incrementing. Its purpose is to allow relative comparison of small periods of elapsed time for performance analysis.

Figure 3: Vector Count Register Format
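Because vcount wraps from 0xffffffff to 0, elapsed time is best computed with unsigned 32-bit subtraction, which is taken modulo 2^32. A minimal sketch with a hypothetical helper name; it assumes fewer than 2^32 ticks separate the two samples, and the result is in the counter's unspecified units.

```c
#include <stdint.h>

/* Ticks elapsed between two vcount samples. Unsigned subtraction wraps
 * modulo 2^32, so a single rollover between the samples is handled. */
static inline uint32_t vcount_elapsed(uint32_t start, uint32_t end)
{
    return end - start;
}
```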

5.1.3 Vector Length Register (VCR2)

The length of a vector operation is specified in an 8b vector length register, `vlr`. If a vector instruction is issued when the value in `vlr` is 0, no operation is performed. If a vector instruction is issued when the value in `vlr` is greater than the implementation’s maximum vector length, a vector operation exception is raised. Implementations provide at least 32 elements per vector.

Reads or writes of the vector length register do not affect vector instructions in progress.
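Vectors longer than the maximum vector length are usually handled by strip-mining: software processes the data in chunks, writing min(remaining, maximum length) to vlr before each chunk. The C sketch below models this with plain loops standing in for vector instructions; MAXVL = 32 matches T0, and all function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAXVL 32  /* T0 maximum vector length; other implementations may differ */

/* Stand-in for one vector add executed with vlr = vl. */
static void vadd_model(int32_t *d, const int32_t *a, const int32_t *b, size_t vl)
{
    for (size_t i = 0; i < vl; i++)
        d[i] = (int32_t)((uint32_t)a[i] + (uint32_t)b[i]);  /* wrapping add */
}

/* Strip-mined add of n elements: vl (the value software would write to
 * vlr) never exceeds MAXVL, so no vector operation exception is raised. */
static void add_arrays(int32_t *d, const int32_t *a, const int32_t *b, size_t n)
{
    while (n > 0) {
        size_t vl = n < MAXVL ? n : MAXVL;
        vadd_model(d, a, b, vl);
        d += vl;
        a += vl;
        b += vl;
        n -= vl;
    }
}
```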
5.1.4 VU Condition Register (VCR4)

\[
\begin{array}{|c|c|c|c|}
\hline
31 & 30 \;\ldots\; 2 & 1 & 0 \\
\hline
\text{vcond31} & \ldots & \text{vcond1} & \text{vcond0} \\
\hline
1 & \ldots & 1 & 1 \\
\hline
\end{array}
\]

Figure 5: Vector Condition Register Format

The VU condition register, \texttt{vcond}, is a 32-bit read/write register formatted as shown in Figure 5.

The \texttt{vcond} register holds the VU condition flags, which reflect the result of the last conditional flag instruction. There is one flag bit corresponding to each vector element, with the least significant bit representing the condition for vector element zero. The register is only altered by conditional flag instructions (set less than, set less than unsigned, set equal) and writes to \texttt{vcond}.
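The vcond layout can be modeled in C: the comparison result for element i lands in bit i. This sketch models a set-less-than over vl elements (vl at most 32); the helper name is hypothetical, and leaving the bits above vl clear is an assumption of the model, not documented behavior.

```c
#include <stdint.h>

/* Bit i of the result is 1 when a[i] < b[i]; element 0 is the LSB. */
static uint32_t slt_flags_model(const int32_t *a, const int32_t *b, unsigned vl)
{
    uint32_t flags = 0;
    for (unsigned i = 0; i < vl; i++)
        if (a[i] < b[i])
            flags |= UINT32_C(1) << i;
    return flags;
}
```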

Reads of \texttt{vcond} are interlocked and are guaranteed to return the most recent value. Writes to \texttt{vcond} are not affected by previously issued vector instructions which may still be executing.

Future implementations with more than 32 elements per vector register may provide additional control registers to hold the extra conditional bits. Future implementations may also add further \texttt{vcond} registers to permit better scheduling of parallel conditional operations.

5.1.5 VU Overflow Register (VCR8)

\[
\begin{array}{|c|c|c|c|}
\hline
31 & 30 \;\ldots\; 2 & 1 & 0 \\
\hline
\text{ovf31} & \ldots & \text{ovf1} & \text{ovf0} \\
\hline
1 & \ldots & 1 & 1 \\
\hline
\end{array}
\]

Figure 6: Vector Overflow Register Format

The VU overflow register, \texttt{ovf}, is a 32-bit read/write register formatted as shown in Figure 6.

The \texttt{ovf} register holds the VU overflow flags. The overflow flags are sticky bits which are set when any vector arithmetic operation causes an overflow. There is one flag bit corresponding to each vector element, with the least significant bit representing the overflow status for vector element zero. The register is only altered by overflowing arithmetic operations (add, sub) and writes to \texttt{ovf}.
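A sketch of the sticky overflow behavior for a vector add, assuming the destination receives the wrapped 32-bit sum (this section does not state the result value on overflow). Flags already set are never cleared by the model; only a write to the register would clear them. The function name is hypothetical.

```c
#include <stdint.h>

/* Wrapping vector add that ORs a per-element overflow bit into *ovf. */
static void vadd_ovf_model(int32_t *d, const int32_t *a, const int32_t *b,
                           unsigned vl, uint32_t *ovf)
{
    for (unsigned i = 0; i < vl; i++) {
        uint32_t ua = (uint32_t)a[i], ub = (uint32_t)b[i];
        uint32_t s = ua + ub;
        /* Signed overflow: both operands' signs differ from the result's. */
        if (((ua ^ s) & (ub ^ s)) >> 31)
            *ovf |= UINT32_C(1) << i;   /* sticky: never cleared here */
        d[i] = (int32_t)s;
    }
}
```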

Reads of \texttt{ovf} are interlocked and are guaranteed to return the most recent value. Writes to \texttt{ovf} are not affected by previously issued vector instructions which may still be executing.

Implementations that have more than 32 elements per vector register have additional overflow registers to hold the extra overflow bits.
5.1.6 VU Saturate Register (VCR12)

<table>
<thead>
<tr>
<th>31</th>
<th>30 ... 2</th>
<th>1</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>vsat31</td>
<td>...</td>
<td>vsat1</td>
<td>vsat0</td>
</tr>
<tr>
<td>1</td>
<td>...</td>
<td>1</td>
<td>1</td>
</tr>
</tbody>
</table>

Figure 7: Vector Saturate Register Format

The VU saturate register, `vsat`, is a 32-bit read/write register formatted as shown in Figure 7.

The `vsat` register holds the VU saturate flags. The saturate flags are sticky bits which are set when any vector arithmetic operation causes a saturated result value. There is one flag bit corresponding to each vector element, with the least significant bit representing the saturation status for vector element zero. The register is only altered by vector fixed-point operations and writes to `vsat`.
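The saturating counterpart can be sketched the same way. This model clamps a 32-bit signed sum to the representable range and records saturation in the element's sticky bit; Torrent fixed-point operations also scale and round, which this simplified, hypothetical helper omits.

```c
#include <stdint.h>

/* Saturating add for element `elem`; sets that element's sticky bit in
 * *vsat when the exact sum does not fit in 32 bits. */
static int32_t sat_add32(int32_t a, int32_t b, unsigned elem, uint32_t *vsat)
{
    int64_t s = (int64_t)a + (int64_t)b;   /* exact sum, cannot overflow */
    if (s > INT32_MAX) {
        *vsat |= UINT32_C(1) << elem;
        return INT32_MAX;
    }
    if (s < INT32_MIN) {
        *vsat |= UINT32_C(1) << elem;
        return INT32_MIN;
    }
    return (int32_t)s;
}
```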

Reads of `vsat` are interlocked and are guaranteed to return the most recent value. Writes to `vsat` are not affected by previously issued vector instructions which may still be executing.

Implementations that have more than 32 elements per vector register have additional saturation registers to hold the extra saturation bits.
5.2 Instruction Overview

VU Instruction Classes

Instructions affecting the vector unit are divided into several classes:

- **Control Register** instructions that read and write VU coprocessor control registers.
- **Move** instructions that move data between vector registers, and between the CPU general registers and the vector registers.
- **Load/Store** instructions that move vectors of data between vector registers and memory.
- **Integer Arithmetic** instructions that provide integer arithmetic, shift, logical, compare and conditional operations on vector register contents.
- **Fixed-point** instructions that provide scaled and rounded fixed point arithmetic operations on vector register contents.

VU Instruction Formats

The VU control register read/write instructions use the standard MIPS coprocessor instruction encodings.

The move, load/store, integer, and fixed-point arithmetic instructions are encoded using the standard MIPS coprocessor operate instruction. These all use the base instruction format shown in Figure 8.

The **format** field is encoded as shown in Table 3. The **format** field encodes the class of instruction and also the operand sources, either vector-vector or vector-scalar.

The **opers** field defines the order of operands for non-commutative operations. If **opers** is one, the operands are vector/vector or vector/scalar. If **opers** is zero, the

<table>
<thead>
<tr>
<th>Format</th>
<th>Operation type</th>
<th>Operands</th>
</tr>
</thead>
<tbody>
<tr>
<td>10000</td>
<td>Insert/extract</td>
<td>Vector-Vector</td>
</tr>
<tr>
<td>10001</td>
<td>Insert/extract</td>
<td>Vector-Scalar</td>
</tr>
<tr>
<td>10010</td>
<td>Memory</td>
<td>Vector-Vector</td>
</tr>
<tr>
<td>10011</td>
<td>Memory</td>
<td>Vector-Scalar</td>
</tr>
<tr>
<td>10100</td>
<td>Integer/logical</td>
<td>Vector-Vector</td>
</tr>
<tr>
<td>10101</td>
<td>Integer/logical</td>
<td>Vector-Scalar</td>
</tr>
<tr>
<td>10110</td>
<td>reserved</td>
<td></td>
</tr>
<tr>
<td>10111</td>
<td>reserved</td>
<td></td>
</tr>
<tr>
<td>11000</td>
<td>Add fixed</td>
<td>Vector-Vector</td>
</tr>
<tr>
<td>11001</td>
<td>Add fixed</td>
<td>Vector-Scalar</td>
</tr>
<tr>
<td>11010</td>
<td>Sub fixed</td>
<td>Vector-Vector</td>
</tr>
<tr>
<td>11011</td>
<td>Sub fixed</td>
<td>Vector-Scalar</td>
</tr>
<tr>
<td>11100</td>
<td>Multiply fixed</td>
<td>Vector-Vector</td>
</tr>
<tr>
<td>11101</td>
<td>Multiply fixed</td>
<td>Vector-Scalar</td>
</tr>
<tr>
<td>11110</td>
<td>reserved</td>
<td></td>
</tr>
<tr>
<td>11111</td>
<td>reserved</td>
<td></td>
</tr>
</tbody>
</table>

Figure 8: Vector unit base instruction format.
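The per-instruction diagrams in the following sections suggest the field layout of the base format: COP2 opcode in bits 31–26, format in 25–21, rt in 20–16, rd in 15–11, a one-bit operand-order field in bit 10, the vw opcode in 9–5, and vd in 4–0. Assuming that layout, an assembler might pack an instruction word as below; `vu_encode` is a hypothetical name.

```c
#include <stdint.h>

#define COP2_OPCODE 0x12u  /* 010010 */

/* Packs the base-format fields (layout assumed from the diagrams). */
static uint32_t vu_encode(unsigned format, unsigned rt, unsigned rd,
                          unsigned opers, unsigned vw, unsigned vd)
{
    return (COP2_OPCODE << 26)
         | ((format & 0x1fu) << 21)
         | ((rt     & 0x1fu) << 16)
         | ((rd     & 0x1fu) << 11)
         | ((opers  & 0x1u)  << 10)
         | ((vw     & 0x1fu) << 5)
         |  (vd     & 0x1fu);
}
```

For example, with the Table 4 values for vins.s (format 10001, vw 10011, opers 0) and rt = 8, rd = 9, vd = 3, this sketch produces 0x4A284A63.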
5.3 VU Control Register Instructions

The vector control register instructions move values between the scalar CPU registers and the vector control registers. These operations use the standard MIPS coprocessor control register operations.

These operations are unpredictable if the coprocessor control register field is not one of the valid coprocessor control register numbers as listed in Table 1.
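Since out-of-range control register numbers give unpredictable results, software that computes cs at run time might guard it against the numbers in Table 1. A minimal, hypothetical check; it treats only the architected registers as valid and deliberately excludes the implementation-dependent kernel registers.

```c
#include <stdbool.h>

/* True for the control register numbers listed in Table 1. */
static bool vcr_is_valid(unsigned cs)
{
    switch (cs) {
    case 0: case 1: case 2: case 4: case 8: case 12:
        return true;
    default:
        return false;
    }
}
```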

CFVU Move Control Word From VU

| COP2 | CF | rt | cs | 0 |
|------|----|----|----|---|
| 010010 | 00010 | | | 00000000000 |
| 6 | 5 | 5 | 5 | 11 |

Format:

CFVU rt, cs

Description:

The contents of vector unit control register cs are copied into scalar register rt.

This operation is only defined when cs is a valid coprocessor control register.

Operation:

r[rt] = vcr[cs];

Exceptions:

Coprocessor unusable exception.
CTVU Move Control Word To VU

<table>
<thead>
<tr>
<th>COP2</th>
<th>CT</th>
<th>rt</th>
<th>cs</th>
<th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>00110</td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>11</td>
</tr>
</tbody>
</table>

**Format:**
CTVU rt, cs

**Description:**
The contents of scalar register rt are copied into vector unit control register cs.

This operation is only defined when cs is a valid coprocessor control register.

**Operation:**
vcr[cs] = r[rt];

**Exceptions:**
Coprocessor unusable exception.
5.4 Vector Insert/Extract Instructions

The insert and extract instructions are used to form vectors from scalars, or to break vectors down into scalars or smaller vectors.

The extract vector instruction transfers elements from the end of one vector register to the start of another. A scalar register gives a start index into the source register. This instruction can be used to speed reduction operations.
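To show how vext.v speeds reductions, the following C model sums a vector by repeatedly moving the upper half of the live elements to the start of a temporary register and adding. It assumes vext.v copies vlr elements starting at the scalar start index; the helper names are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32

/* Model of vext.v: dst[i] = src[start + i] for the first vl elements. */
static void vext_v_model(int32_t *dst, const int32_t *src, unsigned start, unsigned vl)
{
    for (unsigned i = 0; i < vl; i++)
        dst[i] = src[start + i];
}

/* Tree reduction; n must be a power of two no larger than MAXVL.
 * The contents of v are overwritten. */
static int32_t vsum_model(int32_t v[MAXVL], unsigned n)
{
    int32_t tmp[MAXVL];
    while (n > 1) {
        unsigned half = n / 2;
        vext_v_model(tmp, v, half, half);    /* vlr = half, start = half */
        for (unsigned i = 0; i < half; i++)  /* vector add with vlr = half */
            v[i] += tmp[i];
        n = half;
    }
    return v[0];
}
```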

The scalar insert/extract instructions transfer an element between the scalar register file and a vector register. A scalar register gives the index.

The insert/extract instruction encoding is shown in Table 4.

<table>
<thead>
<tr>
<th>Format</th>
<th>vw</th>
<th>Opers</th>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>10000</td>
<td>01011</td>
<td>0</td>
<td>vext.v</td>
<td>Extract from vector into vector.</td>
</tr>
<tr>
<td>10001</td>
<td>10011</td>
<td>0</td>
<td>vins.s</td>
<td>Insert into vector from scalar.</td>
</tr>
<tr>
<td>10001</td>
<td>11011</td>
<td>0</td>
<td>vext.s</td>
<td>Extract from vector into scalar.</td>
</tr>
</tbody>
</table>

Table 4: Insert/extract instruction encoding.
**VINS.S**  Insert Into Vector From Scalar

<table>
<thead>
<tr>
<th>31–26</th>
<th>25–21</th>
<th>20–16</th>
<th>15–11</th>
<th>10</th>
<th>9–5</th>
<th>4–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP2</td>
<td>IXS</td>
<td>rt</td>
<td>rd</td>
<td>VS</td>
<td>INSS</td>
<td>vd</td>
</tr>
<tr>
<td>010010</td>
<td>10001</td>
<td></td>
<td></td>
<td></td>
<td>10011</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

vins.s rt, vd, rd

**Description:**

The value of scalar register rt is copied to element rd of vector register vd.

If the lower eight bits of scalar register rd are greater than or equal to the maximum vector length, a vector operation exception is raised.

**Operation:**

\[
v[vd][r[rd]] = r[rt];
\]

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

---

**VEXT.S**  Extract From Vector To Scalar

<table>
<thead>
<tr>
<th>31–26</th>
<th>25–21</th>
<th>20–16</th>
<th>15–11</th>
<th>10</th>
<th>9–5</th>
<th>4–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP2</td>
<td>IXS</td>
<td>rt</td>
<td>rd</td>
<td>VS</td>
<td>EXTS</td>
<td>vd</td>
</tr>
<tr>
<td>010010</td>
<td>10001</td>
<td></td>
<td></td>
<td></td>
<td>11011</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

vext.s rt, vd, rd

**Description:**

The value of element rd of vector register vd is copied to scalar register rt.

If the lower 8 bits of scalar register rd are greater than or equal to the maximum vector length, a vector operation exception is raised.

**Operation:**

\[
r[rt] = v[vd][r[rd]];\]

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
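A behavioral sketch of vins.s and vext.s, including the index check. It assumes that only the lower eight bits of rd form the index (the exception text compares only those bits) and uses a maximum vector length of 32 as on T0; the names and return codes are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32

enum { VU_OK = 0, VU_VECTOR_OPERATION_EXCEPTION = 1 };

/* vins.s rt, vd, rd: v[index] = value. */
static int vins_s_model(int32_t v[MAXVL], uint32_t index_reg, int32_t value)
{
    unsigned idx = index_reg & 0xffu;   /* lower eight bits of r[rd] */
    if (idx >= MAXVL)
        return VU_VECTOR_OPERATION_EXCEPTION;
    v[idx] = value;
    return VU_OK;
}

/* vext.s rt, vd, rd: *value = v[index]. */
static int vext_s_model(const int32_t v[MAXVL], uint32_t index_reg, int32_t *value)
{
    unsigned idx = index_reg & 0xffu;
    if (idx >= MAXVL)
        return VU_VECTOR_OPERATION_EXCEPTION;
    *value = v[idx];
    return VU_OK;
}
```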
5.5 Vector Load/Store Instructions

Vector loads and stores transfer bytes, halfwords, and words between vector register elements and memory. Bytes and halfwords are sign-extended when loaded into vector elements, except by the unsigned load variants, which zero-extend them. Stores always transfer the least significant bits of an element to memory. Base addresses for the memory transfers are taken from the scalar registers.

There are three classes of vector loads and stores: unit-stride, arbitrary stride, and vector indexed. The unit-stride operations transfer vectors whose elements are held in contiguous locations in memory. The unit-stride operations allow a post-increment of the base address register. The arbitrary stride operations transfer vectors to or from memory at addresses that form an arithmetic progression. The vector indexed operations transfer vectors whose elements are located at offsets from a base address, with the offsets specified by the contents of an index vector.
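The three classes differ only in how the byte address of element i is formed. This sketch computes each one with hypothetical helper names, assuming 32-bit addresses that wrap modulo 2^32, so a negative stride or index works through two's-complement arithmetic.

```c
#include <stdint.h>

/* Unit stride: contiguous elements of elsize bytes. */
static uint32_t ea_unit_stride(uint32_t base, unsigned elsize, unsigned i)
{
    return base + (uint32_t)i * elsize;
}

/* Arbitrary stride: arithmetic progression with a byte stride. */
static uint32_t ea_strided(uint32_t base, int32_t stride, unsigned i)
{
    return base + (uint32_t)stride * i;
}

/* Indexed: byte offset taken from element i of an index vector. */
static uint32_t ea_indexed(uint32_t base, const int32_t *index, unsigned i)
{
    return base + (uint32_t)index[i];
}
```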

Table 5 shows the encoding for vector load/store operations.
<table>
<thead>
<tr>
<th>Format</th>
<th>\textit{vw}</th>
<th>Opers</th>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>10010</td>
<td>0xxxx</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Load signed byte vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10000</td>
<td>0</td>
<td>lbx.v</td>
<td>Load signed byte vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10001</td>
<td>0</td>
<td>lhx.v</td>
<td>Load signed halfword vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10010</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Load word vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10011</td>
<td>0</td>
<td>lwx.v</td>
<td>Load word vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10100</td>
<td>0</td>
<td>lbux.v</td>
<td>Load unsigned byte vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10101</td>
<td>0</td>
<td>lhux.v</td>
<td>Load unsigned halfword vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10110</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store byte vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10110</td>
<td>0</td>
<td>sbx.v</td>
<td>Store byte vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>10111</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>11000</td>
<td>0</td>
<td>swx.v</td>
<td>Store word vector indexed.</td>
</tr>
<tr>
<td>10010</td>
<td>11001</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Load unit-stride signed byte vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>00000</td>
<td>0</td>
<td>lbai.v</td>
<td>Load unit-stride signed byte vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>00001</td>
<td>0</td>
<td>lhai.v</td>
<td>Load unit-stride signed halfword vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>00010</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Load unit-stride word vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>00100</td>
<td>0</td>
<td>lwai.v</td>
<td>Load unit-stride unsigned byte vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>00101</td>
<td>0</td>
<td>luai.v</td>
<td>Load unit-stride unsigned halfword vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>00110</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store unit-stride byte vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>01000</td>
<td>0</td>
<td>sbai.v</td>
<td>Store unit-stride halfword vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>01001</td>
<td>0</td>
<td>shai.v</td>
<td>Store unit-stride halfword vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>01010</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store unit-stride word vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>01011</td>
<td>0</td>
<td>swai.v</td>
<td>Store unit-stride word vector with auto-increment.</td>
</tr>
<tr>
<td>10011</td>
<td>01100</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Load signed byte vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>01101</td>
<td>0</td>
<td>lhst.v</td>
<td>Load signed halfword vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>01110</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Load word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>01111</td>
<td>0</td>
<td>lwst.v</td>
<td>Load word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>10000</td>
<td>0</td>
<td>lbust.v</td>
<td>Load unsigned byte vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>10001</td>
<td>0</td>
<td>lhust.v</td>
<td>Load unsigned halfword vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>10100</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store byte vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>10101</td>
<td>0</td>
<td>shst.v</td>
<td>Store halfword vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>11000</td>
<td>0</td>
<td>sbst.v</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>11001</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>11100</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>11101</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>11110</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>11111</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>10011</td>
<td>111xx</td>
<td>0</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
<tr>
<td>1001x</td>
<td>xxxxx</td>
<td>1</td>
<td>\textit{reserved}</td>
<td>Store word vector with stride.</td>
</tr>
</tbody>
</table>

Table 5: Vector load/store instruction encoding.
**LxAI.V Load Auto-Increment Vector**

<table>
<thead>
<tr>
<th>31–26</th>
<th>25–21</th>
<th>20–16</th>
<th>15–11</th>
<th>10</th>
<th>9–5</th>
<th>4–0</th>
</tr>
</thead>
<tbody>
<tr>
<td>COP2</td>
<td>MEMS</td>
<td>rt</td>
<td>rd</td>
<td>VS</td>
<td>LBAI</td>
<td>vd</td>
</tr>
<tr>
<td>010010</td>
<td>10011</td>
<td></td>
<td></td>
<td></td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**
- lbai.v vd, rd, rt
- lhai.v vd, rd, rt
- lwai.v vd, rd, rt

**Description:**
The `vlr` register is read to give n, the number of elements to be loaded. Starting from the base address in scalar register `rd`, n contiguous elements are loaded from memory, sign-extended to 32b (if necessary), and placed in the first n consecutive elements of vector register `vd`. The value of `rd` is then post-incremented by the value of `rt`. This post-increment is treated as unsigned addition and does not generate an overflow. The result of the instruction is undefined if `rd` is the same as `rt`.

A vector address exception occurs if the instruction loads halfwords and the least significant bit of the `rd` register is non-zero, or if it loads words and either of the two least significant bits of the `rd` register is non-zero. A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**
```c
for (i = 0; i < vlr; i++)
    v[vd][i] = extend(m[r[rd] + i*elsize]);
r[rd] += r[rt];
```

**Exceptions:**
- Reserved instruction exception.
- Coprocessor unusable exception.
- Vector operation exception.
- Vector address exception.
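As a concrete case, the C model below follows lhai.v. Memory is represented as an array of halfwords indexed by byte address / 2 to sidestep byte order, and the maximum vector length is 32 as on T0. The order of the exception checks, and leaving rd unchanged when vlr is 0 (read from "no operation is performed"), are assumptions of this sketch; all names are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32

enum { LD_OK, LD_ADDRESS_EXCEPTION, LD_OPERATION_EXCEPTION };

/* Model of lhai.v vd, rd, rt. */
static int lhai_v_model(int32_t vd[MAXVL], const int16_t *mem,
                        uint32_t *rd, uint32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return LD_OPERATION_EXCEPTION;
    if (vlr == 0)
        return LD_OK;                       /* no operation performed */
    if (*rd & 1u)                           /* halfwords need an even address */
        return LD_ADDRESS_EXCEPTION;
    for (unsigned i = 0; i < vlr; i++)
        vd[i] = mem[(*rd + 2u * i) / 2u];   /* int16_t -> int32_t sign-extends */
    *rd += rt;                              /* unsigned post-increment, no trap */
    return LD_OK;
}
```

With vlr = 3 and an increment of 6 bytes (3 × 2), successive calls walk through consecutive blocks of halfwords.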
LxUAI.V Load Unsigned Auto-Increment Vector

| 31–26 | 25–21 | 20–16 | 15–11 | 10 | 9–5 | 4–0 |
|-------|-------|-------|-------|----|-----|-----|
| COP2 | MEMS | rt | rd | VS | LBUAI | vd |
| 010010 | 10011 | | | | | |
| 6 | 5 | 5 | 5 | 1 | 5 | 5 |

Format:

lbuai.v vd, rd, rt
lhuai.v vd, rd, rt

Description:

The \( vlr \) register is read to give \( n \), the number of elements to be loaded. Starting from the base address in scalar register \( rd \), \( n \) contiguous elements are loaded from memory, zero-extended to 32b, and placed in the first \( n \) consecutive elements of vector register \( vd \). The value of \( rd \) is then post-incremented by the value of \( rt \). This post-increment is treated as unsigned addition and does not generate an overflow. The result of the instruction is undefined if \( rd \) is the same as \( rt \).

A vector address exception occurs if the instruction loads halfwords and the least significant bit of the \( rd \) register is non-zero. A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

Operation:

\[
\begin{array}{l}
\text{for } (i=0;\ i<vlr;\ i\text{++}) \\
\quad v[vd][i] = m[r[rd] + i \cdot elsize]; \\
r[rd] = r[rd] + r[rt];
\end{array}
\]

Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.
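The only difference from the signed variants is how each narrow value is widened to a 32b element. In C the two extensions can be written as casts; the helper names are hypothetical.

```c
#include <stdint.h>

/* Signed loads (lbai.v, lhai.v) sign-extend; unsigned loads
 * (lbuai.v, lhuai.v) zero-extend. */
static int32_t extend_signed_byte(uint8_t b)    { return (int32_t)(int8_t)b; }
static int32_t extend_unsigned_byte(uint8_t b)  { return (int32_t)b; }
static int32_t extend_signed_half(uint16_t h)   { return (int32_t)(int16_t)h; }
static int32_t extend_unsigned_half(uint16_t h) { return (int32_t)h; }
```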

LxX.V Load Indexed Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMV</th>
<th>vt</th>
<th>rd</th>
<th>VS</th>
<th>LBX</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10010</td>
<td></td>
<td></td>
<td></td>
<td>10000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>


Format:

lbx.v vd, rd, vt
lhx.v vd, rd, vt
lwx.v vd, rd, vt

Description:

This is a gather operation. The \( vlr \) register is read to give \( n \), the number of elements to be loaded. The scalar register \( rd \) is read to give the base address. The first \( n \) elements of \( vt \) are then added to \( rd \) to give \( n \) effective addresses. The vector of effective addresses is used to load \( n \) elements from memory, which are sign-extended to 32b (if necessary) and placed into the first \( n \) elements of \( vd \).

A vector address exception occurs if the instruction loads halfwords and the least significant bit of any effective address is non-zero, or if it loads words and either of the two least significant bits of any effective address is non-zero. A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

Operation:

\[
\begin{array}{l}
\text{for } (i=0;\ i<vlr;\ i\text{++}) \\
\quad v[vd][i] = \text{extend}(m[r[rd] + v[vt][i]]);
\end{array}
\]

Exceptions:
Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.
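A behavioral sketch of a word gather (lwx.v). Memory is modeled as an array of words indexed by byte address / 4; checking every address before writing any element is an assumption, since this section does not say whether a faulting gather partially completes. Names and return codes are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32

enum { GX_OK, GX_ADDRESS_EXCEPTION, GX_OPERATION_EXCEPTION };

/* Model of lwx.v vd, rd, vt: vd[i] = m[rd + vt[i]]. */
static int lwx_v_model(int32_t vd[MAXVL], const int32_t *mem, uint32_t rd,
                       const int32_t vt[MAXVL], unsigned vlr)
{
    if (vlr > MAXVL)
        return GX_OPERATION_EXCEPTION;
    for (unsigned i = 0; i < vlr; i++)          /* words need 4-byte alignment */
        if ((rd + (uint32_t)vt[i]) & 3u)
            return GX_ADDRESS_EXCEPTION;
    for (unsigned i = 0; i < vlr; i++)
        vd[i] = mem[(rd + (uint32_t)vt[i]) / 4u];
    return GX_OK;
}
```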

**LxUX.V Load Unsigned Indexed Vector**

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMV</th>
<th>vt</th>
<th>rd</th>
<th>VS</th>
<th>LBUX</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10010</td>
<td></td>
<td></td>
<td></td>
<td>10100</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMV</th>
<th>vt</th>
<th>rd</th>
<th>VS</th>
<th>LHUX</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10010</td>
<td></td>
<td></td>
<td></td>
<td>10101</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

lbux.v vd, rd, vt
lhux.v vd, rd, vt

**Description:**

This is a gather operation. The `vlr` register is read to give \( n \), the number of elements to be loaded. The scalar register `rd` is read to give the base address. The first \( n \) elements of `vt` are then added to `rd` to give \( n \) effective addresses. The vector of effective addresses is used to load \( n \) elements from memory, which are zero-extended to 32b and placed into the first \( n \) elements of `vd`.

A vector address exception occurs if the instruction loads halfwords and the least significant bit of any effective address is non-zero. A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vd][i] = m[r[rd]+v[vt][i]];
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.
LxST.V  Load Strided Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>LBST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>LHST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>LWST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:
lbst.v vd, rd, rt
lhst.v vd, rd, rt
lwst.v vd, rd, rt

Description:
The \texttt{vlr} register is read to give \( n \), the number of elements to be loaded. The scalar register \( rt \) is read to give the byte stride of the accesses. The first element is loaded from the address given in \( rd \), sign-extended to the vector element width (if necessary), and placed in the first element of \( vd \). The \( k \)th element of \( vd \) is loaded from address

\[ rd + rt \times (k - 1) \]

A vector address exception occurs if the instruction loads halfwords and the least significant bit of any effective address is non-zero. A vector address exception occurs if the instruction loads words and either of the two least significant bits of any effective address is non-zero. A vector operation exception is raised if \( \texttt{vlr} \) is larger than the implementation's maximum vector length.

Operation:
for (i=0; i<vlr; i++)
    v[vd][i] = sign_extend(m[r[rd]+r[rt]*i]);

Exceptions:
- Reserved instruction exception.
- Coprocessor unusable exception.
- Vector operation exception.
- Vector address exception.
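
The strided address sequence and the sign extension can be illustrated with a small C model of `lhst.v`. This is a sketch under stated assumptions: `lhst_v`, its return codes, `MAXVL` = 32 and big-endian halfword order are all hypothetical. The model performs address arithmetic modulo 2^32, so a stride held as a two's-complement value walks backwards through memory. For example, rd = 20 and rt = -8 visit addresses 20, 12 and 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXVL 32  /* assumed maximum vector length (hypothetical) */

/* Model of lhst.v vd, rd, rt. Returns 0 on success, -1 for a vector operation
   exception, -2 for a vector address exception (hypothetical encoding). */
int lhst_v(int32_t vd[], const uint8_t *mem, uint32_t rd, uint32_t rt, size_t vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (size_t i = 0; i < vlr; i++)
        if ((rd + rt * (uint32_t)i) & 1u)        /* halfword alignment check */
            return -2;
    for (size_t i = 0; i < vlr; i++) {
        uint32_t addr = rd + rt * (uint32_t)i;   /* element k = i+1 is at rd + rt*(k-1) */
        uint16_t h = (uint16_t)((mem[addr] << 8) | mem[addr + 1]);  /* big-endian (assumed) */
        /* Portable sign extension from 16 to 32 bits. */
        vd[i] = (int32_t)h - ((h & 0x8000u) ? 0x10000 : 0);
    }
    return 0;
}
```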
LxUST.V  Load Unsigned Strided Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>LBUST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td>5</td>
<td>5</td>
<td>0</td>
<td>10100</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>LHUST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td>5</td>
<td>5</td>
<td>0</td>
<td>10101</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:
lbust.v vd, rd, rt
lhust.v vd, rd, rt

Description:
The \texttt{vlr} register is read to give \( n \), the number of elements to be loaded. The scalar register \( rt \) is read to give the byte stride of the accesses. The first element is loaded from the address given in \( rd \), zero-extended to the vector element width (if necessary), and placed in the first element of \( vd \). The \( k \)th element of \( vd \) is loaded from address
\[
rd + rt \times (k - 1)
\]

A vector address exception occurs if the instruction loads halfwords and the least significant bit of any effective address is non-zero. A vector operation exception is raised if \texttt{vlr} is larger than the implementation’s maximum vector length.

Operation:
for (i=0; i<vlr; i++)
    v[vd][i] = zero_extend(m[r[rd]+r[rt]*i]);

Exceptions:
Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.

SxAI.V  Store Auto-Increment Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>SHAI</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td>5</td>
<td>5</td>
<td>0</td>
<td>01001</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:
sbai.v vd, rd, rt
shai.v vd, rd, rt
swai.v vd, rd, rt

Description:
The \texttt{vlr} register is read to give \( n \), the number of elements to be stored. The first \( n \) consecutive elements of the vector register \( vd \) are stored in consecutive memory locations starting at the base address in scalar register \( rd \). The \( rd \) register is post-incremented by the contents of the \( rt \) register. This post-increment is treated as unsigned addition and does not generate an overflow. The result of the instruction is undefined if \( rd \) is the same as \( rt \). Implementations do not guarantee the order in which vector elements are written within a single vector store instruction.

A vector address exception occurs if the instruction stores halfwords and the least significant bit of the \( rd \) register is non-zero. A vector address exception occurs if the instruction stores words and either of the two least significant bits of the \( rd \) register is non-zero. A vector operation exception is raised if \texttt{vlr} is larger than the implementation’s maximum vector length.

Operation:
for (i=0; i<vlr; i++)
    m[r[rd]+i*elsize] = v[vd][i];
r[rd] += r[rt];

Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.
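
A minimal C model of `swai.v` shows how the post-increment lets a loop store successive blocks without a separate address update. The names `swai_v` and `MAXVL`, the word-indexed memory array and the return codes are illustrative assumptions. The ISA leaves the result undefined when rd and rt name the same register, and this sketch does not detect that case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXVL 32  /* assumed maximum vector length (hypothetical) */

/* Model of swai.v vd, rd, rt: store vlr consecutive words starting at byte
   address r[rd], then post-increment r[rd] by r[rt].
   Returns 0, -1 (vector operation exception) or -2 (vector address exception). */
int swai_v(uint32_t mem_words[], uint32_t r[32], unsigned rd, unsigned rt,
           const uint32_t vd[], size_t vlr)
{
    if (vlr > MAXVL)
        return -1;
    if (r[rd] & 3u)                           /* word stores need a word-aligned base */
        return -2;
    for (size_t i = 0; i < vlr; i++)
        mem_words[r[rd] / 4 + i] = vd[i];     /* byte address r[rd] + 4*i */
    r[rd] += r[rt];                           /* unsigned add: wraps, never traps */
    return 0;
}
```

With r[rt] set to 4 × vlr, repeated calls store consecutive vectors back to back.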

SxX.V  Store Indexed Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMV</th>
<th>vt</th>
<th>rd</th>
<th>VS</th>
<th>SBX</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10010</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>11000</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:

sbx.v vd, rd, vt
shx.v vd, rd, vt
swx.v vd, rd, vt

Description:

This is a scatter operation. The vlr register is read to give n, the number of elements to be stored. The scalar register rd is read to give the base address. The first n elements of vt are added to rd to give n effective addresses. The first n elements of vd are written to memory using this vector of effective addresses. Implementations do not guarantee the order in which individual vector elements are written within a single vector store instruction.

A vector address exception occurs if the instruction stores halfwords and the least significant bit of any effective address is non-zero. A vector address exception occurs if the instruction stores words and either of the two least significant bits of any effective address is non-zero. A vector operation exception is raised if vlr is larger than the implementation’s maximum vector length.

Operation:

for (i=0; i<vlr; i++)
    m[r[rd]+v[vt][i]] = v[vd][i];
Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.
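
The scatter can be sketched in C as follows. `swx_v`, `MAXVL`, the status codes and big-endian byte order are assumptions for illustration. The model writes elements in index order. Implementations do not guarantee any order, so a program whose index vector contains duplicate addresses should not rely on which element is stored last.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXVL 32  /* assumed maximum vector length (hypothetical) */

/* Model of swx.v vd, rd, vt: mem[rd + vt[i]] = vd[i] for i < vlr.
   Returns 0, -1 (vector operation exception) or -2 (vector address exception). */
int swx_v(uint8_t mem[], uint32_t rd_val, const uint32_t vt[],
          const uint32_t vd[], size_t vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (size_t i = 0; i < vlr; i++)
        if ((rd_val + vt[i]) & 3u)         /* word alignment check */
            return -2;
    for (size_t i = 0; i < vlr; i++) {
        uint32_t a = rd_val + vt[i];
        /* Sequential order here only; hardware may write elements in any order. */
        mem[a]     = (uint8_t)(vd[i] >> 24);  /* big-endian (assumed) */
        mem[a + 1] = (uint8_t)(vd[i] >> 16);
        mem[a + 2] = (uint8_t)(vd[i] >> 8);
        mem[a + 3] = (uint8_t)vd[i];
    }
    return 0;
}
```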

SxST.V  Store Strided Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>SBST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td>5</td>
<td>5</td>
<td>0</td>
<td>11000</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>SHST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td>5</td>
<td>5</td>
<td>0</td>
<td>11001</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>MEMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>SWST</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10011</td>
<td>5</td>
<td>5</td>
<td>0</td>
<td>11011</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:

sbst.v vd, rd, rt
shst.v vd, rd, rt
swst.v vd, rd, rt

Description:

The vlr register is read to give \( n \), the number of elements to be stored. The scalar register \( rt \) is read to give the byte stride of the accesses. The first element of \( vd \) is stored to the address given in the scalar register \( rd \). The \( k \)th element of \( vd \) is stored at address

\[
rd + rt \times (k - 1)
\]

Implementations do not guarantee the order in which vector elements are written within a single vector store instruction.

A vector address exception occurs if the instruction stores halfwords and the least significant bit of any effective address is non-zero. A vector address exception occurs if the instruction stores words and either of the two least significant bits of any effective address is non-zero. A vector operation exception is raised if vlr is larger than the implementation’s maximum vector length.

Operation:

for (i=0; i<vlr; i++)
    m[r[rd]+r[rt]*i] = v[vd][i];
Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
Vector address exception.
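
A common use of a strided store is writing one column of a row-major matrix. This is a hypothetical C model of `swst.v`, in which the names, `MAXVL` and the return codes are assumptions. A 4 by 4 matrix of 32-bit words has a row pitch of 16 bytes, so storing column 1 uses rd = 4 and rt = 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXVL 32  /* assumed maximum vector length (hypothetical) */

/* Model of swst.v vd, rd, rt: element i goes to byte address rd + rt*i.
   Returns 0, -1 (vector operation exception) or -2 (vector address exception). */
int swst_v(uint32_t mem_words[], uint32_t rd_val, uint32_t rt_val,
           const uint32_t vd[], size_t vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (size_t i = 0; i < vlr; i++)
        if ((rd_val + rt_val * (uint32_t)i) & 3u)   /* word alignment check */
            return -2;
    for (size_t i = 0; i < vlr; i++)
        mem_words[(rd_val + rt_val * (uint32_t)i) / 4] = vd[i];
    return 0;
}
```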
5.6 Vector Integer ALU Operations

All vector integer and logical computational instructions are available in both vector-vector and vector-scalar forms. The rd field is used to encode the integer function.

The vector-vector instructions specify a vector of binary operations, with the first operand taken from vector register vd, the second operand from vector register vt, and the result placed in vector register vw. Vector-scalar instructions specify a vector of binary operations, with the first operand taken from vector register vd, the second operand from scalar register rt, and the result placed in vector register vw.

A few non-commutative operations have scalar-vector forms, where the first operand is rt and the second operand is vd.

Table 6 shows the encoding of the available integer computational operations.

Vector arithmetic and logical instructions

The vector unit implements numerous arithmetic and logical operations, including variable displacement shifts. Signed and unsigned addition and subtraction are provided; the fixed-point instructions (see Section 5.7) can be used to perform signed and unsigned 16-bit multiplication.

An overflow status register, vovf, is updated by integer arithmetic operations and can be accessed in coprocessor 2. One “sticky” overflow bit is provided for each vector element. Overflowing operations set these bits, but they can only be cleared by explicit writes to the overflow register.

Vector conditional instructions

To compare vectors, there are conditional set instructions “set less than” and “set equal”. These instructions produce a vector of boolean results, and in the set less than case the comparison can be signed or unsigned. Both are available in vector-vector and vector-scalar forms, with set less than also having a scalar-vector form.

To compare vectors and produce results for manipulation in the scalar unit, there are “flag less than” and “flag equal” instructions. The vector unit maintains a condition flag register, with one condition bit for each vector register element. This register is updated by the flag instructions, and is accessed as a coprocessor 2 control register. To perform conditional branches on vector operations, this register is copied to a scalar CPU register where any MIPS-II conditional branch can be used. Implementations with greater than 32 elements per vector register provide additional condition flag registers.
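
The following C sketch models a vector-vector flag-less-than as a 32-bit mask, then tests the mask as scalar code would after copying the flag register into a CPU register. The mapping of element i to bit i, the clearing of bits at and above vlr, and the names `flt_vv` and `any_less` are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of flt.vv: bit i of the result is set when a[i] < b[i]
   (signed compare). Bits at positions >= vlr are left clear (assumption). */
uint32_t flt_vv(const int32_t a[], const int32_t b[], size_t vlr)
{
    assert(vlr <= 32);                 /* one 32-bit flag register assumed */
    uint32_t flags = 0;
    for (size_t i = 0; i < vlr; i++)
        if (a[i] < b[i])
            flags |= (uint32_t)1 << i;
    return flags;
}

/* Scalar-side use: branch if any element compared true. */
int any_less(const int32_t a[], const int32_t b[], size_t vlr)
{
    uint32_t cond = flt_vv(a, b, vlr);  /* copy of the flag register */
    return cond != 0;                   /* e.g. a bne against $zero in MIPS code */
}
```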

The vector unit implements vector conditional moves, in both vector-vector and vector-scalar forms. The first operand is compared against zero; if the comparison succeeds, the destination element is updated with the second operand, otherwise the destination element is unaffected. Note that a scalar-vector form is not required, as a scalar condition can always be replaced with a branch in the CPU.

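The conditional moves can be made unconditional by using $vr0, whose elements are all zero, as the condition vector register with a condition that zero satisfies (equal to, greater than or equal to, or less than or equal to zero). In this way, scalar register values can be broadcast into a vector register, and vectors of values can be copied between vector registers.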
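
A C model of a move-if-equal-zero in vector-scalar form shows the broadcast idiom. The mnemonic `cmveqz.vs`, the function `cmveqz_vs` and the operand order (condition vector first, value second, as described above) are assumptions of this sketch. An all-zero array stands in for $vr0.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of cmveqz.vs vw, vd, rt: where cond[i] == 0,
   vw[i] = rt; otherwise vw[i] is left unchanged. */
void cmveqz_vs(int32_t vw[], const int32_t cond[], int32_t rt, size_t vlr)
{
    for (size_t i = 0; i < vlr; i++)
        if (cond[i] == 0)
            vw[i] = rt;
}
```

With an all-zero condition every element in the first vlr positions receives rt, which is the broadcast.
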
<table>
<thead>
<tr>
<th>Format</th>
<th>rd</th>
<th>Opers</th>
<th>Mnemonic</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>1010v</td>
<td>00000</td>
<td>w</td>
<td>sub.yy</td>
<td>Subtract signed (flag overflow).</td>
</tr>
<tr>
<td>1010v</td>
<td>00001</td>
<td>w</td>
<td>subu.yy</td>
<td>Subtract unsigned (no overflow).</td>
</tr>
<tr>
<td>1010x</td>
<td>00010</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010x</td>
<td>00011</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>00100</td>
<td>w</td>
<td>flt.yy</td>
<td>Flag less than (update condition).</td>
</tr>
<tr>
<td>1010v</td>
<td>00101</td>
<td>w</td>
<td>fltu.yy</td>
<td>Flag less than unsigned (update condition).</td>
</tr>
<tr>
<td>1010v</td>
<td>00110</td>
<td>0</td>
<td>feq.yy</td>
<td>Flag equal (update condition).</td>
</tr>
<tr>
<td>1010x</td>
<td>00111</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>01000</td>
<td>w</td>
<td>sllv.yy</td>
<td>Shift left logical variable.</td>
</tr>
<tr>
<td>1010v</td>
<td>01001</td>
<td>w</td>
<td>srlv.yy</td>
<td>Shift right logical variable.</td>
</tr>
<tr>
<td>1010x</td>
<td>01010</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>01011</td>
<td>w</td>
<td>srav.yy</td>
<td>Shift right arithmetic variable.</td>
</tr>
<tr>
<td>1010v</td>
<td>01100</td>
<td>w</td>
<td>slt.yy</td>
<td>Set less than.</td>
</tr>
<tr>
<td>1010v</td>
<td>01101</td>
<td>w</td>
<td>sltu.yy</td>
<td>Set less than unsigned.</td>
</tr>
<tr>
<td>1010v</td>
<td>01110</td>
<td>0</td>
<td>seq.yy</td>
<td>Set equal.</td>
</tr>
<tr>
<td>1010x</td>
<td>01111</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>10000</td>
<td>0</td>
<td>add.yy</td>
<td>Add signed (flag overflow).</td>
</tr>
<tr>
<td>1010v</td>
<td>10001</td>
<td>0</td>
<td>addu.yy</td>
<td>Add unsigned (no overflow).</td>
</tr>
<tr>
<td>1010x</td>
<td>10010</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010x</td>
<td>10011</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>10100</td>
<td>0</td>
<td>and.yy</td>
<td>Bitwise logical AND.</td>
</tr>
<tr>
<td>1010v</td>
<td>10101</td>
<td>0</td>
<td>or.yy</td>
<td>Bitwise logical OR.</td>
</tr>
<tr>
<td>1010v</td>
<td>10110</td>
<td>0</td>
<td>xor.yy</td>
<td>Bitwise logical XOR.</td>
</tr>
<tr>
<td>1010v</td>
<td>10111</td>
<td>0</td>
<td>nor.yy</td>
<td>Bitwise logical NOR.</td>
</tr>
<tr>
<td>1010x</td>
<td>11000</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>11001</td>
<td>0</td>
<td>cmvnez.yy</td>
<td>Conditional move not equal zero.</td>
</tr>
<tr>
<td>1010v</td>
<td>11010</td>
<td>0</td>
<td>cmvgez.yy</td>
<td>Conditional move greater than or equal zero.</td>
</tr>
<tr>
<td>1010v</td>
<td>11011</td>
<td>0</td>
<td>cmvlez.yy</td>
<td>Conditional move less than or equal zero.</td>
</tr>
<tr>
<td>1010x</td>
<td>11100</td>
<td>x</td>
<td></td>
<td>reserved</td>
</tr>
<tr>
<td>1010v</td>
<td>11101</td>
<td>0</td>
<td>cmveqz.yy</td>
<td>Conditional move equal zero.</td>
</tr>
<tr>
<td>1010v</td>
<td>11110</td>
<td>0</td>
<td>cmvltz.yy</td>
<td>Conditional move less than zero.</td>
</tr>
<tr>
<td>1010v</td>
<td>11111</td>
<td>0</td>
<td>cmvgtz.yy</td>
<td>Conditional move greater than zero.</td>
</tr>
</tbody>
</table>

Note:
When format is 10100 and opers is 0, yy is “vv”
When format is 10101 and opers is 0, yy is “vs”
When format is 10101 and opers is 1, yy is “sv”
All other encodings are reserved.

Table 6: Field encoding for integer register-register instructions.
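
The field layout in the encoding diagrams can be checked with a small C helper that packs an integer ALU instruction word. The bit positions are taken from the diagrams in this section and should be treated as an assumption: COP2 in bits 31 to 26, format in 25 to 21, vt or rt in 20 to 16, the function code in the rd field at 15 to 11, the opers bit at 10, vw in 9 to 5 and vd in 4 to 0. `encode_ivalu` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Pack an integer ALU instruction word using the assumed field layout. */
uint32_t encode_ivalu(uint32_t format, uint32_t vt_rt, uint32_t func,
                      uint32_t opers, uint32_t vw, uint32_t vd)
{
    const uint32_t COP2 = 0x12u;                /* 010010 */
    return (COP2 << 26)
         | ((format & 0x1Fu) << 21)            /* 10100 = vector-vector, 10101 = vector-scalar */
         | ((vt_rt & 0x1Fu) << 16)
         | ((func & 0x1Fu) << 11)              /* Table 6 code in the rd field */
         | ((opers & 1u) << 10)                /* 1 selects the scalar-vector form */
         | ((vw & 0x1Fu) << 5)
         | (vd & 0x1Fu);
}
```

For instance, `add.vv` with vw = 1, vd = 2 and vt = 3 is `encode_ivalu(0x14, 3, 0x10, 0, 1, 2)`, which yields 0x4A838022.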
AND.VV  And Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>AND</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>10100</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

and.vv vw, vd, vt

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to compute. The first `n` elements of vector register `vd` are combined with the first `n` elements of vector register `vt` in a bitwise logical AND operation. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] & v[vt][i];
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

---

AND.VS  And Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>AND</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>10100</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

and.vs vw, vd, rt

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to compute. The first `n` elements of vector register `vd` are combined with scalar register `rt` in a bitwise logical AND operation. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] & r[rt];
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
### OR.VV  Or Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>OR</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>10101</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`or.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to compute. The first `n` elements of vector register `vd` are combined with the first `n` elements of vector register `vt` in a bitwise logical OR operation. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] | v[vt][i];
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### OR.VS  Or Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>OR</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>10101</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`or.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to compute. The first `n` elements of vector register `vd` are combined with scalar register `rt` in a bitwise logical OR operation. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] | r[rt];
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
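**XOR.VV**  Xor Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>XOR</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>10110</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>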

**Format:**
xor.vv vw, vd, vt

**Description:**
The vector length register \( vlr \) is read to give \( n \), the number of elements to compute. The first \( n \) elements of vector register \( vd \) are combined with the first \( n \) elements of vector register \( vt \) in a bitwise logical XOR operation. The results are placed in the first \( n \) elements of vector register \( vw \).

A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

**Operation:**
```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] ^ v[vt][i];
```

**Exceptions:**
Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

---

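**XOR.VS**  Xor Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>XOR</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>10110</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>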

**Format:**
xor.vs vw, vd, rt

**Description:**
The vector length register \( vlr \) is read to give \( n \), the number of elements to compute. The first \( n \) elements of vector register \( vd \) are combined with scalar register \( rt \) in a bitwise logical XOR operation. The results are placed in the first \( n \) elements of vector register \( vw \).

A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

**Operation:**
```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] ^ r[rt];
```

**Exceptions:**
Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
**NOR.VV**  Nor Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>NOR</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>10111</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

nor.vv vw, vd, vt

**Description:**

The vector length register vlr is read to give n, the number of elements to compute. The first n elements of vector register vd are combined with the first n elements of vector register vt in a bitwise logical NOR operation. The results are placed in the first n elements of vector register vw.

A vector operation exception is raised if vlr is larger than the implementation’s maximum vector length.

**Note:** the operation “nor.vv vw, vd, $vr0” performs a vector bitwise NOT operation.

**Operation:**

```c
for (i=0; i<vlr; i++)
  v[vw][i] = ~(v[vd][i] | v[vt][i]);
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

---

**NOR.VS**  Nor Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>NOR</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>10111</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

nor.vs vw, vd, rt

**Description:**

The vector length register vlr is read to give n, the number of elements to compute. The first n elements of vector register vd are combined with scalar register rt in a bitwise logical NOR operation. The results are placed in the first n elements of vector register vw.

A vector operation exception is raised if vlr is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
  v[vw][i] = ~(v[vd][i] | r[rt]);
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
ADD.VV  Add Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>ADD</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>10000</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:
add.vv vw, vd, vt

Description:
The vector length register \( vlr \) is read to give \( n \), the number of elements to be added. The first \( n \) elements of vector register \( vt \) are added to the first \( n \) elements of vector register \( vd \) and the results are placed in the first \( n \) elements of vector register \( vw \).

The input elements are treated as signed integers. The appropriate bit in \( vovf \) is set for any result that overflows.

A vector operation exception is raised if \( vlr \) is larger than the implementation's maximum vector length.

Operation:
for (i=0; i<vlr; i++)
{
   if (overflow_on_add(v[vd][i], v[vt][i]))
      vcr[VOVF] |= (1<<i);
   v[vw][i] = v[vd][i] + v[vt][i];
}

Exceptions:
Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
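
The `overflow_on_add` helper used in the operation is not defined here. One way to implement it, together with the sticky `vovf` update, is sketched below in C. The function names, the 32-element limit and the representation of `vovf` as a plain `uint32_t` are assumptions. The sum is formed in unsigned arithmetic so that the model itself never performs signed overflow, which is undefined behaviour in C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed overflow occurs iff a and b share a sign and the wrapped sum does not. */
static int overflow_on_add(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, s = ua + ub;
    return (int)((~(ua ^ ub) & (ua ^ s)) >> 31);
}

/* Model of add.vv vw, vd, vt with a sticky per-element overflow mask. */
void add_vv(int32_t vw[], const int32_t vd[], const int32_t vt[],
            size_t vlr, uint32_t *vovf)
{
    assert(vlr <= 32);                          /* 32-element implementation assumed */
    for (size_t i = 0; i < vlr; i++) {
        int ovf = overflow_on_add(vd[i], vt[i]);   /* test before vw may overwrite vd */
        vw[i] = (int32_t)((uint32_t)vd[i] + (uint32_t)vt[i]);  /* wrapped result */
        if (ovf)
            *vovf |= (uint32_t)1 << i;          /* bits are set here, never cleared */
    }
}
```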

ADD.VS  Add Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>ADD</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>10000</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:
add.vs vw, vd, rt

Description:
The vector length register \( vlr \) is read to give \( n \), the number of elements to be added. The first \( n \) elements of vector register \( vd \) are added to scalar register \( rt \) and the results are placed in the first \( n \) elements of vector register \( vw \).

The input elements are treated as signed integers. The appropriate bit of \( vovf \) is set for any result that overflows.

A vector operation exception is raised if \( vlr \) is larger than the implementation's maximum vector length.

Operation:
for (i=0; i<vlr; i++)
{
   if (overflow_on_add(v[vd][i], r[rt]))
      vcr[VOVF] |= (1<<i);
   v[vw][i] = v[vd][i] + r[rt];
}

Exceptions:
Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
ADDU.VV  Add Unsigned Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>ADDU</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>10001</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:

addu.vv vw, vd, vt

Description:

The vector length register \( vlr \) is read to give \( n \), the number of elements to be added. The first \( n \) elements of vector register \( vt \) are added to the first \( n \) elements of vector register \( vd \) and the results are placed in the first \( n \) elements of vector register \( vw \).

The input elements are treated as unsigned integers. Overflows are ignored.

A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

Operation:

```c
for (i=0; i<vlr; i++)
{
    v[vw][i] = v[vd][i] + v[vt][i];
}
```

Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

ADDU.VS  Add Unsigned Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>ADDU</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>10001</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:

addu.vs vw, vd, rt

Description:

The vector length register \( vlr \) is read to give \( n \), the number of elements to be added. The first \( n \) elements of vector register \( vd \) are added to scalar register \( rt \) and the results are placed in the first \( n \) elements of vector register \( vw \).

The input elements are treated as unsigned integers. Overflows are ignored.

A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

Operation:

```c
for (i=0; i<vlr; i++)
{
    v[vw][i] = v[vd][i] + r[rt];
}
```

Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
**SUB.VV**  Subtract Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SUB</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>00000</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sub.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be subtracted. The first `n` elements of vector register `vt` are subtracted from the first `n` elements of vector register `vd` and the results are placed in the first `n` elements of vector register `vw`.

The input elements are treated as signed integers. The appropriate bit of `vovf` is set for any result that overflows.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

Note: the operation "`sub.vv vw, $vr0, vt`" performs a vector negate operation.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (overflow_on_sub(v[vd][i],v[vt][i]))
        vcr[VOVF] |= (1<<i);
    v[vw][i] = v[vd][i] - v[vt][i];
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
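
In the same way, `overflow_on_sub` can be modelled as below. This is a sketch with hypothetical names, a 32-element limit and `vovf` held in a `uint32_t`. Passing an all-zero vector as the minuend reproduces the negate idiom from the note above. Negating the most negative 32-bit value overflows, so it sets that element's `vovf` bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Signed overflow on a - b occurs iff a and b differ in sign and the
   wrapped difference differs in sign from a. */
static int overflow_on_sub(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b, d = ua - ub;
    return (int)(((ua ^ ub) & (ua ^ d)) >> 31);
}

/* Model of sub.vv vw, vd, vt with a sticky per-element overflow mask. */
void sub_vv(int32_t vw[], const int32_t vd[], const int32_t vt[],
            size_t vlr, uint32_t *vovf)
{
    assert(vlr <= 32);                          /* 32-element implementation assumed */
    for (size_t i = 0; i < vlr; i++) {
        int ovf = overflow_on_sub(vd[i], vt[i]);   /* test before vw may overwrite vd */
        vw[i] = (int32_t)((uint32_t)vd[i] - (uint32_t)vt[i]);  /* wrapped result */
        if (ovf)
            *vovf |= (uint32_t)1 << i;
    }
}
```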

---

**SUB.VS**  Subtract Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SUB</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00000</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sub.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be subtracted. The scalar register `rt` is subtracted from the first `n` elements of vector register `vd` and the results are placed in the first `n` elements of vector register `vw`.

The input elements are treated as signed integers. The appropriate bit of `vovf` is set for any result that overflows.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (overflow_on_sub(v[vd][i],r[rt]))
        vcr[VOVF] |= (1<<i);
    v[vw][i] = v[vd][i] - r[rt];
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
### SUB.SV  Subtract Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SUB</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00000</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

```
sub.sv vw, rt, vd
```

**Description:**

The vector length register \( \text{vlr} \) is read to give \( n \), the number of elements to be subtracted. The first \( n \) elements of vector register \( vd \) are subtracted from the scalar register \( rt \) and the results are placed in the first \( n \) elements of vector register \( vw \).

The input elements are treated as signed integers. The appropriate bit of \( \text{vovf} \) is set for any result that overflows.

A vector operation exception is raised if \( \text{vlr} \) is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    v[vw][i] = r[rt] - v[vd][i];
    if (overflow_on_sub(r[rt],v[vd][i]))
        vcr[VOVF] |= (1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
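The operation pseudocode for `sub.vs` and `sub.sv` calls `overflow_on_sub` without defining it. The following is a minimal C sketch of both instructions. It is not taken from the manual and makes several assumptions:

- 32-bit elements are held as two's-complement `uint32_t` bit patterns.
- A hypothetical maximum vector length `MAXVL` of 32 is used, so one word can hold a `vovf` bit for every element.
- The function names are hypothetical.
- A return value of -1 stands in for the vector operation exception.

Overflow is detected from sign bits, which avoids C's undefined behaviour on signed overflow.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* Signed overflow on a - b for 32-bit two's-complement patterns: the
   operands differ in sign and the result's sign differs from a's. */
int overflow_on_sub(uint32_t a, uint32_t b)
{
    uint32_t d = a - b;  /* unsigned arithmetic wraps modulo 2^32 */
    return (int)((((a ^ b) & (a ^ d)) >> 31) & 1u);
}

/* Model of sub.vs vw, vd, rt. Returns -1 in place of a vector
   operation exception when vlr exceeds MAXVL. */
int sub_vs(uint32_t *vw, const uint32_t *vd, uint32_t rt,
           unsigned vlr, uint32_t *vovf)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++) {
        vw[i] = vd[i] - rt;
        if (overflow_on_sub(vd[i], rt))
            *vovf |= UINT32_C(1) << i;  /* bits are set, never cleared */
    }
    return 0;
}

/* Model of sub.sv vw, rt, vd: the scalar is the minuend. */
int sub_sv(uint32_t *vw, uint32_t rt, const uint32_t *vd,
           unsigned vlr, uint32_t *vovf)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++) {
        vw[i] = rt - vd[i];
        if (overflow_on_sub(rt, vd[i]))
            *vovf |= UINT32_C(1) << i;
    }
    return 0;
}
```

As in the pseudocode, bits are only ever ORed into `vovf`. A flag set by an earlier instruction is therefore not cleared by a later one that does not overflow.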

### SUBU.VV  Subtract Unsigned Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SUBU</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>00001</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

```
subu.vv vw, vd, vt
```

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be subtracted. The first `n` elements of vector register `vt` are subtracted from the first `n` elements of vector register `vd`, and the results are placed in the first `n` elements of vector register `vw`.

The input elements are treated as unsigned integers. Overflows are ignored.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    v[vw][i] = v[vd][i] - v[vt][i];
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SUBU.VS  Subtract Unsigned Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SUBU</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00001</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`subu.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be subtracted. The scalar register `rt` is subtracted from the first `n` elements of vector register `vd`, and the results are placed in the first `n` elements of vector register `vw`.

The input elements are treated as unsigned integers. Overflows are ignored.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    v[vw][i] = v[vd][i] - r[rt];
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SUBU.SV  Subtract Unsigned Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SUBU</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00001</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`subu.sv vw, rt, vd`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be subtracted. The first `n` elements of vector register `vd` are subtracted from the scalar register `rt`, and the results are placed in the first `n` elements of vector register `vw`.

The input elements are treated as unsigned integers. Overflows are ignored.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    v[vw][i] = r[rt] - v[vd][i];
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
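Because the unsigned forms ignore overflow, they reduce to plain modulo-2^32 subtraction, which C's unsigned arithmetic provides directly. The following is a minimal sketch of `subu.vs` and `subu.sv` under the same hypothetical assumptions as before: `uint32_t` elements, a `MAXVL` of 32, invented function names, and -1 standing in for the vector operation exception.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* Model of subu.vs vw, vd, rt: unsigned subtraction wraps modulo 2^32
   and no overflow flag is touched. Returns -1 in place of a vector
   operation exception. */
int subu_vs(uint32_t *vw, const uint32_t *vd, uint32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = vd[i] - rt;
    return 0;
}

/* Model of subu.sv vw, rt, vd: the scalar is the minuend. */
int subu_sv(uint32_t *vw, uint32_t rt, const uint32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = rt - vd[i];
    return 0;
}
```

In this two's-complement model the result bits are identical to those of `sub.vs` and `sub.sv`. The only difference is that `vovf` is never updated.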
### SLLV.VV  Shift Left Logical Variable Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SLLV</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>01000</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sllv.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The first `n` elements of vector register `vd` are shifted left by the number of bits given in the least significant 5 bits of the first `n` elements of vector register `vt`. Zeros are inserted into the low order bits. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] << (v[vt][i] & 0x1f);
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SLLV.VS  Shift Left Logical Variable Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SLLV</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01000</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sllv.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The first `n` elements of vector register `vd` are shifted left by the number of bits given in the least significant 5 bits of scalar register `rt`. Zeros are inserted into the low order bits. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] << (r[rt] & 0x1f);
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SLLV.SV  Shift Left Logical Variable Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SLLV</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01000</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sllv.sv vw, rt, vd`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The scalar register `rt` is shifted left by the number of bits given in the least significant 5 bits of the first `n` elements of vector register `vd`. Zeros are inserted into the low order bits. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = r[rt] << (v[vd][i] & 0x1f);
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
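All three shift-left forms use only the least significant 5 bits of the shift amount. An amount of 33 therefore shifts by 1, and an amount of 32 shifts by 0. The masking also keeps a C model well defined, because shifting a 32-bit value by 32 or more is undefined in C.

The following is a hypothetical sketch. The function names and the `MAXVL` of 32 are assumptions, and -1 stands in for the vector operation exception.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* Only the low 5 bits of the amount are used, so the C shift below is
   always by 0..31 and therefore well defined. */
static uint32_t sll5(uint32_t x, uint32_t amount)
{
    return x << (amount & 0x1f);
}

/* sllv.vv vw, vd, vt */
int sllv_vv(uint32_t *vw, const uint32_t *vd, const uint32_t *vt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;  /* stands in for a vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = sll5(vd[i], vt[i]);
    return 0;
}

/* sllv.vs vw, vd, rt */
int sllv_vs(uint32_t *vw, const uint32_t *vd, uint32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = sll5(vd[i], rt);
    return 0;
}

/* sllv.sv vw, rt, vd: the scalar is shifted by each element of vd. */
int sllv_sv(uint32_t *vw, uint32_t rt, const uint32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = sll5(rt, vd[i]);
    return 0;
}
```

Bits shifted out of position 31 are discarded. For example, `0x80000001` shifted left by 1 becomes `0x00000002`.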

### SRAV.VV  Shift Right Arithmetic Variable Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SRAV</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`srav.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The first `n` elements of vector register `vd` are shifted right by the number of bits given in the least significant 5 bits of the first `n` elements of vector register `vt`. The high order bits are sign-extended. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] shra (v[vt][i] & 0x1f);  /* shra: arithmetic right shift */
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SRAV.VS  Shift Right Arithmetic Variable Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SRAV</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td></td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`srav.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The first `n` elements of vector register `vd` are shifted right by the number of bits given in the least significant 5 bits of scalar register `rt`. The high order bits are sign-extended. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] shra (r[rt] & 0x1f);  /* shra: arithmetic right shift */
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SRAV.SV  Shift Right Arithmetic Variable Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SRAV</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td></td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`srav.sv vw, rt, vd`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The scalar register `rt` is shifted right by the number of bits given in the least significant 5 bits of the first `n` elements of vector register `vd`. The high order bits are sign-extended. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = r[rt] shra (v[vd][i] & 0x1f);  /* shra: arithmetic right shift */
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
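The operations above write `shra` for an arithmetic (sign-extending) right shift. In C, `>>` on a negative signed value is implementation-defined. A portable model can instead build the arithmetic shift from a logical shift plus sign fill. The following is a minimal sketch; the function names and the `MAXVL` of 32 are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* Arithmetic right shift of a 32-bit two's-complement pattern by the low
   5 bits of amount: shift logically, then copy the sign bit into the
   vacated high-order positions. */
uint32_t shra32(uint32_t x, uint32_t amount)
{
    unsigned s = amount & 0x1f;
    uint32_t r = x >> s;
    if (x & 0x80000000u)
        r |= ~(UINT32_C(0xFFFFFFFF) >> s);  /* ones in the top s bits */
    return r;
}

/* srav.vv vw, vd, vt */
int srav_vv(uint32_t *vw, const uint32_t *vd, const uint32_t *vt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;  /* stands in for a vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = shra32(vd[i], vt[i]);
    return 0;
}

/* srav.vs vw, vd, rt */
int srav_vs(uint32_t *vw, const uint32_t *vd, uint32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = shra32(vd[i], rt);
    return 0;
}

/* srav.sv vw, rt, vd */
int srav_sv(uint32_t *vw, uint32_t rt, const uint32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = shra32(rt, vd[i]);
    return 0;
}
```

For example, `0xFFFFFFF0` (-16) shifted right arithmetically by 2 gives `0xFFFFFFFC` (-4).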

### SRLV.VV  Shift Right Logical Variable Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SRLV</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>01001</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`srlv.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The first `n` elements of vector register `vd` are shifted right by the number of bits given in the least significant 5 bits of the first `n` elements of vector register `vt`. Zeros are inserted into the high order bits. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] >> (v[vt][i] & 0x1f);  /* logical shift: zeros fill the high bits */
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SRLV.VS  Shift Right Logical Variable Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SRLV</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01001</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`srlv.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The first `n` elements of vector register `vd` are shifted right by the number of bits given in the least significant 5 bits of scalar register `rt`. Zeros are inserted into the high order bits. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = v[vd][i] >> (r[rt] & 0x1f);  /* logical shift: zeros fill the high bits */
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SRLV.SV  Shift Right Logical Variable Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SRLV</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01001</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`srlv.sv vw, rt, vd`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to shift. The scalar register `rt` is shifted right by the number of bits given in the least significant 5 bits of the first `n` elements of vector register `vd`. Zeros are inserted into the high order bits. The results are placed in the first `n` elements of vector register `vw`.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
    v[vw][i] = r[rt] >> (v[vd][i] & 0x1f);  /* logical shift: zeros fill the high bits */
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
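In the scalar-vector form, a single scalar is shifted by a different amount in each element. The sketch below models `srlv.sv`; the function name and the `MAXVL` of 32 are hypothetical. Note that an element value of 32 masks to a shift of 0, so it leaves the scalar unchanged rather than clearing the result.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* srlv.sv vw, rt, vd: vw[i] = rt >> (vd[i] & 0x1f), zero-filling the
   high-order bits (a logical shift on an unsigned pattern). */
int srlv_sv(uint32_t *vw, uint32_t rt, const uint32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;  /* stands in for a vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = rt >> (vd[i] & 0x1f);
    return 0;
}
```

For comparison, `srav.sv` applied to the same scalar `0x80000000` with a shift of 1 would give `0xC0000000`, because the sign bit is copied into the vacated position.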

---

### SLT.VV  Set Less Than Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SLT</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>01100</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`slt.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the first `n` elements of vector register `vt`. If an element of `vd` is less than the corresponding element of `vt`, the corresponding element in `vw` is set to 1, else it is set to 0. The elements are considered as signed integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < v[vt][i])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
### SLT.VS  Set Less Than Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SLT</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01100</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

slt.vs vw, vd, rt

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If an element of `vd` is less than `rt`, the corresponding element in `vw` is set to 1, else it is set to 0. The elements are considered as signed integers.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < r[rt])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SLT.SV  Set Less Than Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SLT</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01100</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

slt.sv vw, rt, vd

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If `rt` is less than an element of `vd`, the corresponding element in `vw` is set to 1, else it is set to 0. The elements are considered as signed 32-bit integers.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (r[rt] < v[vd][i])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
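The two signed set-less-than forms with a scalar operand are not complements of each other. An element equal to the scalar produces 0 from both `slt.vs` and `slt.sv`. The following is a minimal sketch with `int32_t` elements; the function names and the `MAXVL` of 32 are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* slt.vs vw, vd, rt: vw[i] = 1 if vd[i] < rt (signed), else 0. */
int slt_vs(int32_t *vw, const int32_t *vd, int32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;  /* stands in for a vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = (vd[i] < rt) ? 1 : 0;
    return 0;
}

/* slt.sv vw, rt, vd: vw[i] = 1 if rt < vd[i] (signed), else 0. */
int slt_sv(int32_t *vw, int32_t rt, const int32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = (rt < vd[i]) ? 1 : 0;
    return 0;
}
```

Because the results are 0 or 1, summing the first `n` elements of the `slt.vs` result counts how many elements fall below the scalar.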

### SLTU.VV  Set Less Than Unsigned Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SLTU</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>01101</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sltu.vv vw, vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the first `n` elements of vector register `vt`. If an element of `vd` is less than the corresponding element of `vt`, the corresponding element in `vw` is set to 1, else it is set to 0. The elements are considered as unsigned integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < v[vt][i])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SLTU.VS  Set Less Than Unsigned Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SLTU</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01101</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`sltu.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If an element of `vd` is less than `rt`, the corresponding element in `vw` is set to 1, else it is set to 0. The elements are considered as unsigned integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < r[rt])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SLTU.SV  Set Less Than Unsigned Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SLTU</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01101</td>
<td>1</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

sltu.sv vw, rt, vd

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If `rt` is less than an element of `vd`, the corresponding element in `vw` is set to 1, else it is set to 0. The elements are considered as unsigned integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (r[rt] < v[vd][i])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
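The signed and unsigned set-less-than instructions differ only in how the same 32-bit pattern is interpreted. The sketch below models `sltu.vs` and `sltu.sv`. For contrast, it also includes a hypothetical `lt_signed` helper that performs the signed comparison on raw patterns by flipping the sign bit. The names and the `MAXVL` of 32 are assumptions.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* Signed less-than on raw 32-bit patterns: flipping the sign bit maps
   INT32_MIN..INT32_MAX onto 0..UINT32_MAX in the same order. */
int lt_signed(uint32_t a, uint32_t b)
{
    return (a ^ 0x80000000u) < (b ^ 0x80000000u);
}

/* sltu.vs vw, vd, rt: vw[i] = 1 if vd[i] < rt as unsigned, else 0. */
int sltu_vs(uint32_t *vw, const uint32_t *vd, uint32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;  /* stands in for a vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = (vd[i] < rt) ? 1u : 0u;
    return 0;
}

/* sltu.sv vw, rt, vd: vw[i] = 1 if rt < vd[i] as unsigned, else 0. */
int sltu_sv(uint32_t *vw, uint32_t rt, const uint32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++)
        vw[i] = (rt < vd[i]) ? 1u : 0u;
    return 0;
}
```

Compare `0xFFFFFFFF` against 1. The unsigned test is false, because 4294967295 is not less than 1. The signed test is true, because -1 < 1.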

### SEQ.VV  Set Equal Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>SEQ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>01110</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

seq.vv vw, vd, vt

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the first `n` elements of vector register `vt`. If an element of `vd` is equal to the corresponding element of `vt`, the corresponding element in `vw` is set to 1, else it is set to 0.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] == v[vt][i])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### SEQ.VS  Set Equal Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>SEQ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>01110</td>
<td>0</td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`seq.vs vw, vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If an element of `vd` is equal to `rt`, the corresponding element in `vw` is set to 1, else it is set to 0.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] == r[rt])
        v[vw][i] = 1;
    else
        v[vw][i] = 0;
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### FLT.VV  Flag Less Than Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>FLT</th>
<th>VS</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>00100</td>
<td>0</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`flt.vv vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the first `n` elements of vector register `vt`. If an element of `vd` is less than the corresponding element of `vt`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0. The elements are considered as signed integers.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < v[vt][i])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

---

### FLT.VS  Flag Less Than Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>FLT</th>
<th>VS</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00100</td>
<td>0</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`flt.vs vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If an element of `vd` is less than `rt`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0. The elements are considered as signed integers.

A vector operation exception is raised if `vlr` is larger than the implementation's maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < r[rt])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
### FLT.SV  Flag Less Than Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>FLT</th>
<th>SV</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00100</td>
<td>1</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`flt.sv rt, vd`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If `rt` is less than an element of `vd`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0. The elements are considered as signed 32-bit integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (r[rt] < v[vd][i])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
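The flag instructions write one bit of the vector unit condition code per element rather than a result vector. Following the operation pseudocode, each bit `i < vlr` is either set or cleared, and bits at positions `vlr` and above are not written.

The following is a minimal sketch of `flt.vs` and `flt.sv`. It assumes `vcr[VCOND]` fits in a single `uint32_t` (hypothetical `MAXVL` of 32), and the function names are hypothetical.

```c
#include <stdint.h>

#define MAXVL 32u  /* hypothetical maximum vector length */

/* flt.vs vd, rt: for i < vlr, bit i of *vcond becomes (vd[i] < rt),
   using signed comparison. Bits at positions >= vlr are not written. */
int flt_vs(uint32_t *vcond, const int32_t *vd, int32_t rt, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;  /* stands in for a vector operation exception */
    for (unsigned i = 0; i < vlr; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (vd[i] < rt)
            *vcond |= bit;
        else
            *vcond &= ~bit;
    }
    return 0;
}

/* flt.sv rt, vd: same, with the comparison reversed (rt < vd[i]). */
int flt_sv(uint32_t *vcond, int32_t rt, const int32_t *vd, unsigned vlr)
{
    if (vlr > MAXVL)
        return -1;
    for (unsigned i = 0; i < vlr; i++) {
        uint32_t bit = UINT32_C(1) << i;
        if (rt < vd[i])
            *vcond |= bit;
        else
            *vcond &= ~bit;
    }
    return 0;
}
```

Starting from `0xFFFFFF00`, `flt.vs` with `vd = {1, 9, -3, 5}`, `rt = 5` and `vlr = 4` leaves `0xFFFFFF05`. Bits 0 and 2 are set, bits 1 and 3 are cleared, and bits 4 and above keep their previous values.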

### FLTU.VV  Flag Less Than Unsigned Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>FLTU</th>
<th>VS</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>00101</td>
<td>0</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`fltu.vv vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the first `n` elements of vector register `vt`. If an element of `vd` is less than the corresponding element of `vt`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0. The elements are considered as unsigned integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < v[vt][i])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### FLTU.VS  Flag Less Than Unsigned Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>FLTU</th>
<th>VS</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00101</td>
<td>0</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`fltu.vs vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If an element of `vd` is less than `rt`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0. The elements are considered as unsigned integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] < r[rt])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

---

### FLTU.SV  Flag Less Than Unsigned Scalar-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>FLTU</th>
<th>SV</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00101</td>
<td>1</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`fltu.sv rt, vd`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If `rt` is less than an element of `vd`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0. The elements are considered as unsigned integers.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (r[rt] < v[vd][i])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.

### FEQ.VV  Flag Equal Vector-Vector

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>FEQ</th>
<th>VS</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td>00110</td>
<td>0</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`feq.vv vd, vt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the first `n` elements of vector register `vt`. If an element of `vd` is equal to the corresponding element of `vt`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] == v[vt][i])
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.  
Coprocessor unusable exception.  
Vector operation exception.

### FEQ.VS  Flag Equal Vector-Scalar

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVS</th>
<th>rt</th>
<th>FEQ</th>
<th>VS</th>
<th>0</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10101</td>
<td></td>
<td>00110</td>
<td>0</td>
<td>00000</td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`feq.vs vd, rt`

**Description:**

The vector length register `vlr` is read to give `n`, the number of elements to be compared. The first `n` elements of vector register `vd` are compared with the scalar register `rt`. If an element of `vd` is equal to `rt`, the corresponding bit in the vector unit condition code is set to 1, else it is set to 0.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    if (v[vd][i] == x[rt])      /* compare with the scalar register value */
        vcr[VCOND] |= (1<<i);
    else
        vcr[VCOND] &= ~(1<<i);
}
```

**Exceptions:**

Reserved instruction exception.  
Coprocessor unusable exception.  
Vector operation exception.
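The vector-scalar form changes only the second operand: every element is compared with the single value held in scalar register `rt` (`x[rt]` in the pseudocode). The sketch below keeps the same hypothetical `MAXVL` and `uint32_t` condition word as an illustration. It also adds an equally hypothetical helper, `vcond_bit`, that reads back the result for one element.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXVL 32 /* hypothetical maximum vector length */

/* Sketch of feq.vs: rt_value stands for the contents of scalar register rt. */
static bool feq_vs(uint32_t *vcond, const int32_t vd[], int32_t rt_value,
                   unsigned vlr)
{
    if (vlr > MAXVL)
        return false; /* vector operation exception */
    for (unsigned i = 0; i < vlr; i++) {
        uint32_t bit = UINT32_C(1) << i;
        *vcond = (vd[i] == rt_value) ? (*vcond | bit) : (*vcond & ~bit);
    }
    return true;
}

/* Returns the condition bit belonging to element i. */
static bool vcond_bit(uint32_t vcond, unsigned i)
{
    return (vcond >> i) & 1u;
}
```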
**CMVcc.VV**  
*Conditional Move Vector-Vector*

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>CMVNEZ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>CMVLEZ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>CMVLEZ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>CMVLTZ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>IVV</th>
<th>vt</th>
<th>CMVGTZ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>10100</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

`cmvnez.vv vw, vd, vt`  
`cmvgez.vv vw, vd, vt`  
`cmvlez.vv vw, vd, vt`  
`cmveqz.vv vw, vd, vt`  
`cmvltz.vv vw, vd, vt`  
`cmvgtz.vv vw, vd, vt`

**Description:**

The vector register `vlr` is read to give `n`, the number of elements to be moved. The first `n` elements of vector register `vd` are read. For each of those elements that satisfies the comparison with zero, the corresponding element of vector register `vw` is updated with the corresponding element of `vt`. All other elements of `vw` are unchanged.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

Note: the operation `cmveqz.vv vw, $vr0, vt` performs an unconditional vector move.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    /* condop is !=, >=, <=, ==, < or > for nez, gez, lez, eqz, ltz, gtz */
    if (v[vd][i] condop 0)
    {
        v[vw][i] = v[vt][i];
    }
}
```

**Exceptions:**

Reserved instruction exception.  
Coprocessor unusable exception.  
Vector operation exception.

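The `condop` placeholder in the pseudocode stands for one of the six comparisons with zero named in the format list. The sketch below spells that out in C. The names `enum cmv_cond`, `MAXVL`, `cmv_test` and `cmv_vv` are illustrative assumptions, not architectural names. A `false` return again stands in for the vector operation exception.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXVL 32 /* hypothetical maximum vector length */

/* Hypothetical names for the six comparisons with zero. */
enum cmv_cond { CMV_NEZ, CMV_GEZ, CMV_LEZ, CMV_EQZ, CMV_LTZ, CMV_GTZ };

static bool cmv_test(enum cmv_cond c, int32_t x)
{
    switch (c) {
    case CMV_NEZ: return x != 0;
    case CMV_GEZ: return x >= 0;
    case CMV_LEZ: return x <= 0;
    case CMV_EQZ: return x == 0;
    case CMV_LTZ: return x < 0;
    case CMV_GTZ: return x > 0;
    }
    return false;
}

/* Sketch of cmv<cc>.vv vw, vd, vt: copies vt[i] into vw[i] wherever vd[i]
   passes the test; all other elements of vw are left unchanged. */
static bool cmv_vv(enum cmv_cond c, int32_t vw[], const int32_t vd[],
                   const int32_t vt[], unsigned vlr)
{
    if (vlr > MAXVL)
        return false; /* vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        if (cmv_test(c, vd[i]))
            vw[i] = vt[i];
    return true;
}
```

For instance, with `vd = {-2, 0, 3, -1}` and `CMV_LTZ`, only elements 0 and 3 of `vw` receive values from `vt`. Elements 1 and 2 keep their old contents.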
**CMVcc.VS**  
*Conditional Move Vector-Scalar*

<table>
<thead>
<tr>
<th></th>
<th></th>
<th>COP2</th>
<th>IVS</th>
<th>V</th>
<th>CMVNEZ</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td></td>
<td>010010</td>
<td>10101</td>
<td>10</td>
<td>01001</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
</tbody>
</table>

**Format:**

`cmvnez.vs vw, vd, rt`  
`cmvgez.vs vw, vd, rt`  
`cmvlez.vs vw, vd, rt`  
`cmveqz.vs vw, vd, rt`  
`cmvltz.vs vw, vd, rt`  
`cmvgtz.vs vw, vd, rt`

**Description:**

The vector register `vlr` is read to give `n`, the number of elements to be moved. The first `n` elements of vector register `vd` are read. For each of those elements that satisfies the comparison with zero, the corresponding element of vector register `vw` is updated with the value of scalar register `rt`. All other elements of `vw` are unchanged.

A vector operation exception is raised if `vlr` is larger than the implementation’s maximum vector length.

Note: the operation `cmveqz.vs vw, $vr0, rt` performs an unconditional scalar-to-vector move.

**Operation:**

```c
for (i=0; i<vlr; i++)
{
    /* condop is !=, >=, <=, ==, < or > for nez, gez, lez, eqz, ltz, gtz */
    if (v[vd][i] condop 0)
    {
        v[vw][i] = x[rt];
    }
}
```

**Exceptions:**

Reserved instruction exception.  
Coprocessor unusable exception.  
Vector operation exception.

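The unconditional-move note works because every element of `$vr0` satisfies `eqz`. The sketch below assumes, as the note implies, that `$vr0` reads as all zeros. Under that assumption, `cmveqz.vs` reduces to a broadcast of the scalar into the first `vlr` elements. `MAXVL`, `is_eqz` and `cmv_vs` are hypothetical names. The condition is passed as a predicate so that any of the six comparisons could be plugged in.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAXVL 32 /* hypothetical maximum vector length */

/* The eqz comparison, written as a predicate on one element of vd. */
static bool is_eqz(int32_t x) { return x == 0; }

/* Sketch of cmv<cc>.vs vw, vd, rt: writes rt_value into vw[i] wherever
   pred(vd[i]) holds; other elements of vw are unchanged. */
static bool cmv_vs(bool (*pred)(int32_t), int32_t vw[], const int32_t vd[],
                   int32_t rt_value, unsigned vlr)
{
    if (vlr > MAXVL)
        return false; /* vector operation exception */
    for (unsigned i = 0; i < vlr; i++)
        if (pred(vd[i]))
            vw[i] = rt_value;
    return true;
}
```

With an all-zero control vector, `gez` and `lez` would be equally unconditional, since zero satisfies both comparisons.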
5.7 Vector Fixed-Point Arithmetic Operations

The add, subtract and multiply fixed-point instructions are primarily used to implement scaled, rounded, and clipped fixed-point arithmetic. The scaling, rounding, and clipping information is supplied by a normal scalar register specified by the instruction. This register is termed the “configuration register”. The fixed point instructions can also perform some unsigned arithmetic with an appropriate value in the configuration register. The contents of the configuration register are interpreted as shown in Figure 9.

Figure 9: Fixed Point Configuration Register Format.

<table>
<thead>
<tr>
<th>Name</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>clipam</td>
<td>Clip amount.</td>
</tr>
<tr>
<td>shrst</td>
<td>Jam sticky bit or round to even.</td>
</tr>
<tr>
<td>shrnr</td>
<td>Don’t alter right shift output.</td>
</tr>
<tr>
<td>shlnr</td>
<td>Don’t add in round bit.</td>
</tr>
<tr>
<td>sham</td>
<td>Shift amount.</td>
</tr>
</tbody>
</table>

Table 7: Fixed point register fields.

The clip amount field is interpreted as shown in Table 8.

Table 8: Clip amount values.

<table>
<thead>
<tr>
<th>shrst/shrnr/shlnr</th>
<th>Rounding mode</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>Round to even.</td>
</tr>
<tr>
<td>X11</td>
<td>Truncation.</td>
</tr>
<tr>
<td>X10</td>
<td>Round up.</td>
</tr>
<tr>
<td>101</td>
<td>Zero bias jamming.</td>
</tr>
</tbody>
</table>

Table 9: Fixed point rounding modes.

The encoding has been designed so that the most common operation, a scaled round-to-even operation with a clip to 32b, has zeros in all bits other than the shift amount field.
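Table 9 can be read as a small decoder. The sketch below is an illustration only. The enum names and `fx_round_mode` are hypothetical. The three bits are passed separately because their positions in the register are defined by Figure 9, not here. The two combinations that Table 9 does not list are reported as `FX_OTHER`.

```c
#include <stdbool.h>

/* Hypothetical names for the rounding modes listed in Table 9. */
enum fx_round { FX_ROUND_EVEN, FX_TRUNCATE, FX_ROUND_UP, FX_JAM, FX_OTHER };

/* Returns the Table 9 mode selected by shrst, shrnr and shlnr. */
static enum fx_round fx_round_mode(bool shrst, bool shrnr, bool shlnr)
{
    if (shrnr)                       /* X1x: right shift output unaltered */
        return shlnr ? FX_TRUNCATE   /* X11: no round bit either */
                     : FX_ROUND_UP;  /* X10: 1/2 LSB round bit only */
    if (!shrst && !shlnr)
        return FX_ROUND_EVEN;        /* 000: round bit, LSB cleared on ties */
    if (shrst && shlnr)
        return FX_JAM;               /* 101: sticky bit OR-ed into the LSB */
    return FX_OTHER;                 /* 001 and 100 are not listed */
}
```

Read against the stage descriptions, the bits divide the work as follows:

- shlnr decides whether the 1/2 LSB round bit is added.
- shrnr and shrst decide whether the sticky bit clears the LSB (round to even) or is OR-ed into it (jamming).
- With the right-shift adjustment switched off, the result is simply truncated, or rounded half up if the round bit was added.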

A saturation status register, vsat, is accessible in coprocessor 2. It is updated by fixed point arithmetic operations, with one “sticky” saturation bit provided for each vector element. Vector elements whose results are modified by clipping set the appropriate bit in the saturation register. Bits in the register can only be cleared by explicit writes.
FXADD.yy

Fixed Point Add

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXAXV</th>
<th>vt</th>
<th>rd</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11000</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXAXVS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11001</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXAXVS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11001</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:

fxadd.vv vw, vd, vt, rd
fxadd.vs vw, vd, rt, rd
fxadd.sv vw, rt, vd, rd

Description:

This instruction is available in vector-vector, vector-scalar, and scalar-vector forms. The vector register vlr is read to give n, the number of elements to be operated upon. The following operations are performed under the control of the scalar configuration register rd, which is formatted as shown in Figure 9.

First stage — source operand mux. For a vector-vector operation, the A input to the pipeline comes from the vd vector register, and the B input to the pipeline comes from the vt vector register. For a vector-scalar operation, the A input to the pipeline comes from the vd vector register and the B input to the pipeline comes from the rt scalar register. For a scalar-vector operation, the A input to the pipeline comes from the rt scalar register and the B input to the pipeline comes from the vd vector register.

Second stage — left shifter. The A input is shifted left by the amount given in the sham field of register rd. If the shlnr bit of rd is set, zeros are shifted in from the right. If the shlnr bit is clear, a 1 is shifted in at the 1/2 LSB position, followed by zeros. Only the low 32b of the result are kept, and no overflow checking is performed.

Third stage — adder. The B input is added to the shifted A input in a 33b adder. The extra bit on the adder ensures there can be no overflow at this stage.

Fourth stage — right shifter. The 33b adder result is shifted right by the number of bits given in the sham field of rd. Sign bits are shifted in to the high order bits. The bits which are shifted off to the right are OR-ed together to form a sticky bit. If the shrnr bit of rd is clear, then the right shifted output is altered depending on the sticky bit. If both shrnr and shrst are 0 and sham is not 0, the LSB of the right shifted output is cleared if the sticky bit is 0. If shrnr is clear and shrst is set, the LSB of the right shifted output is OR-ed together with the sticky bit, effectively forming a new sticky bit over sham+1 bits. If shrnr is set, then the right shifted result is not altered.

Fifth stage — clipper. The right shifted result is then clipped according to the value in the clipam field of rd. If the result is changed by clipping, the corresponding bit in the vsat register is set.

A vector operation exception is raised if vlr is larger than the implementation’s maximum vector length.

Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
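The five stages above can be traced on a host machine with a C model of a single element. This is a sketch under stated assumptions:

- `struct fxcfg`, `fxadd_elem` and the helper functions are hypothetical names.
- The configuration fields are passed unpacked, because Figure 9's bit layout is not reproduced here.
- The clipam encoding (0 = 32b, 1 = 8b, otherwise 16b) is borrowed from the T0 clip table, Table 11, not from this section.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical unpacked view of the configuration register rd. */
struct fxcfg {
    unsigned sham;   /* shift amount, 0..31 */
    unsigned clipam; /* assumed: 0 = 32b, 1 = 8b, otherwise 16b */
    bool shlnr;      /* don't add in round bit */
    bool shrnr;      /* don't alter right shift output */
    bool shrst;      /* jam sticky bit instead of round to even */
};

/* Reinterprets a 32-bit pattern as a signed value. */
static int64_t sext32(uint32_t x)
{
    return (x & 0x80000000u) ? (int64_t)x - (INT64_C(1) << 32) : (int64_t)x;
}

/* Arithmetic right shift (floor division by 2^s) that avoids
   implementation-defined behaviour for negative operands. */
static int64_t asr(int64_t x, unsigned s)
{
    return x >= 0 ? x >> s : ~(~x >> s);
}

/* Sketch of one fxadd element; sets *sat (the vsat bit) on clipping. */
static int32_t fxadd_elem(int32_t a, int32_t b, struct fxcfg c, bool *sat)
{
    /* Stage 2: left shift keeping the low 32 bits, plus the 1/2 LSB bit. */
    uint32_t shl = (uint32_t)a << c.sham;
    if (!c.shlnr && c.sham > 0)
        shl |= UINT32_C(1) << (c.sham - 1);

    /* Stage 3: 33-bit add; an int64_t cannot overflow here. */
    int64_t sum = sext32(shl) + b;

    /* Stage 4: arithmetic right shift with sticky-bit adjustment. */
    bool sticky = ((uint64_t)sum & ((UINT64_C(1) << c.sham) - 1)) != 0;
    int64_t r = asr(sum, c.sham);
    if (!c.shrnr) {
        if (c.shrst)
            r |= sticky;          /* OR the sticky bit into the LSB */
        else if (c.sham != 0 && !sticky)
            r &= ~INT64_C(1);     /* sticky is 0: clear the LSB */
    }

    /* Stage 5: saturate to the width selected by clipam. */
    int64_t hi = c.clipam == 0 ? INT32_MAX : c.clipam == 1 ? INT8_MAX : INT16_MAX;
    int64_t lo = -hi - 1;
    int64_t out = r > hi ? hi : r < lo ? lo : r;
    if (out != r)
        *sat = true;
    return (int32_t)out;
}
```

With sham = 2 and every other field zero, `fxadd_elem(1, 2, ...)` computes 1 + 2/4 = 1.5 and returns 2. `fxadd_elem(0, 2, ...)` computes 0.5 and returns 0. Both ties go to the even neighbour.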
FXSUB.yy

Fixed Point Subtract

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXSV</th>
<th>vt</th>
<th>rd</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11010</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXSV</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11011</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXSV</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11011</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

Format:

fxsub.vv vw, vd, vt, rd
fxsub.vs vw, vd, rt, rd
fxsub.sv vw, rt, vd, rd

Description:

This instruction is available in vector-vector, vector-scalar, and scalar-vector forms. The vector register \( vlr \) is read to give \( n \), the number of elements to be operated upon. The following operations are performed under the control of the scalar configuration register \( rd \), which is formatted as shown in Figure 9.

First stage — source operand mux. For a vector-vector operation, the A input to the pipeline comes from the \( vd \) vector register, and the B input to the pipeline comes from the \( vt \) vector register. For a vector-scalar operation, the A input to the pipeline comes from the \( vd \) vector register and the B input to the pipeline comes from the \( rt \) scalar register. For a scalar-vector operation, the A input to the pipeline comes from the \( rt \) scalar register and the B input to the pipeline comes from the \( vd \) vector register.

Second stage — left shifter. The A input is shifted left by the amount given in the \( sham \) field of \( rd \). If the \( shlnr \) bit of \( rd \) is set, zeros are shifted in from the right. If the \( shlnr \) bit is clear, a 1 is shifted in from the 1/2 LSB position with zeros following. Only the low 32b of the result are kept, and no overflow checking is performed.

Third stage — subtractor. The B input is subtracted from the shifted A input in a 33b subtractor. The extra bit on the result ensures there can be no overflow at this stage.

Fourth stage — right shifter. The 33b subtractor result is shifted right by the number of bits given in the \( sham \) field of \( rd \). Sign bits are shifted in to the high order bits. The bits which are shifted off to the right are OR-ed together to form a sticky bit. If the \( shrnr \) bit of \( rd \) is clear, then the right shifted output is altered depending on the sticky bit. If both \( shrnr \) and \( shrst \) are 0 and \( sham \) is not 0, the LSB of the right shifted output is cleared if the sticky bit is 0. If \( shrnr \) is clear and \( shrst \) is set, the LSB of the right shifted output is OR-ed together with the sticky bit, effectively forming a new sticky bit over \( sham+1 \) bits. If \( shrnr \) is set, then the right shifted result is not altered.

Fifth stage — clipper. The right shifted result is then clipped according to the value in the \( clipam \) field of \( rd \). If the result is changed by clipping, the corresponding bit in the \( vsat \) register is set.

A vector operation exception is raised if \( vlr \) is larger than the implementation’s maximum vector length.

Exceptions:

Reserved instruction exception.
Coprocessor unusable exception.
Vector operation exception.
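The default configuration, with shlnr, shrnr and shrst all clear, gives round-to-nearest-even, and the subtract case shows why. The derivation below assumes sham = s ≥ 1. It also assumes that the left shift of A loses no significant bits, since the hardware keeps only the low 32b and performs no overflow check.

$$
\begin{aligned}
D &= \left(A\,2^{s} + 2^{s-1}\right) - B
  && \text{stages 2 and 3: round bit at the } \tfrac{1}{2}\text{ LSB position, then subtract} \\
\left\lfloor D / 2^{s} \right\rfloor &= \left\lfloor A - B/2^{s} + \tfrac{1}{2} \right\rfloor
  && \text{stage 4: arithmetic right shift, i.e. round half up} \\
\text{sticky} = 0 \iff D &\equiv 0 \pmod{2^{s}} \iff B \equiv 2^{s-1} \pmod{2^{s}}
  && \text{exactly the ties, where } A - B/2^{s} \in \mathbb{Z} + \tfrac{1}{2}
\end{aligned}
$$

On a tie, rounding half up picks the upper of two adjacent integers, and one of those two integers is even. Clearing the LSB when the sticky bit is 0 therefore selects the even one.

For example, take s = 2, A = 1 and B = 2, so the exact value is 1 − 2/4 = 0.5. Then D = 4 + 2 − 2 = 4 and the shifted value is 1. The sticky bit is 0, so the LSB is cleared and the result is 0.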
**FXMUL.yy**  
**Fixed Point Multiply**

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXMV</th>
<th>vt</th>
<th>rd</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11100</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXMS</th>
<th>rt</th>
<th>rd</th>
<th>VS</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11101</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>COP2</th>
<th>FXMS</th>
<th>rt</th>
<th>rd</th>
<th>SV</th>
<th>vw</th>
<th>vd</th>
</tr>
</thead>
<tbody>
<tr>
<td>010010</td>
<td>11101</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>6</td>
<td>5</td>
<td>5</td>
<td>5</td>
<td>1</td>
<td>5</td>
<td>5</td>
</tr>
</tbody>
</table>

**Format:**

fxmul.vv vw, vd, vt, rd  
fxmul.vs vw, vd, rt, rd  
fxmul.sv vw, rt, vd, rd

**Description:**

This instruction is available in vector-vector, vector-scalar and scalar-vector forms. For normal use of the instruction as described below, the vector-scalar and scalar-vector forms are identical. The vector register vlr is read to give n, the number of elements to be operated upon. The following operations are performed under the control of the scalar configuration register rd, which is formatted as shown in Figure 9.

**First stage — source operand mux.** For a vector-vector operation, the A input to the pipeline comes from the vd vector register, and the B input to the pipeline comes from the vt vector register. For a vector-scalar operation, the A input to the pipeline comes from the vd vector register and the B input to the pipeline comes from the rt scalar register.

**Second stage — sign extension.** The low 16 bits of both the A and B operands are sign extended to 32 bits.

**Third stage — multiplier.** The sign extended A operand is multiplied by the sign extended B operand. The multiplier produces an exact 32b signed result.

**Fourth stage — round.** If the shlnr bit in rd is clear, a single 1 is added into the multiplier result at a position given by the sham field in rd. The sham field contains a bit index one greater than the position where the round bit is added. For example, when sham is zero no bit is added, and when sham is one, a 1 is added into the least significant bit of the multiplier result. The rounding bit is added in a 33b adder, so no overflow can occur. If the shlnr bit is set, no rounding bit is added in.

**Fifth stage — right shifter.** The 33b result of the rounding stage is shifted right by the number of bits given in the sham field of rd. Sign bits are shifted in to the high order bits. The bits which are shifted off to the right are OR-ed together to form a sticky bit. If the shrnr bit of rd is clear, then the right shifted output is altered depending on the sticky bit, to help implement different rounding schemes. If both shrnr and shrst are 0 and sham is not 0, the LSB of the right shifted output is cleared if the sticky bit is 0. If shrnr is clear and shrst is set, the LSB of the right shifted output is OR-ed together with the sticky bit, effectively forming a sticky bit over sham+1 bits. If shrnr is set, then the right shifted result is not altered.

**Sixth stage — clipper.** The right shifted result is then clipped according to the value in the clipam field of rd. If the result is changed by clipping, the corresponding bit in the vsat register is set.

A vector operation exception is raised if vlr is larger than the implementation’s maximum vector length.

**Exceptions:**

Reserved instruction exception.  
Coprocessor unusable exception.  
Vector operation exception.
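A C model of one fxmul element, following the six stages above, is sketched below. The function names are hypothetical and the configuration fields are passed unpacked. The clipam encoding is assumed to match Table 11 (0 = 32b, 1 = 8b, otherwise 16b). The Q15 operands with sham = 15 used in the checks are an illustrative DSP-style choice, not something this section prescribes.

```c
#include <stdbool.h>
#include <stdint.h>

/* Arithmetic right shift (floor division by 2^s) without relying on
   implementation-defined behaviour for negative operands. */
static int64_t asr(int64_t x, unsigned s)
{
    return x >= 0 ? x >> s : ~(~x >> s);
}

/* Second stage: sign-extend the low 16 bits of x. */
static int64_t sext16(int32_t x)
{
    int64_t lo = (int64_t)((uint32_t)x & 0xFFFFu);
    return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* Sketch of one fxmul element; sets *sat if clipping changed the result. */
static int32_t fxmul_elem(int32_t a, int32_t b, unsigned sham, unsigned clipam,
                          bool shlnr, bool shrnr, bool shrst, bool *sat)
{
    int64_t p = sext16(a) * sext16(b);        /* stage 3: exact product */
    if (!shlnr && sham > 0)
        p += INT64_C(1) << (sham - 1);        /* stage 4: round bit */

    /* Stage 5: arithmetic right shift with sticky-bit adjustment. */
    bool sticky = ((uint64_t)p & ((UINT64_C(1) << sham) - 1)) != 0;
    int64_t r = asr(p, sham);
    if (!shrnr) {
        if (shrst)
            r |= sticky;
        else if (sham != 0 && !sticky)
            r &= ~INT64_C(1);
    }

    /* Stage 6: saturate to the width selected by clipam. */
    int64_t hi = clipam == 0 ? INT32_MAX : clipam == 1 ? INT8_MAX : INT16_MAX;
    int64_t lo = -hi - 1;
    int64_t out = r > hi ? hi : r < lo ? lo : r;
    if (out != r)
        *sat = true;
    return (int32_t)out;
}
```

With sham = 15, 0x4000 × 0x4000 (0.5 × 0.5 in Q15) returns 0x2000 (0.25). By contrast, −1.0 × −1.0 saturates to 32767 under a 16b clip and sets the saturation flag.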
6 Future Extensions

There are several areas where Torrent could be extended.

- The CPU may adopt the MIPS-III (64b) ISA extensions.
- The vector coprocessor may add vector floating-point, or further assist for floating-point operations.
- The vector coprocessor may add vector 64b integer and fixed-point operations.
- The vector coprocessor may add further vector move instructions to better support certain common operations, such as sorting, FFTs, and convolutions.
- The vector coprocessor may add segmented operations that provide higher throughput for low precision arithmetic.
A  T0 Fixed Point Pipe Operations

A.1  Overview

The Torrent fixed point add, fixed point subtract and fixed point multiply instructions (fxadd, fxsub and fxmul) use a general purpose scalar register as a "configuration register" to control their operation. The architecture manual section for these instructions describes the "fully supported" bits within these configuration words — i.e. bits that are guaranteed to function identically in all processors that implement the Torrent architecture. However, T0 (the first Torrent processor) also assigns functions to many other bits in the configuration register (although these functions are disabled when the corresponding bits are zero). This appendix describes the operation of the fixed point pipe and the effect of the bits in the configuration register on its operation.

Figure 10 details the useful bits within the configuration register — all unused bits should be set to 0.

A.2  Fixed Point Add/Subtract Pipeline

The T0 fixed point add/subtract pipeline contains 9 stages controlled by a scalar register specified in the rd field of the instruction. Figure 11 shows the logical structure of the fixed point add/subtract pipeline.

**First stage — Operand Mux.**

For a vector-vector operation, the A input to the pipeline comes from the vd vector register, and the B input to the pipeline comes from the vt vector register. For a vector-scalar operation, the A input to the pipeline comes from the vd vector register and the B input to the pipeline comes from the rt scalar register. For a scalar-vector operation, the A input to the pipeline comes from the rt scalar register and the B input to the pipeline comes from the vd vector register.

**Second stage — Logic Unit**

The logic unit can perform any of the 16 possible bitwise logical operations on A and B under control of the lufunc field, and produces a 32b result LUOUT. The default when lufunc is zero is to pass the B input unchanged. See Table 10 for bit encodings.

**Third stage — Left shifter**

If shlza is clear, the left shifter takes A as the input, otherwise it takes zero as input.

If shlv is clear, the shift amount is taken from the shlam field, otherwise the shift amount is taken from the low 5 bits of LUOUT.

If shlnr is clear, a single 1 bit (followed by zeros) is shifted in from the right in the LSB − 1 position. This bit effectively adds in 1/2 LSB for the rounding modes. If shlnr is set, all zeros are shifted in from the right.

The left shifter output, SHLOUT, is 32b wide.
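The three controls of the left shifter can be sketched as one C function. The function name and argument order are hypothetical. The field names shlza, shlv, shlnr and shlam follow the rest of this appendix. When the amount is zero, the round bit would fall below bit 0, so this sketch assumes no round bit appears in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the third (left shifter) stage of the add/subtract pipeline.
   luout is the logic unit output; shlam is the constant shift amount. */
static uint32_t t0_shl(uint32_t a, uint32_t luout, bool shlza, bool shlv,
                       bool shlnr, unsigned shlam)
{
    uint32_t in = shlza ? 0 : a;                         /* zero or A */
    unsigned amt = shlv ? (luout & 31u) : (shlam & 31u); /* variable or constant */
    uint32_t out = in << amt;                            /* keep low 32 bits */
    if (!shlnr && amt > 0)
        out |= UINT32_C(1) << (amt - 1);                 /* 1/2 LSB round bit */
    return out;
}
```

For example, shifting A = 3 left by 4 with shlnr clear gives 0x38. That value is 0x30 from the shift plus the round bit 0x8.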

**Fourth stage — Sign Extenders**

The fourth stage extends SHLOUT and LUOUT to 33b to form the ADDA and ADDB adder inputs respectively.
Figure 10: T0 configuration register bits

<table>
<thead>
<tr>
<th>lufunc</th>
<th>LUOUT</th>
</tr>
</thead>
<tbody>
<tr>
<td>0000</td>
<td>B</td>
</tr>
<tr>
<td>0001</td>
<td>B</td>
</tr>
<tr>
<td>0010</td>
<td>A &amp; B</td>
</tr>
<tr>
<td>0011</td>
<td>¬(A &amp; ¬B)</td>
</tr>
<tr>
<td>0100</td>
<td>A</td>
</tr>
<tr>
<td>0101</td>
<td>¬0</td>
</tr>
<tr>
<td>0110</td>
<td>A</td>
</tr>
<tr>
<td>0111</td>
<td>A</td>
</tr>
<tr>
<td>1000</td>
<td>B &amp; ¬A</td>
</tr>
<tr>
<td>1001</td>
<td>¬A</td>
</tr>
<tr>
<td>1010</td>
<td>0</td>
</tr>
<tr>
<td>1011</td>
<td>¬(A</td>
</tr>
<tr>
<td>1100</td>
<td>A &amp; B</td>
</tr>
<tr>
<td>1101</td>
<td>¬(A &amp; B)</td>
</tr>
<tr>
<td>1110</td>
<td>A &amp; ¬B</td>
</tr>
<tr>
<td>1111</td>
<td>¬B</td>
</tr>
</tbody>
</table>

Table 10: lufunc operations.
Figure 11: T0 fixed point add/subtract pipeline
If $a_u$ is clear, $SHLOUT$ is sign-extended to form $ADDA$, else $SHLOUT$ is zero-extended to form $ADDA$.

If $b_u$ is clear, $LUOUT$ is sign-extended to form $ADDB$, else $LUOUT$ is zero-extended to form $ADDB$.

**Fifth stage — Adder**

The fifth stage is a 33b adder. If the operation is a fixed point add, $ADDA$ is added to $ADDB$ to give a full 33b result, $ADDOUT$. If the operation is a fixed point subtract, $ADDB$ is subtracted from $ADDA$ to give a 33b result, $ADDOUT$. If either $shlv$ or $shrv$ is set, then the $ADDB$ input is ignored and the adder passes $ADDA$ through unchanged.

**Sixth stage — Right Shifter**

The right shifter takes the 33b adder output, $ADDOUT$, and shifts it right by up to 31 places giving a 33b output $SHROUT$. It also includes sticky bit logic for round-to-nearest-even rounding.

If $shrv$ is clear then the right shift amount is a constant given in the configuration register. If $sepsham$ is clear, $shlam$ gives the constant shift amount, otherwise the separate $shram$ shift amount is used. The default is to have $shrv$ and $sepsham$ clear, so that both left and right shift amounts are specified by the $shlam$ field. If $shrv$ is set, then the low 5 bits of $ADDB$ (same as low 5 bits of $LUOUT$) are used to give the shift amount.

If $shrl$ is clear, the right shift is an arithmetic right shift with sign bits shifted in from the left, otherwise it is a logical right shift with zero bits shifted in from the left.

If $shrrn$ is set, no rounding is applied to the right shift output. If $shrrn$ is clear, $shrst$ controls the type of rounding adjustment. A sticky bit value is calculated by OR-ing together all the bits that are shifted off to the right. If $shrrn$ is clear and $shrst$ is set, this sticky bit value is OR-ed into the least significant bit of the output.

If both $shrrn$ and $shrst$ are clear, then the least significant bit of the shifter output is AND-ed with the sticky bit value. This last, default, case implements the adjustment required for round-to-even rounding if the left shifter added in a round bit in the 1/2 LSB position and both left and right shift amounts are the same. When the shift amount is zero, the sticky bit must be zero, but no modification should be made to the right shifter output. The hardware checks for a zero constant right shift amount ($shrv = 0$) and turns off the adjustment in that case. However, variable right shifts ($shrv = 1$) of zero places ($ADDB[4:0] = 0$) with $shrrn$ and $shrst$ clear will always reset the low bit of the right shifter output.
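The interaction between the sticky-bit adjustment and zero shift amounts is easiest to see in a model. In the sketch below, the names and argument order are hypothetical and the 33b ADDOUT is held in an `int64_t`. The field names (shrv, shrl, shrrn, shrst) follow this appendix. The sketch implements the stated quirk that only constant zero-place shifts bypass the round-to-even adjustment.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sign-extends a 33-bit pattern held in the low bits of x. */
static int64_t sext33(uint64_t x)
{
    x &= (UINT64_C(1) << 33) - 1;
    return ((x >> 32) & 1) ? (int64_t)x - (INT64_C(1) << 33) : (int64_t)x;
}

/* Sketch of the sixth (right shifter) stage. amt is the constant amount when
   shrv is clear, or ADDB[4:0] when shrv is set. */
static int64_t t0_shr(int64_t addout, unsigned amt, bool shrv, bool shrl,
                      bool shrrn, bool shrst)
{
    uint64_t bits = (uint64_t)addout & ((UINT64_C(1) << 33) - 1);
    bool sticky = (bits & ((UINT64_C(1) << amt) - 1)) != 0;
    int64_t out = shrl ? sext33(bits >> amt)           /* logical shift */
                       : (addout >= 0 ? addout >> amt  /* arithmetic shift */
                                      : ~(~addout >> amt));
    if (!shrrn) {
        if (shrst)
            out |= sticky;                 /* OR sticky into the LSB */
        else if ((shrv || amt != 0) && !sticky)
            out &= ~INT64_C(1);            /* AND the LSB with sticky */
    }
    return out;
}
```

In this model, the same zero-place shift of 7 returns 7 when the amount is a constant. It returns 6 when the amount comes from ADDB.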

**Seventh stage — Result Mux**

If the $lures$ bit is clear, the 33b right shifter output $SHROUT$ is passed to the clipper input $CLIPIN$, otherwise the $ADDB$ value (sign-extended logic unit value) is passed to $CLIPIN$.

**Eighth stage — Clipper**

The clipper converts the 33b input $CLIPIN$ to a 32b result $CLIPOUT$. It also generates a single bit $FLAG$ which is OR-ed into the appropriate bit of the $vsat$ register.

If $noclip$ is clear, the clipper clips the 33b value to an 8b, 16b, or 32b value according to the $clipam$ field. $CLIPIN$ values larger than can be represented in the required number of bits are saturated at the most positive or most negative values possible. The $FLAG$ bit is set if a saturation occurs. This is the normal usage where $vsat$ indicates saturations. See Table 11 for details.

If $noclip$ is set, the clipper performs alternate functions – see Table 12 for details. Note that these generate $FLAG$ values which may alter $vsat$.

The “pass” function passes the low 32b of $CLIPIN$ unchanged and always generates a zero $FLAG$ so that $vsat$ is unchanged.

The “overflow” function passes the low 32b of $CLIPIN$ unchanged and generates $FLAG$ if there is a signed overflow when truncating $CLIPIN$ from 33b to the 32b $CLIPOUT$.

The “set if less than” function returns 1 if $CLIPIN$ is negative (MSB = 1) or 0 if $CLIPIN$ is non-negative (MSB = 0). $FLAG$ is set in the same manner.

The “set if equal” function returns 1 if $ADDOUT$ equals zero, 0 otherwise (note that this function does not depend on $SHROUT$). $FLAG$ is set in the same manner.

**Ninth stage — Conditional Write**

The last stage decides whether to write $CLIPOUT$ to the vector register, depending on the value of $ADDOUT$ and the setting in the $wcond$ field. See Table 13 for details.

<table>
<thead>
<tr>
<th><code>noclip</code></th>
<th><code>clipam</code></th>
<th>clip amount</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>00</td>
<td>32b</td>
</tr>
<tr>
<td>0</td>
<td>01</td>
<td>8b</td>
</tr>
<tr>
<td>0</td>
<td>10</td>
<td>16b</td>
</tr>
<tr>
<td>0</td>
<td>11</td>
<td>reserved (16b on T0)</td>
</tr>
</tbody>
</table>

Table 11: Clip amounts with `noclip` clear

<table>
<thead>
<tr>
<th><code>noclip</code></th>
<th><code>clipam</code></th>
<th>Mnemonic</th>
<th><code>CLIPOUT</code></th>
<th><code>FLAG</code></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>00</td>
<td>pass</td>
<td><code>CLIPIN[31 : 0]</code></td>
<td>0</td>
</tr>
<tr>
<td>1</td>
<td>01</td>
<td>overflow</td>
<td><code>CLIPIN[31 : 0]</code></td>
<td><code>CLIPIN[32] ≠ CLIPIN[31]</code></td>
</tr>
<tr>
<td>1</td>
<td>10</td>
<td>set if less than</td>
<td><code>CLIPIN[32]</code></td>
<td><code>CLIPIN[32]</code></td>
</tr>
<tr>
<td>1</td>
<td>11</td>
<td>set if equal</td>
<td><code>ADDOUT = 0</code></td>
<td><code>ADDOUT = 0</code></td>
</tr>
</tbody>
</table>

Table 12: Set operations with `noclip` set
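Tables 11 and 12 together define the clipper. The sketch below uses hypothetical names and holds the 33b CLIPIN and ADDOUT values in `int64_t`. It returns CLIPOUT and reports FLAG through a pointer. It treats the reserved clipam value 11 as 16b, as Table 11 states for T0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns CLIPIN[31:0] as a signed value without implementation-defined
   conversions. */
static int32_t low32(int64_t x)
{
    uint32_t u = (uint32_t)x;
    return (u & 0x80000000u) ? -(int32_t)~u - 1 : (int32_t)u;
}

/* Sketch of the eighth (clipper) stage; *flag receives FLAG. */
static int32_t t0_clip(int64_t clipin, int64_t addout, bool noclip,
                       unsigned clipam, bool *flag)
{
    if (!noclip) {                                /* Table 11: saturate */
        int64_t hi = clipam == 0 ? INT32_MAX : clipam == 1 ? INT8_MAX : INT16_MAX;
        int64_t lo = -hi - 1;
        int64_t out = clipin > hi ? hi : clipin < lo ? lo : clipin;
        *flag = out != clipin;
        return (int32_t)out;
    }
    switch (clipam & 3u) {                        /* Table 12 */
    case 0:  *flag = false;                    return low32(clipin); /* pass */
    case 1:  *flag = clipin != low32(clipin);  return low32(clipin); /* overflow */
    case 2:  *flag = clipin < 0;               return clipin < 0;    /* set if less than */
    default: *flag = addout == 0;              return addout == 0;   /* set if equal */
    }
}
```

The overflow test `clipin != low32(clipin)` is the same condition as CLIPIN[32] ≠ CLIPIN[31] in Table 12. Both are true exactly when the 33b value does not fit in 32b two's complement.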

<table>
<thead>
<tr>
<th><code>wcond</code></th>
<th>mnemonic</th>
<th>write enable</th>
</tr>
</thead>
<tbody>
<tr>
<td>000</td>
<td>always</td>
<td>1</td>
</tr>
<tr>
<td>001</td>
<td>nez</td>
<td><code>ADDOUT ≠ 0</code></td>
</tr>
<tr>
<td>010</td>
<td>gez</td>
<td><code>ADDOUT ≥ 0</code></td>
</tr>
<tr>
<td>011</td>
<td>lez</td>
<td><code>ADDOUT ≤ 0</code></td>
</tr>
<tr>
<td>100</td>
<td>never</td>
<td>0</td>
</tr>
<tr>
<td>101</td>
<td>eqz</td>
<td><code>ADDOUT = 0</code></td>
</tr>
<tr>
<td>110</td>
<td>ltz</td>
<td><code>ADDOUT &lt; 0</code></td>
</tr>
<tr>
<td>111</td>
<td>gtz</td>
<td><code>ADDOUT &gt; 0</code></td>
</tr>
</tbody>
</table>

Table 13: Conditional write operations
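Table 13 maps directly onto a switch statement. In the hypothetical sketch below, ADDOUT is held in an `int64_t`. Note one property visible in the table: encodings that differ only in the top bit of wcond give opposite answers. The pairs are always/never, nez/eqz, gez/ltz and lez/gtz.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the write enable computed in the last stage (Table 13). */
static bool t0_write_enable(unsigned wcond, int64_t addout)
{
    switch (wcond & 7u) {
    case 0:  return true;          /* always */
    case 1:  return addout != 0;   /* nez */
    case 2:  return addout >= 0;   /* gez */
    case 3:  return addout <= 0;   /* lez */
    case 4:  return false;         /* never */
    case 5:  return addout == 0;   /* eqz */
    case 6:  return addout < 0;    /* ltz */
    default: return addout > 0;    /* gtz */
    }
}
```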
A.3 Fixed Point Multiply Pipeline

The T0 fixed point multiply pipeline contains 8 stages controlled by a 32b CPU register specified in the rd field of the instruction. Figure 12 shows the logical structure of the fixed point multiply pipeline.

**First stage — Operand Mux**

For a vector-vector operation, the A input to the pipeline comes from the $vd$ vector register, and the B input to the pipeline comes from the $vt$ vector register. For a vector-scalar operation, the A input to the pipeline comes from the $vd$ vector register and the B input to the pipeline comes from the $rt$ scalar register.

**Second stage — Sign Extenders**

The second stage treats the least significant 16b of the A and B inputs as integers and then sign- or zero-extends them to form the signed 17b multiplier inputs $MULA$ and $MULB$.

**Third stage — Multiplier**

The multiplier performs a signed 17b by 17b multiply of $MULA$ and $MULB$, giving an exact signed 33b result $MULOUT$ (note that only 32b are required to represent the product).
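The extension and multiply stages can be checked with a few lines of C. The sketch below is illustrative and the names are hypothetical. This section does not say which configuration bits choose zero extension, so that choice is passed in as two plain flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Second stage: the low 16 bits of x, sign- or zero-extended to 17 bits. */
static int64_t ext17(uint32_t x, bool zero_extend)
{
    int64_t lo = (int64_t)(x & 0xFFFFu);
    if (!zero_extend && lo >= 0x8000)
        lo -= 0x10000;                 /* negative 16-bit value */
    return lo;
}

/* Third stage: exact product of the two 17-bit operands (MULOUT). */
static int64_t t0_mulout(uint32_t a, uint32_t b, bool a_zero_ext, bool b_zero_ext)
{
    return ext17(a, a_zero_ext) * ext17(b, b_zero_ext);
}
```

The largest product, 65535 × 65535 = 4294836225, occurs with both operands zero-extended. It fits in 32 bits as an unsigned magnitude but needs the 33rd bit as a signed value.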

**Fourth stage — Round bit**

The fourth stage uses the left shifter to generate a rounding bit, $ROUND$, to be added into the product.

If $shlnr$ is clear, $shlam$ selects the bit position where the round bit will be placed. If $shlam$ contains 1, the round bit will be in bit 0. If $shlam$ contains 31, the round bit will be in bit 30.

If $shlnr$ is set or if $shlam$ is zero, no round bit is generated.

The $shlza$ and $shlv$ bits must be zero.

**Fifth stage — Adder**

The fifth stage is a 33b adder. The $ROUND$ bit is added to the multiplier output $MULOUT$ to give a 33b signed result $ADDOUT$.

**Sixth stage — Right Shifter**

The right shifter takes the 33b adder output, $ADDOUT$, and shifts it right by up to 32 places, giving a 33b output $SHROUT$. It also includes sticky bit logic for round-to-even rounding.

The field $shrv$ must be clear in the multiply pipeline. If $sepsham$ is clear, $shlam$ gives the constant shift amount, otherwise the separate $shram$ shift amount is used. The default is to have $sepsham$ clear, so that both left and right shift amounts are specified by the $shlam$ field.

If $shrl$ is clear, the right shift is an arithmetic right shift with sign bits shifted in from the left, otherwise it is a logical right shift with zero bits shifted in from the left.

If $shrrn$ is set, no rounding is applied to the right shift output. If $shrrn$ is clear, $shrst$ controls the type of rounding adjustment. A sticky bit value is calculated by OR-ing together all the bits that are shifted off to the right. If $shrrn$ is clear and $shrst$ is set, this sticky bit value is OR-ed into the least significant bit of the output.

If both $shrrn$ and $shrst$ are clear, then the least significant bit of the shifter output is AND-ed with the sticky bit value. This last, default, case implements the adjustment required for round-to-even rounding if the left shifter added in a round bit in the 1/2 LSB position and both left and right shift amounts are the same. When the shift amount is zero, the sticky bit must be zero, but no modification should be made to the right shifter output. The hardware includes a check for zero right shift amounts and turns off rounding in this case.

Figure 12: T0 fixed point multiply pipeline

**Seventh stage — Clipper**

The clipper converts the 33b $SHROUT$ value to a 32b result $CLIPOUT$. It also generates a single bit $FLAG$ which is OR-ed into the appropriate bit of the $vsat$ register.

If $noclip$ is clear, the clipper clips the 33b value to an 8b, 16b, or 32b value according to the $clipam$ field. $SHROUT$ values larger than can be represented in the required number of bits are saturated at the most positive or most negative values possible. The $FLAG$ bit is set if a saturation occurs. This is the normal usage where $vsat$ indicates saturations. See Table 11 for details of the clip amounts.

If $noclip$ is set, the clipper performs alternate functions — see Table 12 for details. Note that these generate $FLAG$ values which may alter $vsat$.

The $luent$ field must be zero.

**Eighth stage — Conditional Write**

The last stage decides whether to write $CLIPOUT$ to the vector register dependent on the value of $ADDOUT$ and the setting in the $wcond$ field. See Table 13 for details.