F3DZEX2

F3DZEX2 is the primary 3D graphics microcode for the N64's Reality Signal Processor (RSP), used exclusively by Ocarina of Time and Majora's Mask. It is a variant of the F3DEX2 (Fast 3D, Extended 2) microcode, with a very similar (if not identical) binary interface.

1 Differences from F3DEX2
2 Versions Used
3 Graphics Binary Interface
- 3.1 Triangle Opcodes
4 Vertex Structure
- 4.1 In RAM/ROM
- 4.2 In RCP
5 Matrix Structure
6 RCP Bit Fields
7 DMEM Memory Access
8 Notes

Differences from F3DEX2

The SDK given to all N64 developers includes several microcodes to load onto the RSP, one of which is the F3DEX2 3D graphics microcode (the latest iteration of Nintendo's 3D microcode offerings). The microcode used in the N64 Zelda titles appears to be a variation of this microcode, though exactly how it differs is unknown.

The only known difference between the two is that the F3DZEX2 readme encourages the use of the g*SPBranchLessZraw macros over the g*SPBranchLessZ macros (note that the Zraw form is available to developers using F3DEX2, but it's undocumented in the SDK). This doesn't explain the need for a separate compiled microcode, however, since these macros only differ in how a programmer specifies the Z value for the G_BRANCH_Z command.

The readme for F3DZEX2 (as part of the iQue leak) reveals that this microcode required F3DEX2 to be installed first, with the Z-specific microcode object files and a ucode_z.h file installed on top of the F3DEX2 installation afterwards. This header file does not define any new macros or other new features over the standard F3DEX2 microcode, so the need for a Z-specific microcode object is not currently known. It's also not known why the Zraw macro is part of the standard SDK if it was only mentioned to programmers using F3DZEX2, or why the readme doesn't bring up g*SPBranchLessZrg in its discussion of how to branch on the Z value.

Versions Used

The variation used in both OoT and MM is F3DZEX2.NoN.fifo, with "NoN" standing for No Near clipping (objects between the viewer and the near clipping plane are not clipped, and objects in this area aren't subject to Z buffer calculations), and "fifo" simply being a method of transferring commands to the RSP.

Below are the version/copyright strings for F3DZEX2 present in Ocarina of Time and Majora's Mask. Note that the version strings identify the microcode as "F3DZEX" (which apparently did exist as the counterpart to F3DEX), but the iQue leak shows that the 2 was left out of the version string in F3DZEX2 object files. The Japanese Nintendo 64 versions of Majora's Mask also contain an alternate string that lacks the credit to Kawasedo.

Graphics Binary Interface

This GBI instruction list is currently based on SDK documentation of F3DEX2. Slight variations may be present in F3DZEX.

The below table will link to the right part of the opcode details subpage. Rows highlighted blue are common between different microcodes, at least F3DEX and F3DEX2.

Triangle Opcodes

Several opcodes are named in the SDK source code, but don't get used by any opcode-generating macros. The comments with these opcodes say that they're generated by the microcode itself, and that the programmer will never use (or need) them. The list of named opcodes is:

The source also gives a set of bitmasks to use to construct these opcodes, which as listed are:

G_RDP_TRI_ZBUFF_MASK = 0x01
G_RDP_TRI_TXTR_MASK = 0x02
G_RDP_TRI_SHADE_MASK = 0x04
G_RDP_TRI_FILL_MASK = 0x08

E.g. to get G_TRI_TXTR_ZBUFF, you could do 0xC0 | G_RDP_TRI_FILL_MASK | G_RDP_TRI_TXTR_MASK | G_RDP_TRI_ZBUFF_MASK, which gives 0xCB. Note that the bitmasks suggest that the range 0xC0..0xC7 could be considered triangle commands as well; however, those values are not named. Even the display list dumper provided by the SDK does not account for those potential opcodes, only for 0xC8..0xCF. So 0xC0..0xC7 either do nothing, or are otherwise not useful without the fill bit set.
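The combination described above can be sketched in C. The mask values are the ones listed in this section; the helper name `tri_opcode` is hypothetical and only illustrates how the named 0xC8..0xCF range falls out of the masks when the fill bit is always set.

```c
#include <stdint.h>

/* Bitmasks as listed in this section. */
#define G_RDP_TRI_ZBUFF_MASK 0x01
#define G_RDP_TRI_TXTR_MASK  0x02
#define G_RDP_TRI_SHADE_MASK 0x04
#define G_RDP_TRI_FILL_MASK  0x08

/* Hypothetical helper: start from 0xC0 with the fill bit always set,
   then OR in the optional shade/texture/Z-buffer bits. */
static uint8_t tri_opcode(int shade, int txtr, int zbuff)
{
    uint8_t op = 0xC0 | G_RDP_TRI_FILL_MASK;
    if (shade) op |= G_RDP_TRI_SHADE_MASK;
    if (txtr)  op |= G_RDP_TRI_TXTR_MASK;
    if (zbuff) op |= G_RDP_TRI_ZBUFF_MASK;
    return op;
}
```

For instance, `tri_opcode(0, 1, 1)` yields 0xCB, and every possible argument combination stays within 0xC8..0xCF.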

Vertex Structure

In RAM/ROM

Outside of the RCP, vertices are stored in the format presented below, depending on whether they're expecting a lighting system or not.

This layout is used for vertices that supply their own color, instead of asking the lighting system to calculate one:

While this is the layout for vertices that wish to use lighting:

In this case, the red, green, and blue components are instead the vertex normal used in lighting calculations. The memory layout is the same in both cases; which interpretation applies is decided by whether or not the RSP is set to perform lighting calculations at the time the vertices are loaded.
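As a rough sketch of how one 16-byte record can serve both purposes, the two interpretations can be expressed as a C union. The field order here (position, a padding halfword, texture coordinates, then four color/normal bytes) is an assumption based on the SDK's vertex definitions as generally understood; the type and field names are hypothetical.

```c
#include <stdint.h>

/* Sketch of a vertex as stored in RAM/ROM. Names are illustrative only,
   and the field order is an assumption, not confirmed by this page. */
typedef struct {
    int16_t  pos[3];          /* x, y, z position */
    uint16_t flag;            /* assumed unused/padding halfword */
    int16_t  tc[2];           /* s, t texture coordinates (s10.5) */
    union {
        uint8_t color[4];     /* r, g, b, a when lighting is off */
        struct {
            int8_t  normal[3];/* x, y, z normal when lighting is on */
            uint8_t alpha;
        } lit;
    } u;
} VertexRam;
```

Nothing in the data itself says which member of the union is meant; as noted above, that depends on the RSP's lighting state when the vertex is loaded.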

The texture coordinates are in a signed fixed-point 10.5 format, giving a range of -1024 ≤ n ≤ 1023.96875.
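A minimal conversion sketch for the 10.5 format follows. With 5 fractional bits, one unit of the raw 16-bit value is 1/32. The function names are hypothetical, and the clamping and truncation behavior in the encoder is a choice made for this example, not something the hardware or SDK is known to do.

```c
#include <stdint.h>

/* Raw s10.5 value to a real number: divide by 2^5. */
static double s10_5_to_double(int16_t raw)
{
    return raw / 32.0;
}

/* Real number to raw s10.5. Clamps to the representable range and
   truncates toward zero (both are assumptions of this sketch). */
static int16_t double_to_s10_5(double v)
{
    double scaled = v * 32.0;
    if (scaled > 32767.0)  scaled = 32767.0;
    if (scaled < -32768.0) scaled = -32768.0;
    return (int16_t)scaled;
}
```

The largest raw value, 0x7FFF, converts to 1023.96875, and the smallest, 0x8000, converts to -1024, matching the range given above.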

In RCP

The following vertex structure documentation is unconfirmed, and is based on documentation for Fast3D.

struct Vertex_RCP {
    /* 0x00 */ int16_t  xyzw_int[4];
    /* 0x08 */ uint16_t xyzw_frac[4];
    /* 0x10 */ uint8_t  rgba[4];    // always a color; if lighting is enabled, this is the result of the color calculations
    /* 0x14 */ int16_t  s;
    /* 0x16 */ int16_t  t;
    /* 0x18 */ int16_t  xscr;
    /* 0x1A */ int16_t  yscr;
    /* 0x1C */ int16_t  zscr_int;
    /* 0x1E */ uint16_t zscr_frac;
    /* 0x20 */ int16_t  inv_w_int;  // 1/w, integer part
    /* 0x22 */ uint16_t inv_w_frac; // 1/w, fractional part
    /* 0x24 */ uint8_t  unk_24[4];  // layout unconfirmed: ? OcOc ? Flags (e.g. clip test results) ? OO
}; // size 0x28

Matrix Structure

The matrices used by the RSP are 4x4 matrices in a fixed-point s16.u16 format. The structure is a bit unusual, as shown here:

That is, each matrix element has its integer part at Matrix.intpart[n][m], and its fractional part at Matrix.fracpart[n][m].
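The description above can be sketched as a C structure with two helper functions. The member names `intpart` and `fracpart` come from the text; the signed/unsigned choice for the two halves follows the s16.u16 description, and the helper names are hypothetical.

```c
#include <stdint.h>

/* All sixteen integer parts first, then all sixteen fractional parts. */
typedef struct {
    int16_t  intpart[4][4];
    uint16_t fracpart[4][4];
} Matrix;

/* Read element [n][k] as a real number by rejoining its two halves. */
static double matrix_get(const Matrix *m, int n, int k)
{
    uint32_t bits = ((uint32_t)(uint16_t)m->intpart[n][k] << 16)
                  | m->fracpart[n][k];
    return (int32_t)bits / 65536.0;
}

/* Store a real number into element [n][k], splitting it into halves. */
static void matrix_set(Matrix *m, int n, int k, double v)
{
    uint32_t bits = (uint32_t)(int32_t)(v * 65536.0);
    m->intpart[n][k]  = (int16_t)(uint16_t)(bits >> 16);
    m->fracpart[n][k] = (uint16_t)(bits & 0xFFFFu);
}
```

Storing 1.5 this way gives an integer part of 1 and a fractional part of 0x8000; storing -0.5 gives an integer part of -1 (0xFFFF) and the same fractional part.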

The RSP does matrix multiplication with 32-bit numbers (that is, effectively combining the integer and fractional parts of an element), which leads to a 64-bit number. To fit inside a matrix, only the inner 32 bits (lower 16 integer bits, higher 16 fractional bits) are kept. An example of this:

          0123.4567
    ×     89AB.CDEF
    = 009CA39D.C94E4629
          vvvv vvvv
          A39D.C94E  (final result stored in resulting matrix)
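The truncation in this example can be reproduced with ordinary 64-bit arithmetic. This is only a sketch of the bit selection: it treats both operands as unsigned so that it matches the worked numbers above, and it says nothing about how the RSP handles signs or rounding internally. The function name is hypothetical.

```c
#include <stdint.h>

/* Multiply two 16.16 values and keep the inner 32 bits of the 64-bit
   product: the lower 16 integer bits and the upper 16 fractional bits.
   Operands are treated as unsigned to match the worked example. */
static uint32_t fixed_mul_inner(uint32_t a, uint32_t b)
{
    uint64_t wide = (uint64_t)a * (uint64_t)b;  /* 32.32 result */
    return (uint32_t)(wide >> 16);              /* drop 16 bits at each end */
}
```

Calling `fixed_mul_inner(0x01234567, 0x89ABCDEF)` returns 0xA39DC94E, the value shown above.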

Note that the SDK sets up access to the Matrix structure in a different way, as a simple typedef:

typedef int32_t Mtx[4][4];

That is, the integer parts are accessible as Mtx[0..1] and the fractional parts as Mtx[2..3], with each matrix element occupying one half of an int32_t. All parts are signed here; however, it appears the fractional parts are all treated as unsigned. The Matrix structure described above was chosen instead so that references to integer and fractional parts match the "real" index of each matrix element.
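To make the packing concrete, here is a sketch that fills an SDK-style `Mtx` from a floating-point matrix. It assumes that each 32-bit word holds two consecutive elements of a row, with the earlier element in the upper half; that ordering is an assumption of this example (it mirrors the way the SDK's conversion routine is usually described), and the function name is hypothetical.

```c
#include <stdint.h>

typedef int32_t Mtx[4][4];

/* Pack a float matrix into the SDK layout: words 0..7 (rows 0..1 of the
   Mtx array) hold integer parts, words 8..15 (rows 2..3) hold fractions.
   Assumption: the even-indexed element of each pair goes in the high half. */
static void mtx_from_float(Mtx out, float in[4][4])
{
    for (int k = 0; k < 16; k += 2) {
        /* Convert two adjacent elements to 16.16 fixed point. */
        uint32_t e1 = (uint32_t)(int32_t)(in[k / 4][k % 4] * 65536.0f);
        uint32_t e2 = (uint32_t)(int32_t)(in[k / 4][k % 4 + 1] * 65536.0f);
        int w = k / 2;  /* word index within each half, 0..7 */

        out[w / 4][w % 4]     = (int32_t)((e1 & 0xFFFF0000u) | (e2 >> 16));
        out[2 + w / 4][w % 4] = (int32_t)((e1 << 16) | (e2 & 0xFFFFu));
    }
}
```

Under these assumptions an identity matrix packs to integer words 0x00010000, 0, 0x00000001, 0, 0, 0x00010000, 0, 0x00000001, followed by eight zero words for the fractions.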

RCP Bit Fields

Various opcodes in the microcode configure bitfields that tell the RSP and RDP how to operate on what ultimately will be rendered.

RSP Geometry Mode

This 32-bit value configures how the RSP will process geometry. It is made up of 1-bit flags, all of which are disabled by default except for G_CLIPPING.
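A small sketch of working with such a flag word is given below. The three flag values are assumptions recalled from the SDK's F3DEX2 definitions and should be checked against the SDK header before being relied on; the helper name and the default-mode macro are hypothetical.

```c
#include <stdint.h>

/* Assumed flag values (F3DEX2 definitions; verify against the SDK). */
#define G_ZBUFFER  0x00000001u
#define G_LIGHTING 0x00020000u
#define G_CLIPPING 0x00800000u

/* Per the text: only G_CLIPPING is enabled by default. */
#define GEOMETRY_MODE_DEFAULT G_CLIPPING

/* Hypothetical helper: clear some flags, then set others. */
static uint32_t geometry_mode_apply(uint32_t mode,
                                    uint32_t clear_bits,
                                    uint32_t set_bits)
{
    return (mode & ~clear_bits) | set_bits;
}
```

Starting from the default and setting the Z-buffer and lighting flags would give 0x00820001 under these assumed values.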

RDP Other Modes, Lower Half

The lower half of the RDP Other Modes controls various aspects of the RDP blender, which mixes new primitives with the framebuffer. The majority of this 32-bit value sets up the render mode for the blender.

Cycle-Independent Blender Settings

The thirteen bits described here are settings for the blender that apply across both cycles in 2-cycle mode. The table in the section above shows where these cycle-independent settings fit into the lower Other Mode settings. The bit values are as follows:

Note that the highest bit of this group no longer means anything.

Cycle-Dependent Blender Settings

The 16 bits that make up the cycle-dependent settings for the RDP blender specify what and how to actually blend things together. The blender runs on the formula

(P * A + M * B) / (A + B)

to blend primitives into the framebuffer. The 16 bits specify the values of these variables for both cycles in the following layout:

Or in other words, each nybble specifies one of the formula parameters, with each nybble half specifying that parameter for one of the two cycles. In 1-cycle mode, both halves must be equal.
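As a numeric illustration of the blend formula, the sketch below evaluates it for one color component using values normalized to the range 0.0 to 1.0. The function name is hypothetical, and returning 0 when A + B is zero is purely a guard for this example; the hardware's behavior in that case is not described here.

```c
/* Evaluate (P * A + M * B) / (A + B) for one color component.
   Inputs are assumed to be normalized to [0.0, 1.0]. */
static double blend_component(double p, double a, double m, double b)
{
    double denom = a + b;
    if (denom == 0.0) {
        return 0.0;  /* guard chosen for this sketch only */
    }
    return (p * a + m * b) / denom;
}
```

With P = 1.0, A = 0.25, M = 0.0, and B = 0.75 (that is, B = 1.0 - A), the result is 0.25: the incoming pixel contributes a quarter of its color over a black framebuffer.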

The P and M values can be any of the following values:

00 — G_BL_CLR_IN — In the first cycle, parameter is color from input pixel. In the second cycle, parameter is the numerator of the formula as computed for the first cycle.
01 — G_BL_CLR_MEM — Takes color from the framebuffer
10 — G_BL_CLR_BL — Takes color from the blend color register
11 — G_BL_CLR_FOG — Takes color from the fog color register

The A parameter can be set to any of these things:

00 — G_BL_A_IN — Parameter is alpha value of input pixel
01 — G_BL_A_FOG — Alpha value from the fog color register
10 — G_BL_A_SHADE — Calculated alpha value for the pixel, presumably
11 — G_BL_0 — Constant 0.0 value

And the B parameter can be set to any of these:

00 — G_BL_1MA — 1.0 - source alpha
01 — G_BL_A_MEM — Framebuffer alpha value
10 — G_BL_1 — Constant 1.0 value
11 — G_BL_0 — Constant 0.0 value
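Putting the selector values above together, the 16 cycle-dependent bits can be assembled as in the following sketch. The selector values are the ones listed above. The nybble order (P, A, M, B from the most significant nybble down, with the first cycle in the upper two bits of each nybble) is an assumption based on the SDK's blender macros, and the struct and function names are hypothetical.

```c
#include <stdint.h>

/* Selector values as listed above. */
enum { G_BL_CLR_IN = 0, G_BL_CLR_MEM = 1, G_BL_CLR_BL = 2, G_BL_CLR_FOG = 3 };
enum { G_BL_A_IN = 0, G_BL_A_FOG = 1, G_BL_A_SHADE = 2 };
enum { G_BL_1MA = 0, G_BL_A_MEM = 1, G_BL_1 = 2, G_BL_0 = 3 };

/* Hypothetical container for one cycle's four 2-bit selectors. */
typedef struct { unsigned p, a, m, b; } BlendCycle;

/* Pack both cycles into 16 bits. Assumed order: P, A, M, B nybbles from
   high to low; within each nybble, cycle 1 occupies the upper two bits. */
static uint16_t blender_pack(BlendCycle c1, BlendCycle c2)
{
    return (uint16_t)(
        (c1.p & 3u) << 14 | (c2.p & 3u) << 12 |
        (c1.a & 3u) << 10 | (c2.a & 3u) << 8  |
        (c1.m & 3u) << 6  | (c2.m & 3u) << 4  |
        (c1.b & 3u) << 2  | (c2.b & 3u));
}
```

For example, using P = G_BL_CLR_IN, A = G_BL_A_IN, M = G_BL_CLR_MEM, B = G_BL_A_MEM for both cycles packs to 0x0055 under this layout.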

RDP Other Modes, Higher Half

The higher half of the RDP Other Modes bits controls various aspects of the RDP (whereas the lower half deals more with just the blender). Some of these options differ between two hardware versions; since it's currently not known which version OoT was programmed for, both versions will be accounted for below where there are differences.

Color Combiner Settings

The color combiner mixes colors together with the general formula

(a - b) * c + d

The choice of what a, b, c, and d mean is configured by the programmer, separately for color and alpha. In 2-cycle mode, these calculations can be set differently for the second cycle (1-cycle mode needs the settings to match). In essence, four calculations can be performed:

First/only cycle color = (ca1 - cb1) * cc1 + cd1
First/only cycle alpha = (aa1 - ab1) * ac1 + ad1
Second cycle color = (ca2 - cb2) * cc2 + cd2
Second cycle alpha = (aa2 - ab2) * ac2 + ad2
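The combiner formula itself is simple enough to try out by hand. The sketch below evaluates it for a single component with inputs normalized to 0.0 to 1.0; it does not model any clamping or precision behavior of the real hardware, and the function names are hypothetical.

```c
/* One combiner stage for a single component: (a - b) * c + d. */
static double cc_eval(double a, double b, double c, double d)
{
    return (a - b) * c + d;
}

/* Two-cycle sketch: the first cycle's result is fed to the second cycle
   as its "a" input, which is one way the COMBINED option can be used. */
static double cc_eval_two_cycle(double a1, double b1, double c1, double d1,
                                double b2, double c2, double d2)
{
    double combined = cc_eval(a1, b1, c1, d1);
    return cc_eval(combined, b2, c2, d2);
}
```

Setting b and d to 0 reduces the formula to a plain multiply (a * c), while setting d equal to b turns it into a linear interpolation from b toward a by the factor c.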

This is configured by opcode FC, and the valid values vary depending on which parameter is being configured. A chart of valid configuration values is below:

Note that, in SDK code, G_CCMUX_0 is actually defined as 0x1F, with the value ANDed to the right number of bits for the particular parameter. The other values the various G_CCMUX_0 options occupy in each parameter are specified in other parts of SDK documentation.
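The effect of that masking can be shown with a short sketch. The field widths used here (4 bits for a and b, 5 bits for c, 3 bits for d) are an assumption based on the SDK's combiner macros rather than something stated on this page; the enum and function names are hypothetical.

```c
#include <stdint.h>

#define G_CCMUX_0 0x1F  /* as stated above */

/* Assumed bit widths of the four color parameters. */
enum { CC_A_BITS = 4, CC_B_BITS = 4, CC_C_BITS = 5, CC_D_BITS = 3 };

/* Keep only the low `bits` bits of a selector value. */
static uint32_t cc_field(uint32_t value, unsigned bits)
{
    return value & ((1u << bits) - 1u);
}
```

Under these widths, G_CCMUX_0 becomes 15 in the a and b fields, 31 in the c field, and 7 in the d field, which is why a single 0x1F definition can serve all four parameters.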

Parameter Description

Here is a list detailing what all the color options above do:

G_CCMUX_COMBINED — Uses the color value calculated in the first cycle. Presumably only for the second cycle of 2-cycle mode, behavior unknown in the first/only cycle.
G_CCMUX_TEXEL0 — Uses the texel colors from the currently-selected tile; either the directly selected tile or the closest tile as chosen by LOD, depending on LOD calculations.
G_CCMUX_TEXEL1 — Uses the texel colors from the next (farther) tile after TEXEL0.
G_CCMUX_PRIMITIVE — Takes color from the primitive color register.
G_CCMUX_SHADE — Uses the color for the current pixel of the primitive, as calculated from vertex colors. That is, flat shading would effectively return the color for the first vertex, and non-flat shading would calculate the color from all the vertices as per Gouraud shading.
G_CCMUX_ENVIRONMENT — Uses color from the environment color register.
G_CCMUX_1 — Uses a constant 1.0 for all color components (red, green, and blue).
G_CCMUX_NOISE — Uses random values for color components.
G_CCMUX_0 — Uses a constant 0.0 for all color components (red, green, and blue).
G_CCMUX_CENTER — Uses the center values set for chroma keying.
G_CCMUX_K4 — Uses constant K4 from the set values for the YUV to RGB conversion process.
G_CCMUX_SCALE — Uses scale values set for chroma keying.
G_CCMUX_COMBINED_ALPHA — Uses alpha value calculated in the first cycle in all color components. Presumably only for the second cycle of 2-cycle mode, behavior unknown in the first/only cycle.
G_CCMUX_TEXEL0_ALPHA — Uses alpha value from the texel of the currently-selected tile in all color components.
G_CCMUX_TEXEL1_ALPHA — Uses alpha value from the texel of the next tile in all color components.
G_CCMUX_PRIMITIVE_ALPHA — Uses alpha value from the primitive color register in all color components.
G_CCMUX_SHADE_ALPHA — Uses alpha value from current pixel of the primitive in all color components.
G_CCMUX_ENV_ALPHA — Uses the alpha value from the environment color register in all color components.
G_CCMUX_LOD_FRACTION — Uses the LOD fraction value specified in the primitive color register for all color components.
G_CCMUX_PRIM_LOD_FRAC — Unclear, used in only one (undocumented) Color Combiner configuration. Possibly the calculated LOD value for the current pixel of the primitive.
G_CCMUX_K5 — Uses constant K5 from the set values for YUV to RGB conversion.

And for alpha options:

G_ACMUX_COMBINED — Uses alpha value calculated in the first cycle. Presumably only for the second cycle of 2-cycle mode, behavior unknown in the first/only cycle.
G_ACMUX_TEXEL0 — Uses alpha value in the texel from the currently-selected tile.
G_ACMUX_TEXEL1 — Uses alpha value in the texel from the next tile after TEXEL0.
G_ACMUX_PRIMITIVE — Uses alpha value from the primitive color register.
G_ACMUX_SHADE — Uses alpha value from the current primitive pixel's calculated RGBA value.
G_ACMUX_ENVIRONMENT — Uses alpha value from the environment color register.
G_ACMUX_1 — Uses a constant 1.0 for the alpha component.
G_ACMUX_0 — Uses a constant 0.0 for the alpha component.
G_ACMUX_LOD_FRACTION — Uses the LOD fraction value from the primitive color register.
G_ACMUX_PRIM_LOD_FRAC — Unclear, used in only one (undocumented) Color Combiner configuration. Possibly the calculated LOD value for the current pixel of the primitive.

DMEM Memory Access

This section describes the parts of the RCP's DMEM that G_MOVEWORD and G_MOVEMEM handle. These commands access memory via a pointer table, so their locations are currently only known via these table indices.

Following is a description of what the table indices for each of those two opcodes mean:

Note that there are more indices defined for G_MOVEMEM, but only the ones listed above are used by any macros.

Notes

Not sure if the range is (0.0, 1.0), [0.0, 1.0), (0.0, 1.0], or [0.0, 1.0]