rSqrt
Compute reciprocal square-root operation and simulate with latency
Since R2020b
Description
The rSqrt block performs the reciprocal square-root operation on the input data signal. The block has control signals that indicate whether the input and output data are valid. You can also specify the number of iterations of the algorithm and the latency strategy.
To use this block in your Simulink® model, open the HDLMathLib library by entering this command in the MATLAB® Command Window:
open_system("HDLMathLib")
You can simulate the rSqrt block with latency. For more information, see Latency Considerations.
Ports
Input
Input signal to calculate the reciprocal square root, specified as a scalar or vector.
Data Types: int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Input control signal that indicates whether the input signal is valid, specified as a scalar.
Data Types: Boolean
Output
Output signal that is the reciprocal square root of the input signal, returned as a scalar or vector.
Data Types: int8 | int16 | int32 | int64 | uint8 | uint16 | uint32 | uint64 | Boolean | fixed point
Output control signal that indicates whether the output signal is valid, returned as a scalar.
Data Types: Boolean
Parameters
Select the architecture for the rSqrt block.
Programmatic Use
Block Parameter: architecture
Type: character vector
Values: 'RecipSqrtNewtonSingleRate'
Default: 'RecipSqrtNewtonSingleRate'
Specify the number of iterations for the rSqrt algorithm.
Programmatic Use
Block Parameter: numOfIterations
Type: character vector
Values: Integer values
Default: '3'
Specify whether to use minimum, maximum, custom, or zero latency. For more information, see Latency Strategy.
To use custom latency for the block, set Latency strategy to Custom and enter the latency value in the Custom latency field.
You can also control the number of pipeline stages for the iterative algorithm. To customize the latency for the iterative algorithm, set Latency strategy to Custom(PerIteration) and enter the iterations per pipeline value in the IterationsPerPipeline field. (since R2025a)
Programmatic Use
Block Parameter: latencyMode
Type: character vector
Values: 'Max' | 'Min' | 'Custom' | 'Custom(PerIteration)' | 'Zero'
Default: 'Max'
When you set Latency strategy to Custom, use this parameter to specify the custom latency value. The latency must be a nonnegative integer in the range [0, L], where L is the maximum latency value of the rSqrt block. For more information, see CustomLatency.
Dependency
To use this parameter, set Latency strategy to Custom.
Programmatic Use
Block Parameter: customLatencyValue
Type: Integer
Values: 0 to Max latency
Default: 0
Since R2025a
Specify the number of iterations to perform in each pipeline stage of the algorithm.
Dependency
To enable this parameter, set Latency strategy to Custom(PerIteration).
Programmatic Use
Block Parameter: iterationsPerPipelineValue
Type: Integer
Values: Positive integer
Default: 1
Specify the output data type. The data type can be inherited or specified directly.
Programmatic Use
Block Parameter: OutDataTypeStr
Type: character vector
Values: 'Inherit: Inherit via internal rule' | 'Inherit: Inherit via back propagation' | 'Inherit: Same as first input' | 'int8' | 'uint8' | 'int16' | 'uint16' | 'int32' | 'uint32' | 'int64' | 'uint64' | 'fixdt(1,16,0)' | ''
Default: 'Inherit: Inherit via internal rule'
Action | Reasons for Taking This Action | What Happens for Overflows | Example |
---|---|---|---|
Select this check box. | Your model has possible overflow, and you want explicit saturation protection in the generated code. | Overflows saturate to either the minimum or maximum value that the data type can represent. | The maximum value that the int8 (signed, 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box selected, the block output saturates at 127. Similarly, the block output saturates at a minimum output value of -128. |
Do not select this check box. | You want to optimize efficiency of your generated code. You want to avoid overspecifying how a block handles out-of-range signals. For more information, see Troubleshoot Signal Range Errors. | Overflows wrap to the value that is representable by the data type. | The maximum value that the int8 (signed, 8-bit integer) data type can represent is 127. Any block operation result greater than this maximum value causes overflow of the 8-bit integer. With the check box cleared, the software interprets the overflow-causing value as int8, which can produce an unintended result. For example, a block result of 130 (binary 1000 0010), expressed as int8, is -126. |
When you select this check box, saturation applies to every internal operation on the block, not just the output or result. Usually, the code generation process can detect when overflow is not possible. In this case, the code generator does not produce saturation code.
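The int8 example in the table can be reproduced with a few lines of integer arithmetic. This is a minimal Python sketch, not code from the block; the helper name `to_int8` is hypothetical and it models only the final conversion of one integer result, not every internal operation.

```python
def to_int8(value, saturate):
    """Map an integer result onto the int8 range [-128, 127] (hypothetical helper)."""
    lo, hi = -128, 127
    if saturate:
        # Saturate: clamp to the nearest representable limit.
        return max(lo, min(hi, value))
    # Wrap: keep the low 8 bits and reinterpret them as two's complement.
    return (value - lo) % 256 + lo


print(to_int8(130, saturate=True))   # 127
print(to_int8(130, saturate=False))  # -126
```

The wrapped result matches the table: the bit pattern 1000 0010 read as a signed 8-bit value is -126.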
Programmatic Use
Block Parameter: SaturateOnIntegerOverflow
Type: character vector
Values: 'off' | 'on'
Default: 'off'
Specify the rounding mode for fixed-point operations. For more information, see Rounding Modes.
Programmatic Use
Block Parameter: RndMeth
Type: character vector
Values: 'Ceiling' | 'Convergent' | 'Floor' | 'Nearest' | 'Round' | 'Simplest' | 'Zero'
Default: 'Floor'
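To illustrate how the listed rounding modes differ, the following Python sketch rounds a value to the nearest integer grid, which corresponds to a fixed-point type with zero fractional bits. It assumes the usual tie-breaking conventions for these mode names (Nearest breaks ties toward positive infinity, Round breaks ties away from zero, Convergent breaks ties to even). The helper name `round_value` is hypothetical, and 'Simplest' is left out because its behavior depends on the operation being rounded.

```python
import math


def round_value(x, mode):
    """Round x to an integer using one of the listed modes (hypothetical helper)."""
    if mode == "Ceiling":
        return math.ceil(x)            # toward positive infinity
    if mode == "Floor":
        return math.floor(x)           # toward negative infinity
    if mode == "Zero":
        return math.trunc(x)           # toward zero
    if mode == "Nearest":
        return math.floor(x + 0.5)     # ties toward positive infinity
    if mode == "Round":
        # Ties away from zero: round the magnitude, then restore the sign.
        return int(math.copysign(math.floor(abs(x) + 0.5), x))
    if mode == "Convergent":
        return round(x)                # Python's round() breaks ties to even
    raise ValueError(f"unsupported mode: {mode}")


for mode in ("Ceiling", "Floor", "Zero", "Nearest", "Round", "Convergent"):
    print(mode, round_value(2.5, mode), round_value(-2.5, mode))
```

The modes only disagree on values that fall between representable points, which is why the tie cases 2.5 and -2.5 are used here.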
Algorithms
The rSqrt block is a masked subsystem that contains the LumpLatency MATLAB Function block. The subsystem uses this MATLAB Function block to compute the latency based on the Number of iterations. To view the function that computes the latency of the block, open the LumpLatency block in the masked subsystem. To view inside the mask, click the ⇩ icon on the block.
This table shows how the block calculates the latency based on the setting of the Latency strategy parameter:
Latency Strategy | Latency Value (L) |
---|---|
Max | Uses maximum latency by using the equation L = (N * 4) + 5, where N is the value of the Number of iterations parameter. |
Min | Uses minimum latency by using the equation L = 2 + ceil(((N * 4) - 1) / 3). |
Custom | Specifies a custom latency value. To specify the latency, enter a value between zero and the maximum latency in the Custom latency parameter. For more information, see Custom latency. |
Custom(PerIteration) | Use this setting to control the pipeline stages for the iterative algorithm. Specify the number of iterations per pipeline stage by using the IterationsPerPipeline parameter. The block uses the equation L = 1 + ceil((N * 4) / K), where K is the value of the IterationsPerPipeline parameter. |
Zero | The latency of the block is 0. |
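The equations in the table can be evaluated directly. The following Python sketch is an illustration only, not the LumpLatency function itself; the helper name `rsqrt_latency` and its argument names are hypothetical, and the range check on the custom value follows the [0, L] constraint described for the Custom latency parameter.

```python
def ceil_div(a, b):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def rsqrt_latency(strategy, n_iter, custom_latency=0, iters_per_pipeline=1):
    """Latency L for a latency strategy and N iterations (hypothetical helper)."""
    max_latency = 4 * n_iter + 5
    if strategy == "Max":
        return max_latency
    if strategy == "Min":
        return 2 + ceil_div(4 * n_iter - 1, 3)
    if strategy == "Custom":
        # The custom value must lie between zero and the maximum latency.
        if not 0 <= custom_latency <= max_latency:
            raise ValueError(f"custom latency must be in [0, {max_latency}]")
        return custom_latency
    if strategy == "Custom(PerIteration)":
        if iters_per_pipeline < 1:
            raise ValueError("iterations per pipeline must be a positive integer")
        return 1 + ceil_div(4 * n_iter, iters_per_pipeline)
    if strategy == "Zero":
        return 0
    raise ValueError(f"unknown latency strategy: {strategy}")


# Latencies for the default of 3 iterations.
for strategy in ("Max", "Min", "Custom(PerIteration)", "Zero"):
    print(strategy, rsqrt_latency(strategy, 3))
```

For the default of 3 iterations, this gives 17 for Max, 6 for Min, 13 for Custom(PerIteration) with one iteration per pipeline, and 0 for Zero.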
The rSqrt block uses pipelined architectures to implement the Newton-Raphson-based reciprocal square-root algorithm. By default, the block uses the maximum latency, which depends on the Number of iterations parameter. The block performs a single iteration per pipeline stage. For example, if you set the Number of iterations to 15, the latency of the block is 65 based on the maximum latency equation in Latency Considerations. When you increase the number of iterations, the latency of the block also increases.
You can customize the latency for the iterative algorithm by setting the Latency strategy to Custom(PerIteration), which allows you to control the number of iterations per pipeline stage. For example, if you set the Number of iterations to 15 and you want the block to perform the iterations in three pipeline stages, then set IterationsPerPipeline to 5. By using the Custom(PerIteration) latency strategy, the latency of the block reduces to 13.
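For readers unfamiliar with the Newton-Raphson reciprocal square root mentioned above, the following Python sketch shows the textbook floating-point iteration y ← y(3 − x·y²)/2. It is not the block's implementation: the initial guess, the function name `rsqrt_newton`, and the use of floating point instead of fixed point are all assumptions made for illustration, since this page does not describe those details.

```python
import math


def rsqrt_newton(x, n_iter=3):
    """Approximate 1/sqrt(x) with n_iter Newton-Raphson steps (illustrative only)."""
    if x <= 0:
        raise ValueError("x must be positive")
    # Assumed initial guess: write x = m * 2**e with m in [0.5, 1) and start
    # from 2**(-e/2), so that x * y**2 = m and the iteration converges.
    _, e = math.frexp(x)
    y = 2.0 ** (-e / 2)
    for _ in range(n_iter):
        # Newton-Raphson update for f(y) = 1/y**2 - x.
        y = y * (3.0 - x * y * y) / 2.0
    return y


for n in (1, 3, 6):
    approx = rsqrt_newton(4.0, n)
    print(n, approx, abs(approx - 0.5))
```

Each additional iteration roughly squares the remaining error, which is why the iteration count trades accuracy against the latency given by the equations in the table.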
Extended Capabilities
The block supports HDL code generation using HDL Coder™. HDL Coder provides additional configuration options that affect HDL implementation and synthesized logic.
HDL Architecture
Architecture | Description |
---|---|
Module (default) | Generate code for the subsystem and the blocks within the subsystem. |
BlackBox | Generate a black box interface. The generated HDL code includes only the input/output port definitions for the subsystem. Therefore, you can use a subsystem in your model to generate an interface to existing, manually written HDL code. The black-box interface generation for subsystems is similar to the Model block interface generation without the clock signals. |
No HDL | Remove the subsystem from the generated code. You can use the subsystem in simulation; however, it is treated as a “no-op” in the HDL code. |
HDL Block Properties
General | |
---|---|
AdaptivePipelining | Automatic pipeline insertion based on the synthesis tool, target frequency, and multiplier word-lengths. The default is inherit. See also AdaptivePipelining. |
BalanceDelays | Detects introduction of new delays along one path and inserts matching delays on the other paths. The default is inherit. See also BalanceDelays. |
ClockRatePipelining | Insert pipeline registers at a faster clock rate instead of the slower data rate. The default is inherit. See also ClockRatePipelining. |
ConstrainedOutputPipeline | Number of registers to place at the outputs by moving existing delays within your design. Distributed pipelining does not redistribute these registers. The default is 0. For more details, see ConstrainedOutputPipeline. |
DistributedPipelining | Pipeline register distribution, or register retiming. The default is inherit. See also DistributedPipelining. |
DSPStyle | Synthesis attributes for multiplier mapping. The default is none. See also DSPStyle. |
FlattenHierarchy | Remove subsystem hierarchy from generated HDL code. The default is inherit. See also FlattenHierarchy. |
InputPipeline | Number of input pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is 0. For more details, see InputPipeline. |
OutputPipeline | Number of output pipeline stages to insert in the generated code. Distributed pipelining and constrained output pipelining can move these registers. The default is 0. For more details, see OutputPipeline. |
SharingFactor | Number of functionally equivalent resources to map to a single shared resource. The default is 0. See also Resource Sharing. |
StreamingFactor | Number of parallel data paths, or vectors, that are time multiplexed to transform into serial, scalar data paths. The default is 0, which implements fully parallel data paths. See also Streaming. |
Target Specification
This block cannot be the DUT, so the block property settings in the Target Specification tab are ignored.
Limitations
- The block does not support vector inputs.
- The block does not support bus inputs.
- The block cannot be used in a Synchronous Subsystem.
- The block does not support the resource sharing optimization.
Version History
Introduced in R2020b
You can control the pipeline stages for iterative algorithms by setting the Latency strategy parameter to Custom(PerIteration), then specifying the number of iterations per pipeline stage by using the IterationsPerPipeline parameter. Use this setting to control the pipeline stages in the generated code and optimize the design for speed and resource utilization.