Atomic Memory Operations (AMOs)
Background
AMOs are special load/store accesses that implement read-modify-write semantics. A single instruction reads data from memory, performs an arithmetic or logical operation on that data, and stores the result. The whole sequence is performed as a single operation: no other operation can come in between the read, the modify, and the write.
These operations are meant for synchronization in multi-core environments. To enable this synchronization, AMOs need to be performed at the PoS (Point-of-Serialization), the point where all accesses from the different cores converge. This is usually a shared cache memory (when multiple levels of cache are implemented) or the external RAM controllers. Thus, the HPDcache needs to forward these operations to the PoS through the NoC interface.
Supported AMOs
On the interface from requesters, the supported AMOs are the ones listed in Table 13. They are the ones defined in the atomic (A) extension of the RISC-V ISA specification [RISCVUP2019].
Implementation
If \(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_AMO}\) is set to 0, the HPDcache does not support AMOs from requesters. See Uncacheable Handler for more details.
When a requester performs an AMO operation, the HPDcache needs to forward it to the PoS. It is only at the PoS that AMOs from different caches converge and thus where the correct result may be computed in case of multiple, simultaneous, address-colliding AMOs. In the case a cache has a replica of the target address, as the modification is performed at the PoS, that replica becomes obsolete (cache obsolescence problem). This issue shall be solved through a hardware (or software) cache coherency protocol.
To provide a consistent view of the data, the HPDcache updates the local replica based on the result from the memory. The procedure is as follows. First, the HPDcache forwards the AMO to the PoS and waits for the response carrying the old data. Then, if the target address is replicated in the HPDcache, the HPDcache computes the new value locally from the data of the AMO request and the old data returned by the memory (not the data of the local replica), and updates the replica with it. This gives the core a consistent view with regard to its own operations (single-thread consistency). However, a cache coherency protocol (hardware or software) is still required to ensure coherency in multi-core systems.
The HPDcache handles AMOs as non-allocating operations. That is, AMOs never fetch a replica of the target cacheline from the memory into the cache. If the target cacheline IS NOT replicated in the cache, the AMO modifies ONLY the memory. If the target cacheline IS replicated in the cache, the AMO modifies BOTH the memory and the cache.
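The forwarding and replica-update rules above can be sketched as a small behavioural model. This is a minimal sketch under stated assumptions, not the HPDcache RTL: `memory` and `cache` are hypothetical dictionaries keyed by word address, values are 64-bit words, and only a subset of AMO operations is modelled under illustrative names.

```python
from typing import Callable, Dict

MASK64 = (1 << 64) - 1

# Subset of AMO operations, for illustration only (names are hypothetical).
AMO_OPS: Dict[str, Callable[[int, int], int]] = {
    "add": lambda old, operand: old + operand,
    "and": lambda old, operand: old & operand,
    "or": lambda old, operand: old | operand,
    "xor": lambda old, operand: old ^ operand,
    "swap": lambda old, operand: operand,
}


def pos_amo(memory: Dict[int, int], addr: int, op: str, operand: int) -> int:
    """Model of the PoS: do the read-modify-write and return the old data."""
    old = memory.get(addr, 0)
    memory[addr] = AMO_OPS[op](old, operand) & MASK64
    return old


def cache_amo(memory: Dict[int, int], cache: Dict[int, int],
              addr: int, op: str, operand: int) -> int:
    """Model of the cache side: forward the AMO, then refresh a replica if any."""
    old = pos_amo(memory, addr, op, operand)  # always forwarded to the PoS
    if addr in cache:
        # New value is computed from the old data returned by the memory,
        # not from the (possibly stale) local replica.
        cache[addr] = AMO_OPS[op](old, operand) & MASK64
    # Non-allocating: a missing line is never brought into the cache.
    return old
```

In this model, an AMO on an address with a stale replica leaves the replica equal to the memory value, and an AMO on an uncached address leaves the cache untouched.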
AMO ordering
As specified in the RISC-V ISA specification [RISCVUP2019], the base RISC-V ISA has a relaxed memory model. To provide additional ordering constraints, AMOs (including LR/SC) specify two bits, aq and rl, for acquire and release semantics.
The HPDcache ignores the aq and rl bits and behaves as if they were always set. Hence, the HPDcache handles AMOs as sequentially consistent memory operations. The HPDcache waits for all pending read and write operations to complete before serving the AMO request.
This behavior implies that when the HPDcache forwards an AMO to the NoC, it will be the only pending request from the HPDcache. In addition, no new requests from the requesters are served until the AMO is completed.
LR/SC support
LR and SC are part of the Atomic (A) extension of the RISC-V ISA specification [RISCVUP2019]. These instructions allow “complex atomic operations on a single memory word or double-word”.
The HPDcache fully supports all the instructions of the A extension of the RISC-V ISA, including LR and SC operations.
In the specification of these instructions in the RISC-V ISA document, some details are implementation-dependent, namely the size of the reservation set and the return code of a SC failure.
LR/SC reservation set
When a requester executes a LR operation, it “reserves” a set of bytes in memory. This set contains at least the bytes solicited in the request but may contain more. The RISC-V ISA defines two sizes for LR operations: 4 bytes or 8 bytes. The HPDcache reserves the 8 bytes (double-word) containing the addressed memory location, regardless of whether the LR size is 4 or 8 bytes. The start address of the reservation set is aligned to 8 bytes.
When the LR size is 8 bytes, the address is also aligned to 8 bytes. In this case, the reservation set matches exactly the address interval defined in the request. When the LR size is 4 bytes, there are two possibilities:
-# the target address is not aligned to 8 bytes. The reservation set starts 4 bytes before the target address, so it contains 4 additional bytes before the requested ones.
-# the target address is aligned to 8 bytes. The reservation set starts at the target address but contains 4 additional bytes after the requested ones.
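In summary, in case of LR operation, the reservation set address range is computed as follows:
\(\small\mathsf{reservation\_set\_begin = \lfloor address / 8 \rfloor \times 8}\)
\(\small\mathsf{reservation\_set\_end = reservation\_set\_begin + 8}\)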
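A minimal sketch of this address computation, assuming byte addresses held as non-negative integers. The function name and the use of a half-open `range` (end address excluded) are illustrative choices, not part of the HPDcache specification.

```python
RESERVATION_BYTES = 8  # the HPDcache reserves one aligned double-word


def reservation_set(addr: int) -> range:
    """Return the byte addresses reserved by an LR to `addr` (4 or 8 bytes)."""
    begin = addr & ~(RESERVATION_BYTES - 1)  # round down to an 8-byte boundary
    return range(begin, begin + RESERVATION_BYTES)


# 4-byte LR to an address that is not 8-byte aligned:
# the set starts 4 bytes before the target address.
unaligned = reservation_set(0x1004)

# 4-byte LR to an 8-byte aligned address:
# the set starts at the target and extends 4 bytes past the request.
aligned = reservation_set(0x1008)
```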
When a requester executes a SC operation, the HPDcache forwards the operation to the memory ONLY IF the bytes addressed by the SC are part of an active reservation set. If the SC accesses fewer bytes than those in the active reservation set, but all of them lie within that reservation set, the SC is still forwarded to the memory.
After a SC operation, the active reservation set, if any, is invalidated, regardless of whether the SC operation succeeds or not.
Caution
The HPDcache keeps a unique active reservation set. If multiple requesters perform LR operations, the unique active reservation set is the one specified by the last LR operation.
The HPDcache also invalidates an active reservation set when there is an address-colliding STORE operation. If a STORE access from any requester writes one or more bytes within the active reservation set, the latter is invalidated.
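The reservation rules above (a single active set, invalidation after any SC, invalidation on a colliding STORE) can be sketched as a small state machine. This is a behavioural illustration only: the class and method names are hypothetical, and `sc` only reports whether the SC would be forwarded to the memory, not whether it finally succeeds there.

```python
from typing import Optional


class ReservationTracker:
    """Tracks the unique active reservation set (one aligned double-word)."""

    SET_BYTES = 8

    def __init__(self) -> None:
        self._begin: Optional[int] = None  # None means no active reservation

    def lr(self, addr: int) -> None:
        # The last LR wins, whichever requester issued it.
        self._begin = addr & ~(self.SET_BYTES - 1)

    def sc(self, addr: int, size: int) -> bool:
        """Return True if the SC would be forwarded to the memory."""
        forwarded = (
            self._begin is not None
            and self._begin <= addr
            and addr + size <= self._begin + self.SET_BYTES
        )
        self._begin = None  # invalidated after any SC, forwarded or not
        return forwarded

    def store(self, addr: int, size: int) -> None:
        # A STORE touching at least one reserved byte invalidates the set.
        if (
            self._begin is not None
            and addr < self._begin + self.SET_BYTES
            and addr + size > self._begin
        ):
            self._begin = None
```

For example, after `lr(0x1004)` a 4-byte `sc(0x1000, 4)` is forwarded because it lies within the reserved double-word, while a second SC right after it is not, since the first one invalidated the reservation.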
SC failure response code
The RISC-V ISA [RISCVUP2019] specifies that when a SC operation succeeds, the core shall write zero into the destination register of the operation. Otherwise, in case of SC failure, the core shall write a non-zero value into the destination register.
The HPDcache returns the status of an SC operation into the core_rsp_o.rdata signal of the response interface to requesters. The following table specifies the values returned by the HPDcache into the core_rsp_o.rdata signal in case of SC operation.
Case | Return value |
---|---|
SC Success | \(\small\mathsf{0x0000\_0000}\) |
SC Failure | \(\small\mathsf{0x0000\_0001}\) |
Depending on the size specified in the request (core_req_i.size), the returned value is extended with zeros on the most significant bits. That is, if the SC request size is 8 bytes, and the SC is a failure, then the returned value is \(\small\mathsf{0x0000\_0000\_0000\_0001}\).
In addition, if \(\small\mathsf{CONF\_HPDCACHE\_REQ\_DATA\_WIDTH}\) is wider than the size of the SC request, the return value is replicated \(\small\mathsf{CONF\_HPDCACHE\_REQ\_WORDS}\) times.
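The zero extension and replication can be illustrated with a short sketch. The helper names are hypothetical, and the lane width used for replication is an assumption: the example uses an 8-byte SC with 64-bit lanes and two copies (standing in for a request data bus of two 64-bit words). The text above does not spell out the lane layout for smaller request sizes, so that case is left out.

```python
def sc_status(success: bool) -> int:
    """Status code of an SC: 0 on success, 1 on failure."""
    return 0 if success else 1


def format_status(status: int, size_bytes: int) -> str:
    """Render the status zero-extended to the request size, as hex."""
    return f"0x{status:0{size_bytes * 2}x}"


def replicate(value: int, lane_bits: int, copies: int) -> int:
    """Place `copies` replicas of `value` side by side in `lane_bits` lanes."""
    lane_mask = (1 << lane_bits) - 1
    result = 0
    for lane in range(copies):
        result |= (value & lane_mask) << (lane * lane_bits)
    return result


# Failed 8-byte SC, replicated over two assumed 64-bit lanes.
rdata = replicate(sc_status(False), lane_bits=64, copies=2)
```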