Parameters, Interfaces and Communication Protocols
Synthesis-time (Static) Configuration Parameters
The HPDcache has several static configuration parameters. These parameters must be defined at compilation/synthesis time.
Table 1 lists the parameters that can be set when integrating the HPDcache.
Parameter | Description |
---|---|
\(\scriptsize\mathsf{NREQUESTERS}\) | Number of requesters to the HPDcache |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_PA\_WIDTH}\) | Physical address width (in bits) |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}\) | Width (in bits) of a data word |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_SETS}\) | Number of sets |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WAYS}\) | Number of ways (associativity) |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WORDS}\) | Number of words in a cacheline |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_WORDS}\) | Number of words in the data channels from/to requesters |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}\) | Width (in bits) of the transaction ID from requesters |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}\) | Width (in bits) of the source ID from requesters |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_VICTIM\_SEL}\) | Selects the replacement (victim selection) policy |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS}\) | Number of sets in the MSHR |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS}\) | Number of ways (associativity) in the MSHR |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}\) | Number of entries in the directory of the write buffer |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DATA\_ENTRIES}\) | Number of entries in the data part of the write buffer |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS}\) | Number of data words per entry in the write buffer |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_TIMECNT\_WIDTH}\) | Width (in bits) of the time counter in write buffer entries |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_RTAB\_ENTRIES}\) | Number of entries in the replay table |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_FLUSH\_ENTRIES}\) | Number of entries in the flush directory |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_FLUSH\_FIFO\_DEPTH}\) | Number of entries in the flush FIFO |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_REFILL\_FIFO\_DEPTH}\) | Number of entries in the refill FIFO |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_REFILL\_CORE\_RSP\_FEEDTHROUGH}\) | Use a feedthrough FIFO for responses from the refill handler to the core |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_DATA\_WIDTH}\) | Width (in bits) of the data channels from/to the memory interface |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_ID\_WIDTH}\) | Width (in bits) of the transaction ID from the memory interface |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WT\_ENABLE}\) | Enable the write-through policy in the cache |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WB\_ENABLE}\) | Enable the write-back policy in the cache |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_AMO}\) | When set to 1, the HPDcache supports Atomic Memory Operations (AMOs) |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_CMO}\) | When set to 1, the HPDcache supports Cache Management Operations (CMOs) |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_PERF}\) | When set to 1, the HPDcache integrates performance counters |
Some parameters are not directly related to functionality. Instead, they allow adapting the HPDcache to the physical constraints of the target technology node. Typically, these control the geometry of SRAM macros. Depending on the technology, some dimensions are more efficient than others (in terms of performance, power and area). These also need to be provided by the user at synthesis time. Table 2 lists the static synthesis-time physical parameters of the HPDcache. The \(\scriptsize\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}\) parameter has an impact on the refill latency (see section RAM Organization Parameters).
Parameter | Description |
---|---|
\(\scriptsize\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}\) | Number of words that can be accessed simultaneously from the CACHE data array |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}\) | Number of ways in the same CACHE data SRAM word |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}\) | Number of sets per RAM macro in the DATA array of the cache |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_RAM\_BYTE\_ENABLE}\) | Use RAM macros with byte-enable instead of bit-mask for the CACHE data array |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_USE\_REG\_BANK}\) | Use FFs instead of SRAM for the MSHR |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS\_PER\_RAM\_WORD}\) | Number of ways in the same MSHR SRAM word |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS\_PER\_RAM}\) | Number of sets per RAM macro in the MSHR array of the cache |
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_RAM\_BYTE\_ENABLE}\) | Use RAM macros with byte-enable instead of bit-mask for the MSHR |
Several internal configuration values are computed from the above ones. Table 3 gives a partial list of these internal configuration values, which may be mentioned in the remainder of this document.
Parameter | Description | Value |
---|---|---|
\(\scriptsize\mathsf{HPDCACHE\_CL\_WIDTH}\) | Width (in bits) of a cacheline | \(\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WORDS \times CONF\_HPDCACHE\_WORD\_WIDTH}\) |
\(\scriptsize\mathsf{HPDCACHE\_REQ\_DATA\_WIDTH}\) | Width (in bits) of request data interfaces | \(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_WORDS \times CONF\_HPDCACHE\_WORD\_WIDTH}\) |
\(\scriptsize\mathsf{HPDCACHE\_NLINE\_WIDTH}\) | Width (in bits) of the cacheline index part of the address | \(\scriptsize\mathsf{CONF\_HPDCACHE\_PA\_WIDTH - log_2(\frac{HPDCACHE\_CL\_WIDTH}{8})}\) |
\(\scriptsize\mathsf{HPDCACHE\_SET\_WIDTH}\) | Width (in bits) of the SET part of the address | \(\scriptsize\mathsf{log_2(CONF\_HPDCACHE\_SETS)}\) |
\(\scriptsize\mathsf{HPDCACHE\_TAG\_WIDTH}\) | Width (in bits) of the TAG part of the address | \(\scriptsize\mathsf{HPDCACHE\_NLINE\_WIDTH - HPDCACHE\_SET\_WIDTH}\) |
\(\scriptsize\mathsf{HPDCACHE\_WBUF\_WIDTH}\) | Width (in bits) of an entry in the write-buffer | \(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS \times CONF\_HPDCACHE\_WORD\_WIDTH}\) |
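The derived values in Table 3 follow directly from the configuration parameters. The Python sketch below computes them for one hypothetical configuration; the function names and example values are illustrative only, and the sketch assumes that the number of sets and the cacheline size in bytes are powers of two.

```python
def clog2(x: int) -> int:
    """Exact base-2 logarithm; this sketch assumes x is a power of two."""
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def derived_params(pa_width, word_width, sets, cl_words, req_words, wbuf_words):
    """Internal configuration values of Table 3 (hypothetical helper)."""
    cl_width = cl_words * word_width                   # HPDCACHE_CL_WIDTH
    nline_width = pa_width - clog2(cl_width // 8)      # HPDCACHE_NLINE_WIDTH
    set_width = clog2(sets)                            # HPDCACHE_SET_WIDTH
    return {
        "HPDCACHE_CL_WIDTH": cl_width,
        "HPDCACHE_REQ_DATA_WIDTH": req_words * word_width,
        "HPDCACHE_NLINE_WIDTH": nline_width,
        "HPDCACHE_SET_WIDTH": set_width,
        "HPDCACHE_TAG_WIDTH": nline_width - set_width,  # HPDCACHE_TAG_WIDTH
        "HPDCACHE_WBUF_WIDTH": wbuf_words * word_width,
    }


# Hypothetical configuration: 56-bit physical addresses, 64-bit words,
# 64 sets, 8-word (64-byte) cachelines, 1-word requests, 4-word WBUF entries.
params = derived_params(pa_width=56, word_width=64, sets=64,
                        cl_words=8, req_words=1, wbuf_words=4)
for name, value in params.items():
    print(f"{name} = {value}")
```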
Conventions
The HPDcache uses the following conventions in the naming of its signals:
The _i suffix for input ports
The _o suffix for output ports
The _n suffix for active-low ports
The clk_ prefix for clock ports
The rst_ prefix for reset ports
Suffixes may be combined. For example, _ni indicates an active-low input port.
Global Signals
Signal |
Source |
Description |
---|---|---|
|
Clock source |
Global clock signal. The HPDcache is synchronous to the rising-edge of the clock. |
|
Reset source |
Global reset signal. Asynchronous and active-LOW. |
|
System |
Force the write-buffer to send all pending writes. Active-HIGH, one-cycle pulse signal. Synchronous to the global clock (clk_i). |
|
Cache |
Indicates whether the write-buffer is empty (there are no pending write transactions). When this signal is set to 1, the write-buffer is empty. |
|
System |
Base address of the Control-Status Register (CSR) segment of the HPDcache |
Cache-Requesters Interface
This section describes the Cache-Requesters Interface (CRI) between requesters and the HPDcache. It contains two channels: one for requests and one for responses. There are as many CRIs as requesters from the core/accelerator to the HPDcache.
This interface is synchronous to the rising edge of the global clock clk_i.
The address (core_req_i.addr_offset), size (core_req_i.size), byte-enable (core_req_i.be), write data (core_req_i.wdata) and read data (core_rsp_o.rdata) signals shall comply with the alignment constraints defined in section Address, data and byte enable alignment.
CRI Signal Description
Signal |
Source |
Description |
---|---|---|
|
Requester |
Indicates that the corresponding requester has a valid request |
|
Cache |
Indicates that the cache is ready to accept a request from the corresponding requester |
|
Requester |
Least significant bits of the target address of the request |
|
Requester |
Write data (little-endian) |
|
Requester |
Indicates the type of operation to be performed |
|
Requester |
Byte-enable for write data (little-endian) |
|
Requester |
Indicates the size of the access. The size is encoded as the base-2 logarithm of the number of bytes (e.g. 0 is \(\scriptsize\mathsf{2^0~=~1}\) byte, 5 is \(\scriptsize\mathsf{2^5~=~32}\) bytes) |
|
Requester |
The identification tag for the requester. It shall be identical to the index of the request port bound to that requester |
|
Requester |
The identification tag for the request. A requester can issue multiple requests. The corresponding response from the cache will return this tid |
|
Requester |
Indicates if the request needs a response from the cache. When unset, the cache will not issue a response for the corresponding request |
|
Requester |
Indicates whether the access uses virtual (unset) or physical indexing (set) |
|
Requester |
Most significant bits of the target address of the request. It is only valid when using physical indexing (core_req_i.phys_indexed set to 1) |
|
Requester |
Indicates whether the access needs to be cached (unset) or not (set). Uncacheable accesses are directly forwarded to the memory. It is only valid when using physical indexing (core_req_i.phys_indexed set to 1) |
|
Requester |
Indicates whether the request targets input/output (IO) peripherals (set) or not (unset). IO accesses are directly forwarded to the memory. It is only valid when using physical indexing (core_req_i.phys_indexed set to 1) |
|
Requester |
Indicates whether the target cacheline shall be managed as write-back (write allocate) or write-through (write non-allocate). It is only valid when using physical indexing (core_req_i.phys_indexed set to 1) |
|
Requester |
Most significant bits of the target address of the request. This signal must be driven one cycle after the request handshake (see section Physical or Virtual Indexing) |
|
Requester |
Indicates whether the access needs to be cached (unset) or not (set). Uncacheable accesses are directly forwarded to the memory. This signal must be driven one cycle after the request handshake (see section Physical or Virtual Indexing) |
|
Requester |
Indicates whether the access targets input/output (IO) peripherals (set) or not (unset). IO accesses are directly forwarded to the memory. This signal must be driven one cycle after the request handshake (see section Physical or Virtual Indexing) |
|
Requester |
Indicates whether the target cacheline shall be managed as write-back (write allocate) or write-through (write non-allocate). It is only valid when using virtual indexing (core_req_i.phys_indexed set to 0) |
Signal |
Source |
Description |
---|---|---|
|
Cache |
Indicates that the HPDcache has a valid response for the corresponding requester |
|
Cache |
Response read data |
|
Cache |
The identification tag for the requester. It corresponds to the sid transferred with the request |
|
Cache |
The identification tag for the request. It corresponds to the tid transferred with the request |
|
Cache |
Indicates whether there was an error condition while processing the request |
|
Cache |
Indicates if the request issued in the previous cycle shall be aborted. It is only considered if the previous request used virtual indexing |
Cache Memory Interfaces
This section describes the Cache-Memory Interface (CMI) between the HPDcache and the NoC/memory. It implements 5 different channels.
This interface is synchronous to the rising edge of the global clock clk_i.
All CMI interfaces implement the ready-valid protocol described in section Valid/Ready handshake process for the handshake between the HPDcache and the NoC/memory.
The address (mem_req_addr), size (mem_req_size), write data (mem_req_w_data) and write byte-enable (mem_req_w_be) signals shall comply with the alignment constraints defined in section Address, data and byte enable alignment.
CMI Signal Descriptions
Memory Read Interfaces
Signal |
Source |
Description |
---|---|---|
|
Cache |
Indicates that the channel is signaling a valid request |
|
NoC |
Indicates that the NoC is ready to accept a request |
|
Cache |
Target physical address of the request. The address shall be aligned to
the |
|
Cache |
Indicates the number of transfers in a burst minus one |
|
Cache |
Indicates the size of the access. The size is encoded as the base-2 logarithm of the number of bytes |
|
Cache |
The identification tag for the request. The HPDcache always uses unique IDs on the memory interface (i.e. two or more in-flight requests cannot share the same ID). |
|
Cache |
Indicates the type of operation to be performed |
|
Cache |
In case of atomic operations, it indicates its type |
|
Cache |
This is a hint for the cache hierarchy in the system. It indicates if the request can be allocated by the cache hierarchy. That is, data can be prefetched from memory or can be reused for multiple read transactions |
Signal |
Source |
Description |
---|---|---|
|
NoC |
Indicates that the channel is signaling a valid response |
|
Cache |
Indicates that the cache is ready to accept a response |
|
NoC |
Indicates whether there was an error condition while processing the request |
|
NoC |
The identification tag for the request. It corresponds to the ID transferred with the request |
|
NoC |
Response read data. It shall be naturally aligned to the request address |
|
NoC |
Indicates the last transfer in a read response burst |
Memory Write Interfaces
Signal |
Source |
Description |
---|---|---|
|
Cache |
Indicates that the channel is signaling a valid request |
|
NoC |
Indicates that the NoC is ready to accept a request |
|
Cache |
Target physical address of the request |
|
Cache |
Indicates the number of transfers in a burst minus one |
|
Cache |
Indicates the size of the access. The size is encoded as the base-2 logarithm of the number of bytes |
|
Cache |
The identification tag for the request. The HPDcache always uses unique IDs on the memory interface (i.e. two or more in-flight requests cannot share the same ID). |
|
Cache |
Indicates the type of operation to be performed |
|
Cache |
In case of atomic operations, it indicates its type |
|
Cache |
This is a hint for the cache hierarchy in the system. It indicates if the write is bufferable by the cache hierarchy. This means that the write must be visible in a timely manner at the final destination. However, write responses can be obtained from an intermediate point |
Signal |
Source |
Description |
---|---|---|
|
Cache |
Indicates that the channel is transferring valid data |
|
NoC |
Indicates that the target is ready to accept the data |
|
Cache |
Request write data. It shall be naturally aligned to the request address |
|
Cache |
Request write byte-enable. It shall be naturally aligned to the request address |
|
Cache |
Indicates the last transfer in a write request burst |
Signal |
Source |
Description |
---|---|---|
|
NoC |
Indicates that the channel is transferring a valid write acknowledgement |
|
Cache |
Indicates that the cache is ready to accept the acknowledgement |
|
NoC |
Indicates whether the atomic operation was successfully processed (atomically) |
|
NoC |
Indicates whether there was an error condition while processing the request |
|
NoC |
The identification tag for the request. It corresponds to the ID transferred with the request |
Interfaces’ requirements
This section describes the basic protocol transaction requirements for the different interfaces in the HPDcache.
Valid/Ready handshake process
All interfaces in the HPDcache use a valid/ready handshake process to transfer a payload between the source and the destination. The payload contains the address, data and control information.
As a reminder, the 7 interfaces in the HPDcache are the following:
CRI request interface
CRI response interface
CMI read request interface
CMI read response interface
CMI write request interface
CMI write data request interface
CMI write response interface
The source sets the valid signal to 1 to indicate that the payload is available. The destination sets the ready signal to 1 to indicate that it can accept that payload. A transfer occurs only at a rising edge of the clock at which both the valid and ready signals are set to 1.
A source is not permitted to wait until ready is set to 1 before setting valid to 1.
A destination may or may not wait for valid before setting ready to 1 (cases (a) and (d) in Table 12). In other words, a destination may set ready to 1 before an actual transfer is available.
Once valid is set to 1, the source must keep it set until the handshake occurs, that is, until the next rising edge at which both valid and ready (from the destination) are set to 1. In other words, a source cannot retire a pending valid transfer (case (b) in Table 12).
After an effective transfer (valid and ready set to 1), the source may keep valid set to 1 in the next cycle to signal a new transfer (with a new payload). In the same manner, the destination may keep ready set to 1 if it can accept a new transfer. This allows back-to-back transfers, with no idle cycles, between a source and a destination (Case (d) in Table 12).
All interfaces are synchronous to the rising edge of the same global clock (clk_i).
(a) | (b) | (c) | (d) |
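The handshake rules above can be checked on sampled waveforms. In the Python sketch below, each list element is the value of valid or ready sampled at one rising edge of clk_i. The helper names are hypothetical, and only the rule that a pending valid must be held until the handshake is checked; payload stability is not modelled.

```python
def transfer_cycles(valid, ready):
    """Rising-edge indices at which a payload is transferred (valid and ready both 1)."""
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


def valid_is_held(valid, ready):
    """False if valid drops before its pending transfer was accepted (case (b))."""
    for cycle in range(len(valid) - 1):
        pending = valid[cycle] and not ready[cycle]
        if pending and not valid[cycle + 1]:
            return False
    return True


# The destination asserts ready one cycle late, then accepts back-to-back transfers.
valid = [1, 1, 1, 0]
ready = [0, 1, 1, 1]
print(transfer_cycles(valid, ready))  # [1, 2]
print(valid_is_held(valid, ready))    # True
print(valid_is_held([1, 0], [0, 0]))  # False: the source retired a pending valid
```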
CRI Response Interface
The CRI response interfaces are a special case: their ready signal is assumed to be always set to 1, which is why it is not actually implemented on those interfaces. In other words, the requester unconditionally accepts any incoming response.
Address, data and byte enable alignment
Address alignment
The address transferred (addr) in all request interfaces (CRI and CMI) shall be aligned to \(\scriptsize\mathsf{2^{size}}\) bytes, where size is the value of the corresponding size signal in that interface.
Some examples are illustrated in Figure 2. In the first case, the size value is 2 (which corresponds to \(\scriptsize\mathsf{2^2=4}\) bytes). Thus, the address must be a multiple of 4. In the second case, the size value is 3. Thus, the address must be a multiple of 8. Finally, in the third case, the size value is 0. Thus, there is no constraint on the address alignment.
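The address alignment rule reduces to a modulo check. A minimal Python sketch (the function name and the addresses are hypothetical) reproducing the three cases described above:

```python
def addr_is_aligned(addr: int, size: int) -> bool:
    """True if addr is a multiple of 2**size bytes (size is the encoded access size)."""
    return addr % (1 << size) == 0


print(addr_is_aligned(0x1004, 2))  # True:  4-byte access at a multiple of 4
print(addr_is_aligned(0x1004, 3))  # False: an 8-byte access needs a multiple of 8
print(addr_is_aligned(0x1003, 0))  # True:  1-byte accesses have no constraint
```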
Data alignment
The data must be naturally aligned to the address (addr), and the maximum number of valid bytes in the transfer is \(\scriptsize\mathsf{2^{size}}\). This means that the first valid byte in the data signal must be at the offset indicated by the address. Here, the offset corresponds to the least significant bits of the address that select a byte within the data word. For example, if the data signal is 128 bits wide (16 bytes), then the offset corresponds to the 4 least significant bits of the addr signal.
Some examples are illustrated in Figure 2. As illustrated, within the data word, only bytes in the range from the offset indicated by the address up to (but excluding) that offset plus \(\scriptsize\mathsf{2^{size}}\) can contain valid data. Other bytes must be ignored by the destination.
Additionally, within the range described above, the be signal indicates which bytes are actually valid. Bytes in the data signal whose be bits are set to 0 must be ignored by the destination.
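The following Python sketch extracts the bytes that a destination must consider, assuming a little-endian data word held as an integer (byte lane i occupies bits 8i to 8i+7) and a well-formed request (aligned address, \(\scriptsize\mathsf{2^{size}}\) no larger than the data width). The function name and example values are hypothetical.

```python
def valid_bytes(data: int, addr: int, size: int, be: int, data_bytes: int) -> dict:
    """Return {byte_lane: byte_value} for the bytes the destination must consider."""
    offset = addr % data_bytes                    # low log2(data_bytes) address bits
    lanes = range(offset, offset + (1 << size))   # window of 2**size byte lanes
    return {lane: (data >> (8 * lane)) & 0xFF
            for lane in lanes if (be >> lane) & 1}


# 128-bit (16-byte) data word, 4-byte access (size = 2) at address 0x1004:
# the offset is 0x1004 % 16 = 4, so only byte lanes 4 to 7 may carry data.
word = 0xAABBCCDD << 32
print({k: hex(v) for k, v in valid_bytes(word, 0x1004, 2, 0x00F0, 16).items()})
# {4: '0xdd', 5: '0xcc', 6: '0xbb', 7: '0xaa'}
print(valid_bytes(word, 0x1004, 2, 0x0030, 16))  # be keeps byte lanes 4 and 5 only
```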
Byte Enable (BE) alignment
The be signal must be naturally aligned to the address (addr), and the number of bits set in this signal must be less than or equal to \(\scriptsize\mathsf{2^{size}}\). This means that the first valid bit in the be signal must be at the offset indicated by the address. The offset is the same as the one explained above in the “Data alignment” paragraph.
Some examples are illustrated in Figure 2. As illustrated, within the be word, only bits in the range from the indicated offset in the address, to that offset plus \(\scriptsize\mathsf{2^{size}}\) can be set. Other bits outside that range must be set to 0.
Fig. 2 Address, Data and Byte Enable Alignment in Requests
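The byte-enable rule can be checked with a mask, as in the Python sketch below (hypothetical function name and values). Because every set bit must fall inside a window of \(\scriptsize\mathsf{2^{size}}\) bits starting at the offset, the limit on the number of set bits is satisfied automatically.

```python
def be_is_legal(be: int, addr: int, size: int, data_bytes: int) -> bool:
    """True if every set bit of be lies within [offset, offset + 2**size)."""
    offset = addr % data_bytes
    window = ((1 << (1 << size)) - 1) << offset  # 2**size ones, shifted to the offset
    return (be & ~window) == 0


# 16-byte data word, 4-byte access at 0x1004: legal be bits are 4 to 7.
print(be_is_legal(0x00F0, 0x1004, 2, 16))  # True
print(be_is_legal(0x0030, 0x1004, 2, 16))  # True: a subset of the window is allowed
print(be_is_legal(0x0100, 0x1004, 2, 16))  # False: bit 8 is outside the window
```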
Cache-Requesters Interface (CRI) Attributes
Physical or Virtual Indexing
The HPDcache allows the address and physical memory attributes (PMA) to be sent by the requesters in two different (but consecutive) cycles.
This is useful to allow the pipelining of the address translation mechanism (when the core has one). This is illustrated in Figure 3. Translating the address and forwarding it to the cache in the same cycle is usually too costly in terms of timing. Instead, the requesters can:
Cycle 0 | During the first cycle, forward the least significant bits of the address (core_req_i.addr_offset) |
Cycle 1 | During the second cycle, forward the previously translated most significant bits of the address (the tag), together with the physical memory attributes (PMAs) |
Fig. 3 Pipelining of the Virtual and Physical Part of the Address
This kind of indexing is named Virtually-Indexed Physically-Tagged (VIPT).
The requester shall send the tag and PMAs in the cycle following the one in which the core_req_valid_i and core_req_ready_o signals were set to 1 and the core_req_i.phys_indexed signal was set to 0.
The number of bits of the address offset (addr_offset) depends on the number of cache sets (\(\scriptsize\mathsf{CONF\_HPDCACHE\_SETS}\)) and the size of the cachelines (\(\scriptsize\mathsf{HPDCACHE\_CL\_WIDTH/8}\) bytes). The address offset is the concatenation of two fields of the address: the set index and the byte offset within the cacheline. Requests can be sent back-to-back with no idle cycle in between.
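As a worked example of the address offset width, the Python sketch below splits a physical address into the fields used by the two-cycle (VIPT) protocol. It assumes power-of-two set counts and cacheline sizes; the function name and the example geometry (64 sets of 64-byte cachelines) are hypothetical.

```python
def split_address(addr: int, sets: int, cl_bytes: int):
    """Split addr into (addr_offset, tag, set_index, byte_offset)."""
    offset_bits = (cl_bytes - 1).bit_length()  # log2(cl_bytes) for powers of two
    set_bits = (sets - 1).bit_length()         # log2(sets) for powers of two
    width = offset_bits + set_bits             # width of addr_offset
    addr_offset = addr & ((1 << width) - 1)    # sent in the first cycle
    tag = addr >> width                        # sent in the second cycle (VIPT)
    set_index = addr_offset >> offset_bits
    byte_offset = addr_offset & ((1 << offset_bits) - 1)
    return addr_offset, tag, set_index, byte_offset


# 64 sets x 64-byte cachelines: addr_offset is 6 + 6 = 12 bits wide.
print([hex(f) for f in split_address(0x12345, sets=64, cl_bytes=64)])
# ['0x345', '0x12', '0xd', '0x5']
```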
If requesters do not need virtual indexing, they can send the full address in the first cycle by setting the core_req_i.phys_indexed bit to 1. The address offset and the tag shall then be sent through the core_req_i.addr_offset and core_req_i.addr_tag signals, respectively. A given requester is free to alternate between virtual and physical indexing on different clock cycles. Different requesters can use different indexing schemes (virtual or physical).
Request Abortion
When using virtual indexing, the requester can abort the request during the second cycle of the addressing pipeline. In that case, the requester needs to set the req_abort signal to 1.
When a request is aborted and its core_req_i.need_rsp field was set to 1, the HPDcache responds to the corresponding requester with the core_rsp_o.aborted bit set to 1.
CRI Type of Operation
A requester indicates the required operation on the 5-bit HPDCACHE_REQ_OP signal. The supported operations are detailed in Table 13.
Mnemonic |
Encoding |
Type |
---|---|---|
|
0b00000 |
Read operation |
|
0b00001 |
Write operation |
|
0b00100 |
Atomic Load-reserved operation |
|
0b00101 |
Atomic Store-conditional operation |
|
0b00110 |
Atomic SWAP operation |
|
0b00111 |
Atomic integer ADD operation |
|
0b01000 |
Atomic bitwise AND operation |
|
0b01001 |
Atomic bitwise OR operation |
|
0b01010 |
Atomic bitwise XOR operation |
|
0b01011 |
Atomic integer signed MAX operation |
|
0b01100 |
Atomic integer unsigned MAX operation |
|
0b01101 |
Atomic integer signed MIN operation |
|
0b01110 |
Atomic integer unsigned MIN operation |
|
0b10000 |
Memory write fence |
|
0b10001 |
Prefetch a cacheline given its address |
|
0b10010 |
Invalidate a cacheline given its address |
|
0b10011 |
Invalidate all cachelines |
|
0b10100 |
Flush a cacheline given its address |
|
0b10101 |
Flush all cachelines |
|
0b10110 |
Flush and invalidate a cacheline given its address |
|
0b10111 |
Flush and invalidate all cachelines |
Load and store operations are normal read and write operations from/to the specified address.
Atomic operations are the ones specified in the Atomic (A) extension of the [RISCVUP2019]. More details on how the HPDcache implements AMOs are found in section Atomic Memory Operations (AMOs).
CMOs are explained in Cache Management Operations (CMOs).
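A requester-side view of Table 13 can be sketched as a Python enumeration. The member names below are hypothetical; only the encodings are taken from Table 13. In that table, loads and stores use encodings 0 and 1, AMOs occupy 0b00100 to 0b01110, and the fence and CMOs use encodings with bit 4 set, which the classification helper relies on.

```python
from enum import IntEnum


class CriOp(IntEnum):
    """Hypothetical names for the CRI operation encodings of Table 13."""
    LOAD = 0b00000
    STORE = 0b00001
    AMO_LR = 0b00100
    AMO_SC = 0b00101
    AMO_SWAP = 0b00110
    AMO_ADD = 0b00111
    AMO_AND = 0b01000
    AMO_OR = 0b01001
    AMO_XOR = 0b01010
    AMO_MAX = 0b01011
    AMO_MAXU = 0b01100
    AMO_MIN = 0b01101
    AMO_MINU = 0b01110
    FENCE = 0b10000
    PREFETCH = 0b10001
    INVAL_NLINE = 0b10010
    INVAL_ALL = 0b10011
    FLUSH_NLINE = 0b10100
    FLUSH_ALL = 0b10101
    FLUSH_INVAL_NLINE = 0b10110
    FLUSH_INVAL_ALL = 0b10111


def op_class(encoding: int) -> str:
    op = CriOp(encoding)          # ValueError for encodings absent from Table 13
    if op in (CriOp.LOAD, CriOp.STORE):
        return "load/store"
    if op <= CriOp.AMO_MINU:
        return "amo"
    return "cmo"                  # memory write fence and cache management operations


print(CriOp(0b10101).name, op_class(0b10101))  # FLUSH_ALL cmo
```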
Source identifier
Each request identifies its source through the core_req_i.sid signal. The core_req_i.sid signal shall be decoded when the core_req_valid_i signal is set to 1. The width of this signal is \(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}\) bits. The HPDcache reflects the value of the sid of the request into the sid of the corresponding response.
Each port must have a unique ID that corresponds to its number. Ports are numbered from 0 to \(\scriptsize\mathsf{NREQUESTERS - 1}\). This number shall be constant for a given port (requester). The HPDcache uses this information to route responses to the correct requester.
Transaction identifier
Each request identifies transactions through the core_req_i.tid signal. The core_req_i.tid signal shall be decoded when the core_req_valid_i signal is set to 1. The width of this signal is \(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}\) bits.
This signal can contain any value from 0 to \(\scriptsize\mathsf{2^{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH} - 1}\). The HPDcache forwards the value of the tid of the request into the tid of the corresponding response.
A requester can issue multiple transactions without waiting for earlier transactions to complete. Because the HPDcache can respond to these transactions in a different order from the one in which they were requested, the requester can use the tid to match responses to requests.
Transaction IDs are not necessarily unique: a requester may reuse a given transaction ID for different transactions, even while some of these transactions are not yet completed. However, when a requester has multiple in-flight transactions with the same tid, it cannot match their responses to the requests, because responses can arrive in a different order from the requests.
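The Python sketch below shows how a requester could match out-of-order responses to requests with the tid. The class and method names are hypothetical, and the sketch deliberately refuses to reuse a tid that is still in flight, which is one way for a requester to avoid the ambiguity described above (the HPDcache itself does not require it).

```python
class TidTracker:
    """Requester-side bookkeeping of in-flight transactions, keyed by tid."""

    def __init__(self, tid_width: int):
        self.max_tid = (1 << tid_width) - 1
        self.in_flight = {}                       # tid -> request description

    def issue(self, tid: int, request: str) -> None:
        if not 0 <= tid <= self.max_tid:
            raise ValueError(f"tid {tid} does not fit in the tid field")
        if tid in self.in_flight:
            # Allowed by the HPDcache, but the responses could not be told apart.
            raise ValueError(f"tid {tid} is already in flight")
        self.in_flight[tid] = request

    def on_response(self, tid: int) -> str:
        """Return the request completed by the response carrying this tid."""
        return self.in_flight.pop(tid)


tracker = TidTracker(tid_width=2)
tracker.issue(0, "load A")
tracker.issue(1, "load B")
print(tracker.on_response(1))  # load B (responses may arrive out of order)
print(tracker.on_response(0))  # load A
```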
Cacheability
This cache considers that the memory space is segmented. A segment corresponds to an address range: a base address and an end address. Some segments are cacheable and others are not. The HPDcache needs to know which segments are cacheable to determine whether, for a given read request, it must copy the read data into the cache.
The request interface implements an uncacheable bit (core_req_i.pma.uncacheable or core_req_pma_i.uncacheable). When this bit is set, the access is considered uncacheable. The core_req_i.pma.uncacheable signal shall be decoded when the core_req_valid_i signal is set to 1. The core_req_pma_i.uncacheable signal shall be decoded when, in the previous cycle, the core_req_valid_i and core_req_ready_o signals were set to 1 and the core_req_i.phys_indexed signal was set to 0.
Caution
For a given address, the uncacheable attribute must be consistent between accesses. The granularity is the cacheline. In the event that the same address is accessed with different values in the uncacheable attribute, the behavior of the cache for that address is unpredictable.
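On the requester side, the uncacheable attribute is typically derived from a table of segments. The Python sketch below uses a hypothetical memory map, cacheline size and function name. It checks that segment boundaries are cacheline-aligned, so that every byte of a cacheline always gets the same uncacheable value, in line with the caution above.

```python
CL_BYTES = 64  # hypothetical cacheline size in bytes

# Hypothetical memory map: (base address, end address (exclusive), cacheable)
SEGMENTS = [
    (0x0000_0000, 0x1000_0000, False),  # e.g. peripherals
    (0x8000_0000, 0xC000_0000, True),   # e.g. main memory
]

for base, end, _ in SEGMENTS:
    # Cacheline-granular boundaries keep the attribute consistent per cacheline.
    assert base % CL_BYTES == 0 and end % CL_BYTES == 0


def uncacheable(addr: int) -> bool:
    """Value to drive on the uncacheable PMA bit for addr."""
    for base, end, cacheable in SEGMENTS:
        if base <= addr < end:
            return not cacheable
    return True  # assumption: addresses outside any segment are not cached


print(uncacheable(0x8000_0040))  # False
print(uncacheable(0x0000_1000))  # True
```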
Need response
For any given request, a requester can set the core_req_i.need_rsp bit to 0 to indicate that it does not want a response for that request. The core_req_i.need_rsp signal shall be decoded when the core_req_valid_i signal is set to 1.
When core_req_i.need_rsp is set to 0, the HPDcache processes the request but does not send an acknowledgement to the corresponding requester when the transaction is completed.
Write-Policy Hint
Requesters may dynamically set the write policy (write-back or write-through) of the target cacheline. The request interface provides specific flags (hints) to indicate the desired policy for a given request.
The request interface drives the hint through the core_req_i.pma.wr_policy_hint or core_req_pma_i.wr_policy_hint signals.
The core_req_i.pma.wr_policy_hint signal shall be decoded when the core_req_valid_i signal is set to 1. The core_req_pma_i.wr_policy_hint signal shall be decoded when, in the previous cycle, the core_req_valid_i and core_req_ready_o signals were set to 1 and the core_req_i.phys_indexed signal was set to 0.
The supported hints are detailed in Table 14.
Mnemonic |
Encoding |
Type |
---|---|---|
|
0b001 |
Request to keep the current write-policy for the target cacheline if there is a copy in the cache, or use the default policy otherwise. |
|
0b010 |
Request a write-back (write allocate) policy for the target cacheline |
|
0b100 |
Request a write-through (write non-allocate) policy for the target cacheline |
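The hint encodings in Table 14 are one-hot. A minimal Python decoder is sketched below; the string labels and the function name are hypothetical, and treating any other value as an error is a choice of this sketch.

```python
# Encodings from Table 14; the string labels are hypothetical.
WR_POLICY_HINTS = {
    0b001: "auto",           # keep the current policy, or use the default one
    0b010: "write-back",     # write allocate
    0b100: "write-through",  # write non-allocate
}


def decode_wr_policy_hint(hint: int) -> str:
    """Map a 3-bit wr_policy_hint value to the requested policy."""
    if hint not in WR_POLICY_HINTS:
        raise ValueError(f"unsupported hint encoding {hint:#05b}")
    return WR_POLICY_HINTS[hint]


print(decode_wr_policy_hint(0b010))  # write-back
```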
Error response
The response interface contains a single-bit core_rsp_o.error signal. This signal is set to 1 by the HPDcache when an error condition occurred during the processing of the corresponding request. The core_rsp_o.error signal shall be decoded when the core_rsp_valid_o signal is set to 1.
When the core_rsp_o.error signal is set to 1 in the response, the effect of the corresponding request is undefined. For load operations and AMOs, the rdata signal then does not contain any valid data.
Cache-Memory Interface (CMI) Attributes
CMI Type of operation
Mnemonic | Encoding | Type |
---|---|---|
HPDCACHE_MEM_READ | 0b00 | Read operation |
HPDCACHE_MEM_WRITE | 0b01 | Write operation |
HPDCACHE_MEM_ATOMIC | 0b10 | Atomic operation |
HPDCACHE_MEM_READ and HPDCACHE_MEM_WRITE are, respectively, normal read and write operations from/to the specified address.
In case of an atomic operation request (HPDCACHE_MEM_ATOMIC), the specific operation is specified in the MEM_REQ_ATOMIC signal. These operations are listed in Table 16. Note that these operations are compatible with the ones defined in the AMBA AXI protocol.
Mnemonic | Encoding | Type |
---|---|---|
HPDCACHE_MEM_ATOMIC_ADD | 0b0000 | Atomic fetch-and-add operation |
HPDCACHE_MEM_ATOMIC_CLR | 0b0001 | Atomic fetch-and-clear operation |
HPDCACHE_MEM_ATOMIC_SET | 0b0010 | Atomic fetch-and-set operation |
HPDCACHE_MEM_ATOMIC_EOR | 0b0011 | Atomic fetch-and-exclusive-or operation |
HPDCACHE_MEM_ATOMIC_SMAX | 0b0100 | Atomic fetch-and-maximum (signed) operation |
HPDCACHE_MEM_ATOMIC_SMIN | 0b0101 | Atomic fetch-and-minimum (signed) operation |
HPDCACHE_MEM_ATOMIC_UMAX | 0b0110 | Atomic fetch-and-maximum (unsigned) operation |
HPDCACHE_MEM_ATOMIC_UMIN | 0b0111 | Atomic fetch-and-minimum (unsigned) operation |
HPDCACHE_MEM_ATOMIC_SWAP | 0b1000 | Atomic swap operation |
HPDCACHE_MEM_ATOMIC_LDEX | 0b1100 | Load-exclusive operation |
HPDCACHE_MEM_ATOMIC_STEX | 0b1101 | Store-exclusive operation |
Type of operation per CMI request channel
As a reminder, the HPDcache implements two request channels to the memory:
Memory read request channel
Memory write request channel
Table 17 indicates the type of operations that each of these two request channels can issue.
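Type | Channels |
---|---|
HPDCACHE_MEM_READ | Memory read request channel |
HPDCACHE_MEM_WRITE | Memory write request channel |
HPDCACHE_MEM_ATOMIC | Memory read request channel (HPDCACHE_MEM_ATOMIC_LDEX) and memory write request channel (HPDCACHE_MEM_ATOMIC_STEX and read-modify-write atomic operations) |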
Read-Modify-Write Atomic Operations
The following atomic operations behave as read-modify-write operations:
HPDCACHE_MEM_ATOMIC_ADD
HPDCACHE_MEM_ATOMIC_CLR
HPDCACHE_MEM_ATOMIC_SET
HPDCACHE_MEM_ATOMIC_EOR
HPDCACHE_MEM_ATOMIC_SMAX
HPDCACHE_MEM_ATOMIC_SMIN
HPDCACHE_MEM_ATOMIC_UMAX
HPDCACHE_MEM_ATOMIC_UMIN
HPDCACHE_MEM_ATOMIC_SWAP
These requests are forwarded to the memory through the CMI write request interface. A particularity of these requests is that they generate two responses from the memory:
Old data value from memory is returned through the CMI read response interface.
Write acknowledgement is returned through the CMI write response interface.
Both responses may arrive at the initiating HPDcache in any order.
Regarding errors, if any response has its error signal set to 1 (mem_resp_*_i.mem_resp_r_error or mem_resp_*_i.mem_resp_w_error), the HPDcache considers that the operation was not completed. It waits for both responses and then forwards an error response (core_rsp_o.error = 1) to the corresponding requester on the HPDcache requesters’ side.
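The completion rule for read-modify-write atomic operations can be sketched as a small Python tracker that waits for both memory responses, in either order. The class and method names are hypothetical; the returned tuple stands for the error bit and the read data of the response forwarded to the requester.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class RmwAtomicTracker:
    got_read_rsp: bool = False
    got_write_rsp: bool = False
    error: bool = False
    old_data: Optional[int] = None

    def on_read_response(self, data: int, error: bool) -> None:
        # Old memory value, returned on the CMI read response interface.
        self.got_read_rsp, self.old_data = True, data
        self.error |= error

    def on_write_response(self, error: bool) -> None:
        # Write acknowledgement, returned on the CMI write response interface.
        self.got_write_rsp = True
        self.error |= error

    def done(self) -> bool:
        return self.got_read_rsp and self.got_write_rsp

    def core_response(self) -> Tuple[bool, Optional[int]]:
        """(error, rdata) for the requester; rdata carries no valid data on error."""
        assert self.done(), "must wait for both responses"
        return self.error, None if self.error else self.old_data


ok = RmwAtomicTracker()
ok.on_write_response(error=False)       # responses may arrive in any order
ok.on_read_response(0x2A, error=False)
print(ok.core_response())               # (False, 42)

bad = RmwAtomicTracker()
bad.on_read_response(0x2A, error=False)
bad.on_write_response(error=True)
print(bad.core_response())              # (True, None)
```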
Exclusive Load/Store Atomic Operations
Exclusive load and store operations are issued as normal load and store operations on the CMI read request interface and CMI write request interface, respectively.
Specific operation types are, however, used on these exclusive requests: HPDCACHE_MEM_ATOMIC_LDEX for loads, and HPDCACHE_MEM_ATOMIC_STEX for stores.
These requests behave similarly to normal load and store to the memory but provide some additional properties described in Atomic Memory Operations (AMOs).
In the case of the HPDCACHE_MEM_ATOMIC_STEX request, the write acknowledgement contains additional information in the mem_resp_w_is_atomic signal.
If this signal is set to 1, the exclusive store was “atomic”, hence the data was written in memory.
If this signal is set to 0, the exclusive store was “non-atomic”, hence the write operation was abandoned.
The HPDcache uses exclusive stores for SC operations from requesters. Depending on the mem_resp_w_is_atomic value, the HPDcache responds to the requester according to the rules explained in Atomic Memory Operations (AMOs). A “non-atomic” response is considered an SC failure, and an “atomic” response is considered an SC success.
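Combining the rules of this section, the Python sketch below selects the CMI request channel for an operation and maps the exclusive-store acknowledgement to an SC outcome. The enumeration member names are shortened, hypothetical forms of the HPDCACHE_MEM_* mnemonics; the encodings are those of the CMI operation tables.

```python
from enum import IntEnum
from typing import Optional


class MemOp(IntEnum):          # CMI operation encodings
    READ = 0b00
    WRITE = 0b01
    ATOMIC = 0b10


class MemAtomic(IntEnum):      # CMI atomic operation encodings
    ADD = 0b0000
    CLR = 0b0001
    SET = 0b0010
    EOR = 0b0011
    SMAX = 0b0100
    SMIN = 0b0101
    UMAX = 0b0110
    UMIN = 0b0111
    SWAP = 0b1000
    LDEX = 0b1100
    STEX = 0b1101


def request_channel(op: MemOp, atomic: Optional[MemAtomic] = None) -> str:
    """Memory request channel used for a given CMI operation."""
    if op is MemOp.READ:
        return "read"
    if op is MemOp.WRITE:
        return "write"
    if atomic is None:
        raise ValueError("atomic requests need an atomic operation type")
    # Exclusive loads use the read channel; exclusive stores and all
    # read-modify-write atomics use the write channel.
    return "read" if atomic is MemAtomic.LDEX else "write"


def sc_result(mem_resp_w_is_atomic: int) -> str:
    """Outcome of an SC implemented with an exclusive store."""
    return "SC success" if mem_resp_w_is_atomic else "SC failure"


print(request_channel(MemOp.ATOMIC, MemAtomic.LDEX))  # read
print(request_channel(MemOp.ATOMIC, MemAtomic.SWAP))  # write
print(sc_result(0))                                   # SC failure
```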
CMI Transaction identifier
Each request identifies transactions through the mem_req_*_o.mem_req_id signals. The mem_req_*_o.mem_req_id signal shall be decoded when the mem_req_*_valid_o signal is set to 1. The width of these ID signals is \(\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_ID\_WIDTH}\) bits.
The target (memory or peripheral) shall respond to a request by setting the mem_resp_*_i.mem_resp_*_id signal to the value of the corresponding mem_req_*_o.mem_req_id.
The mem_req_*_o.mem_req_id signals can contain any value from 0 to \(\scriptsize\mathsf{2^{CONF\_HPDCACHE\_MEM\_ID\_WIDTH} - 1}\).
The HPDcache can issue multiple memory transactions without waiting for earlier transactions to complete. The HPDcache uses unique IDs for each request. Unique IDs means that two or more in-flight requests never share the same ID. In-flight requests are those that have been issued by the HPDcache but have not yet received their respective response.
The target (memory or peripheral) may respond to in-flight CMI requests in any order.
Transaction IDs in the CMI read request channel
The HPDcache can have the following number of in-flight read miss transactions:
\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS{}\times{}CONF\_HPDCACHE\_MSHR\_WAYS}\)
Each in-flight transaction has a unique transaction ID. This ID is formatted as follows:
For cacheable requests:
(mshr_way << log2(CONF_HPDCACHE_MSHR_SETS)) | mshr_set
The ID is the concatenation of two indexes: the MSHR way (upper bits) and the MSHR set (lower bits) occupied by the corresponding request.
For uncacheable requests:
The HPDcache can issue up to 1 in-flight, uncached, read transaction. Uncached transactions have a unique transaction ID with all bits set to 1.
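The ID formats above can be sketched in Python as follows. The function names and the example MSHR geometry are hypothetical, the sketch assumes a power-of-two number of MSHR sets, and the inverse mapping is only illustrative.

```python
def clog2(x: int) -> int:
    """Exact log2; this sketch assumes power-of-two sizes."""
    if x <= 0 or x & (x - 1):
        raise ValueError(f"{x} is not a power of two")
    return x.bit_length() - 1


def mshr_mem_id(mshr_set: int, mshr_way: int, mshr_sets: int) -> int:
    """CMI read ID of a cacheable miss: way in the upper bits, set in the lower bits."""
    return (mshr_way << clog2(mshr_sets)) | mshr_set


def decode_mshr_mem_id(mem_id: int, mshr_sets: int):
    """Illustrative inverse mapping: recover (set, way) from an ID."""
    return mem_id & (mshr_sets - 1), mem_id >> clog2(mshr_sets)


def uncached_read_id(mem_id_width: int) -> int:
    """ID of the single in-flight uncached read: all bits set to 1."""
    return (1 << mem_id_width) - 1


# Hypothetical MSHR with 8 sets: set 3, way 2 -> (2 << 3) | 3 = 19
print(mshr_mem_id(3, 2, mshr_sets=8))       # 19
print(decode_mshr_mem_id(19, mshr_sets=8))  # (3, 2)
print(uncached_read_id(6))                  # 63
```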
Transaction IDs in the CMI wbuf write request channel
The HPDcache can have the following number of in-flight write transactions:
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}\)
Each in-flight transaction has a unique transaction ID. This ID is formatted as follows:
For cacheable requests:
wbuf_dir_index
The ID corresponds to the index of the entry in the write-buffer directory.
For uncacheable requests:
The HPDcache can issue up to 1 in-flight, uncached, write transaction. Uncached transactions have a unique transaction ID with all bits set to 1.
Event signals
In addition to the performance registers explained in Performance counters, the HPDcache provides a set of one-shot signals that indicate when a given event is detected. These signals are set to 1 for one cycle each time the corresponding event is detected. If the same event is detected N cycles in a row, the corresponding event signal will remain set to 1 for N cycles. Table 18 lists these event signals.
These event signals are output-only. They can be either left unconnected, if they are not used, or connected with the remainder of the system. The system can use those signals, for example, for counting those events externally or for triggering some specific actions.
Signal |
Source |
Description |
---|---|---|
|
Cache |
Write request accepted |
|
Cache |
Read request accepted |
|
Cache |
Prefetch request accepted |
|
Cache |
Uncached request accepted |
|
Cache |
CMO request accepted |
|
Cache |
One request accepted (any type) |
|
Cache |
Write miss event |
|
Cache |
Read miss event |
|
Cache |
Request put on-hold in the RTAB |
|
Cache |
Request put on-hold because of a MSHR conflict |
|
Cache |
Request put on-hold because of a WBUF conflict |
|
Cache |
Request put on-hold (again) after a rollback |
|
Cache |
Cache stalls request event |
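Because an event signal stays at 1 for N cycles when the event is detected N cycles in a row, an external counter simply adds one for every cycle in which the signal is set. The Python sketch below models such counters; the class name and the event names used as keys are hypothetical.

```python
from collections import Counter


class EventCounters:
    """Count one-cycle event pulses, sampled once per rising edge of clk_i."""

    def __init__(self):
        self.counts = Counter()

    def tick(self, events):
        """events: mapping from event-signal name to its sampled 0/1 value."""
        for name, value in events.items():
            if value:
                self.counts[name] += 1   # one event per cycle the signal is 1


counters = EventCounters()
# A read-miss signal held high for 3 consecutive cycles means 3 read misses.
for read_miss, write_miss in [(1, 0), (1, 1), (1, 0), (0, 0)]:
    counters.tick({"read_miss": read_miss, "write_miss": write_miss})
print(counters.counts["read_miss"], counters.counts["write_miss"])  # 3 1
```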