Parameters, Interfaces and Communication Protocols

Synthesis-time (Static) Configuration Parameters

The HPDcache has several static configuration parameters. These parameters must be defined at compilation/synthesis.

Table 1 summarizes the list of parameters that can be set when integrating the HPDcache.

Table 1 Static Synthesis-Time Parameters

Parameter

Description

\(\scriptsize\mathsf{NREQUESTERS}\)

Number of requesters to the HPDcache

\(\scriptsize\mathsf{CONF\_HPDCACHE\_PA\_WIDTH}\)

Physical address width (in bits)

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}\)

Width (in bits) of a data word

\(\scriptsize\mathsf{CONF\_HPDCACHE\_SETS}\)

Number of sets

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WAYS}\)

Number of ways (associativity)

\(\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WORDS}\)

Number of words in a cacheline

\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_WORDS}\)

Number of words in the data channels from/to requesters

\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}\)

Width (in bits) of the transaction ID from requesters

\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}\)

Width (in bits) of the source ID from requesters

\(\scriptsize\mathsf{CONF\_HPDCACHE\_VICTIM\_SEL}\)

Selects the replacement (victim selection) policy

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS}\)

Number of sets in the MSHR

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS}\)

Number of ways (associativity) in the MSHR

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}\)

Number of entries in the directory of the write buffer

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DATA\_ENTRIES}\)

Number of entries in the data part of the write buffer

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS}\)

Number of data words per entry in the write buffer

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_TIMECNT\_WIDTH}\)

Width (in bits) of the time counter in write buffer entries

\(\scriptsize\mathsf{CONF\_HPDCACHE\_RTAB\_ENTRIES}\)

Number of entries in the replay table

\(\scriptsize\mathsf{CONF\_HPDCACHE\_FLUSH\_ENTRIES}\)

Number of entries in the flush directory

\(\scriptsize\mathsf{CONF\_HPDCACHE\_FLUSH\_FIFO\_DEPTH}\)

Number of entries in the flush FIFO

\(\scriptsize\mathsf{CONF\_HPDCACHE\_REFILL\_FIFO\_DEPTH}\)

Number of entries in the refill FIFO

\(\scriptsize\mathsf{CONF\_HPDCACHE\_REFILL\_CORE\_RSP\_FEEDTHROUGH}\)

Use feedthrough FIFO for responses from the refill handler to the core

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_DATA\_WIDTH}\)

Width (in bits) of the data channels from/to the memory interface

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_ID\_WIDTH}\)

Width (in bits) of the transaction ID from the memory interface

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WT\_ENABLE}\)

Enable the write-through policy in the cache

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WB\_ENABLE}\)

Enable the write-back policy in the cache

\(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_AMO}\)

When set to 1, the HPDcache supports Atomic Memory Operations (AMOs)

\(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_CMO}\)

When set to 1, the HPDcache supports Cache Management Operations (CMOs)

\(\scriptsize\mathsf{CONF\_HPDCACHE\_SUPPORT\_PERF}\)

When set to 1, the HPDcache integrates performance counters

Some parameters are not directly related to functionality. Instead, they adapt the HPDcache to the physical constraints of the target technology node. Typically, they control the geometry of the SRAM macros: depending on the technology, some dimensions are more efficient than others in terms of performance, power and area. These parameters must also be provided by the user at synthesis time. Table 2 lists the static synthesis-time physical parameters of the HPDcache. The \(\scriptsize\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}\) parameter has an impact on the refill latency (see section RAM Organization Parameters).

Table 2 Static Synthesis-Time Physical Parameters

Parameter

Description

\(\scriptsize\mathsf{CONF\_HPDCACHE\_ACCESS\_WORDS}\)

Number of words that can be accessed simultaneously from the CACHE data array

\(\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_WAYS\_PER\_RAM\_WORD}\)

Number of ways in the same CACHE data SRAM word

\(\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_SETS\_PER\_RAM}\)

Number of sets per RAM macro in the DATA array of the cache

\(\scriptsize\mathsf{CONF\_HPDCACHE\_DATA\_RAM\_BYTE\_ENABLE}\)

Use RAM macros with byte-enable instead of bit-mask for the CACHE data array

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_USE\_REG\_BANK}\)

Use FFs instead of SRAM for the MSHR

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_WAYS\_PER\_RAM\_WORD}\)

Number of ways in the same MSHR SRAM word

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS\_PER\_RAM}\)

Number of sets per RAM macro in the MSHR array of the cache

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_RAM\_BYTE\_ENABLE}\)

Use RAM macros with byte-enable instead of bit-mask for the MSHR

Several internal configuration values are computed from the parameters above. Table 3 gives a non-exhaustive list of the internal configuration values that may be mentioned in the remainder of this document.

Table 3 Internal Parameters

Parameter

Description

Value

\(\scriptsize\mathsf{HPDCACHE\_CL\_WIDTH}\)

Width (in bits) of a cacheline

\(\scriptsize\mathsf{CONF\_HPDCACHE\_CL\_WORDS \times}\)
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}\)

\(\scriptsize\mathsf{HPDCACHE\_REQ\_DATA\_WIDTH}\)

Width (in bits) of request data interfaces

\(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_WORDS \times}\)
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}\)

\(\scriptsize\mathsf{HPDCACHE\_NLINE\_WIDTH}\)

Width (in bits) of the cacheline index part of the address

\(\scriptsize\mathsf{CONF\_HPDCACHE\_PA\_WIDTH -}\)
\(\scriptsize\mathsf{log_2(\frac{HPDCACHE\_CL\_WIDTH}{8})}\)

\(\scriptsize\mathsf{HPDCACHE\_SET\_WIDTH}\)

Width (in bits) of the SET part of the address

\(\scriptsize\mathsf{log_2(CONF\_HPDCACHE\_SETS)}\)

\(\scriptsize\mathsf{HPDCACHE\_TAG\_WIDTH}\)

Width (in bits) of the TAG part of the address

\(\scriptsize\mathsf{HPDCACHE\_NLINE\_WIDTH -}\)
\(\scriptsize\mathsf{HPDCACHE\_SET\_WIDTH}\)

\(\scriptsize\mathsf{HPDCACHE\_WBUF\_WIDTH}\)

Width (in bits) of an entry in the write-buffer

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_WORDS \times}\)
\(\scriptsize\mathsf{CONF\_HPDCACHE\_WORD\_WIDTH}\)
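As an illustration, the formulas of Table 3 can be evaluated with a short Python sketch. The function names and the example configuration values below are hypothetical (they are not HPDcache defaults), and the sketch assumes that the number of sets and the cacheline size in bytes are powers of two.

```python
def log2_exact(value):
    """Base-2 logarithm of a power of two (assumption: value is a power of two)."""
    assert value > 0 and value & (value - 1) == 0, "expected a power of two"
    return value.bit_length() - 1


def derive_internal_params(pa_width, word_width, sets, cl_words, req_words, wbuf_words):
    """Evaluate the Table 3 formulas from the CONF_HPDCACHE_* values."""
    cl_width = cl_words * word_width                     # bits per cacheline
    req_data_width = req_words * word_width              # bits per request data channel
    nline_width = pa_width - log2_exact(cl_width // 8)   # address bits above the byte offset
    set_width = log2_exact(sets)
    tag_width = nline_width - set_width
    wbuf_width = wbuf_words * word_width
    return {
        "HPDCACHE_CL_WIDTH": cl_width,
        "HPDCACHE_REQ_DATA_WIDTH": req_data_width,
        "HPDCACHE_NLINE_WIDTH": nline_width,
        "HPDCACHE_SET_WIDTH": set_width,
        "HPDCACHE_TAG_WIDTH": tag_width,
        "HPDCACHE_WBUF_WIDTH": wbuf_width,
    }


# Hypothetical configuration, chosen only for illustration.
params = derive_internal_params(pa_width=56, word_width=64, sets=64,
                                cl_words=8, req_words=1, wbuf_words=2)
```

With these example values, a cacheline is 512 bits (64 bytes), so 6 address bits select the byte in the line, 6 bits select the set, and the remaining 44 bits form the tag.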

Conventions

The HPDcache uses the following conventions in the naming of its signals:

  • The _i suffix for input ports

  • The _o suffix for output ports

  • The _n suffix for active-low ports

  • The clk_ prefix for clock ports

  • The rst_ prefix for reset ports

  • Suffixes may be combined. For example, _ni indicates an active-low input port

Global Signals

Table 4 Global Signals

Signal

Source

Description

clk_i

Clock source

Global clock signal. The HPDcache is synchronous to the rising-edge of the clock.

rst_ni

Reset source

Global reset signal. Asynchronous, active LOW, reset signal.

wbuf_flush_i

System

Force the write-buffer to send all pending writes. Active HIGH, one-cycle, pulse signal. Synchronous to clk_i.

wbuf_empty_o

Cache

Indicates whether the write-buffer is empty (there are no pending write transactions). When this signal is set to 1, the write-buffer is empty.

cfig_base_i

System

Base address of the Control-Status Register (CSR) segment in the HPDcache

Cache-Requesters Interface

This section describes the Cache-Requesters Interface (CRI) between requesters and the HPDcache. It contains two channels: one for requests and one for responses. There are as many CRIs as requesters from the core/accelerator to the HPDcache.

This interface is synchronous to the rising edge of the global clock clk_i.

The address (core_req_i.addr_offset), size (core_req_i.size), byte-enable (core_req_i.be), write data (core_req_i.wdata) and read data (core_rsp_o.rdata) signals shall comply with the alignment constraints defined in section Address, data, and byte enable alignment.

CRI Signal Description

Table 5 CRI Request Channel Signals

Signal

Source

Description

core_req_valid_i

Requester

Indicates that the corresponding requester has a valid request

core_req_ready_o

Cache

Indicates that the cache is ready to accept a request from the corresponding requester

core_req_i.addr_offset

Requester

Least significant bits of the target address of the request

core_req_i.wdata

Requester

Write data (little-endian)

core_req_i.op

Requester

Indicates the type of operation to be performed

core_req_i.be

Requester

Byte-enable for write data (little-endian)

core_req_i.size

Requester

Indicates the size of the access. The size is encoded as the base-2 logarithm of the number of bytes (e.g. 0 is \(\scriptsize\mathsf{2^0~=~1}\) byte, 5 is \(\scriptsize\mathsf{2^5~=~32}\) bytes)

core_req_i.sid

Requester

The identification tag for the requester. It shall be identical to the index of the request port bound to that requester

core_req_i.tid

Requester

The identification tag for the request. A requester can issue multiple requests. The corresponding response from the cache will return this tid

core_req_i.need_rsp

Requester

Indicates if the request needs a response from the cache. When unset, the cache will not issue a response for the corresponding request

core_req_i.phys_indexed

Requester

Indicates whether the access uses virtual (unset) or physical indexing (set)

core_req_i.addr_tag

Requester

Most significant bits of the target address of the request. It is only valid when using physical indexing (core_req_i.phys_indexed = 1)

core_req_i.pma.uncacheable

Requester

Indicates whether the access needs to be cached (unset) or not (set). Uncacheable accesses are directly forwarded to the memory. It is only valid when using physical indexing (core_req_i.phys_indexed = 1)

core_req_i.pma.io

Requester

Indicates whether the request targets input/output (IO) peripherals (set) or not (unset). IO accesses are directly forwarded to the memory. It is only valid when using physical indexing (core_req_i.phys_indexed = 1)

core_req_i.pma.wr_policy_hint

Requester

Indicates whether the target cacheline shall be managed as write-back (write allocate) or write-through (write non-allocate). It is only valid when using physical indexing (core_req_i.phys_indexed = 1)

core_req_tag_i

Requester

Most significant bits of the target address of the request. This signal must be driven 1 cycle after (core_req_valid_i & core_req_ready_o) = 1. It is only valid when using virtual indexing (core_req_i.phys_indexed = 0)

core_req_pma_i.uncacheable

Requester

Indicates whether the access needs to be cached (unset) or not (set). Uncacheable accesses are directly forwarded to the memory. This signal must be driven 1 cycle after (core_req_valid_i & core_req_ready_o) = 1. It is only valid when using virtual indexing (core_req_i.phys_indexed = 0)

core_req_pma_i.io

Requester

Indicates whether the access targets input/output (IO) peripherals (set) or not (unset). IO accesses are directly forwarded to the memory. This signal must be driven 1 cycle after (core_req_valid_i & core_req_ready_o) = 1. It is only valid when using virtual indexing (core_req_i.phys_indexed = 0)

core_req_pma_i.wr_policy_hint

Requester

Indicates whether the target cacheline shall be managed as write-back (write allocate) or write-through (write non-allocate). It is only valid when using virtual indexing (core_req_i.phys_indexed = 0)

Table 6 CRI Response Channel Signals

Signal

Source

Description

core_rsp_valid_o

Cache

Indicates that the HPDcache has a valid response for the corresponding requester

core_rsp_o.rdata

Cache

Response read data

core_rsp_o.sid

Cache

The identification tag for the requester. It corresponds to the sid transferred with the request

core_rsp_o.tid

Cache

The identification tag for the request. It corresponds to the tid transferred with the request

core_rsp_o.error

Cache

Indicates whether there was an error condition while processing the request

core_rsp_o.aborted

Cache

Indicates if the request issued in the previous cycle shall be aborted. It is only considered if the previous request used virtual indexing

Cache Memory Interfaces

This section describes the Cache-Memory Interface (CMI) between the HPDcache and the NoC/memory. It implements 5 different channels.

This interface is synchronous to the rising edge of the global clock clk_i.

All CMI channels implement the valid/ready protocol described in section Valid/Ready handshake process for the handshake between the HPDcache and the NoC/memory.

The address (mem_req_addr), size (mem_req_size), write data (mem_req_w_data) and write byte-enable (mem_req_w_be) signals shall comply with the alignment constraints defined in section Address, data, and byte enable alignment.

CMI Signal Descriptions

  • Memory Read Interfaces

Table 7 CMI Read Request Channel Signals

Signal

Source

Description

mem_req_read_valid_o

Cache

Indicates that the channel is signaling a valid request

mem_req_read_ready_i

NoC

Indicates that the NoC is ready to accept a request

mem_req_read_o.mem_req_addr

Cache

Target physical address of the request. The address shall be aligned to the mem_req_read_o.mem_req_size field.

mem_req_read_o.mem_req_len

Cache

Indicates the number of transfers in a burst minus one

mem_req_read_o.mem_req_size

Cache

Indicates the size of the access. The size is encoded as the base-2 logarithm of the number of bytes

mem_req_read_o.mem_req_id

Cache

The identification tag for the request. The HPDcache always uses unique IDs on the memory interface (i.e. two or more in-flight requests cannot share the same ID).

mem_req_read_o.mem_req_command

Cache

Indicates the type of operation to be performed

mem_req_read_o.mem_req_atomic

Cache

In case of atomic operations, it indicates its type

mem_req_read_o.mem_req_cacheable

Cache

This is a hint for the cache hierarchy in the system. It indicates if the request can be allocated by the cache hierarchy. That is, data can be prefetched from memory or can be reused for multiple read transactions
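The len and size encodings of Table 7 can be illustrated with a small Python sketch. The helper name is hypothetical, and the sketch assumes (this is not stated above) that a request at least as wide as the memory data bus uses full-width transfers, and that all byte counts are powers of two.

```python
def burst_fields(total_bytes, mem_data_width):
    """Return (mem_req_len, mem_req_size) for a transfer of total_bytes.

    Assumptions: total_bytes and mem_data_width / 8 are powers of two, and
    each transfer is as wide as possible without exceeding the data bus.
    """
    bus_bytes = mem_data_width // 8
    beat_bytes = min(total_bytes, bus_bytes)
    size = beat_bytes.bit_length() - 1       # base-2 logarithm of the bytes per transfer
    length = total_bytes // beat_bytes - 1   # number of transfers in the burst, minus one
    return length, size
```

For example, under these assumptions a 64-byte cacheline read over a 64-bit memory data bus gives `burst_fields(64, 64) == (7, 3)`: eight transfers of 8 bytes each.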

Table 8 CMI Read Response Channel Signals

Signal

Source

Description

mem_resp_read_valid_i

NoC

Indicates that the channel is signaling a valid response

mem_resp_read_ready_o

Cache

Indicates that the cache is ready to accept a response

mem_resp_read_i.mem_resp_r_error

NoC

Indicates whether there was an error condition while processing the request

mem_resp_read_i.mem_resp_r_id

NoC

The identification tag for the request. It corresponds to the ID transferred with the request

mem_resp_read_i.mem_resp_r_data

NoC

Response read data. It shall be naturally aligned to the request address

mem_resp_read_i.mem_resp_r_last

NoC

Indicates the last transfer in a read response burst

  • Memory Write Interfaces

Table 9 CMI Write Request Channel Signals

Signal

Source

Description

mem_req_write_valid_o

Cache

Indicates that the channel is signaling a valid request

mem_req_write_ready_i

NoC

Indicates that the NoC is ready to accept a request

mem_req_write_o.mem_req_addr

Cache

Target physical address of the request

mem_req_write_o.mem_req_len

Cache

Indicates the number of transfers in a burst minus one

mem_req_write_o.mem_req_size

Cache

Indicates the size of the access. The size is encoded as the base-2 logarithm of the number of bytes

mem_req_write_o.mem_req_id

Cache

The identification tag for the request. The HPDcache always uses unique IDs on the memory interface (i.e. two or more in-flight requests cannot share the same ID).

mem_req_write_o.mem_req_command

Cache

Indicates the type of operation to be performed

mem_req_write_o.mem_req_atomic

Cache

In case of atomic operations, it indicates its type

mem_req_write_o.mem_req_cacheable

Cache

This is a hint for the cache hierarchy in the system. It indicates if the write is bufferable by the cache hierarchy. This means that the write must be visible in a timely manner at the final destination. However, write responses can be obtained from an intermediate point

Table 10 CMI Write Data Channel Signals

Signal

Source

Description

mem_req_write_data_valid_o

Cache

Indicates that the channel is transferring valid data

mem_req_write_data_ready_i

NoC

Indicates that the target is ready to accept the data

mem_req_write_data_o.mem_req_w_data

Cache

Request write data. It shall be naturally aligned to the request address

mem_req_write_data_o.mem_req_w_be

Cache

Request write byte-enable. It shall be naturally aligned to the request address

mem_req_write_data_o.mem_req_w_last

Cache

Indicates the last transfer in a write request burst

Table 11 CMI Write Response Channel Signals

Signal

Source

Description

mem_resp_write_valid_i

NoC

Indicates that the channel is transferring a valid write acknowledgement

mem_resp_write_ready_o

Cache

Indicates that the cache is ready to accept the acknowledgement

mem_resp_write_i.mem_resp_w_is_atomic

NoC

Indicates whether the atomic operation was successfully processed (atomically)

mem_resp_write_i.mem_resp_w_error

NoC

Indicates whether there was an error condition while processing the request

mem_resp_write_i.mem_resp_w_id

NoC

The identification tag for the request. It corresponds to the ID transferred with the request

Interfaces’ requirements

This section describes the basic protocol transaction requirements for the different interfaces in the HPDcache.

Valid/Ready handshake process

All interfaces in the HPDcache use a valid/ready handshake process to transfer a payload between the source and the destination. The payload contains the address, data and control information.

As a reminder, the 7 interfaces in the HPDcache are the following:

  1. CRI request interface

  2. CRI response interface

  3. CMI read request interface

  4. CMI read response interface

  5. CMI write request interface

  6. CMI write data request interface

  7. CMI write response interface

The source sets the valid signal to 1 to indicate that the payload is available. The destination sets the ready signal to 1 to indicate that it can accept that payload. A transfer occurs only when both the valid and ready signals are set to 1 at a rising edge of the clock.

A source is not permitted to wait until ready is set to 1 before setting valid to 1.

A destination may or may not wait for valid before setting ready to 1 (cases (a) and (d) in Table 12). In other words, a destination may set ready to 1 before an actual transfer is available.

When valid is set to 1, the source must keep it that way until the handshake occurs, that is, until the next rising edge at which both valid and ready (from the destination) are set to 1. In other words, a source cannot withdraw a pending valid transfer (case (b) in Table 12).

After an effective transfer (valid and ready set to 1), the source may keep valid set to 1 in the next cycle to signal a new transfer (with a new payload). In the same manner, the destination may keep ready set to 1 if it can accept a new transfer. This allows back-to-back transfers, with no idle cycles, between a source and a destination (Case (d) in Table 12).

All interfaces are synchronous to the rising edge of the same global clock (clk_i).
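The handshake rules above can be sketched as a small cycle-based Python model. The function names are hypothetical. Each list element is the value of a signal sampled at one rising clock edge, and the model only covers the valid and ready signals (it does not check that the payload stays stable while a transfer is pending).

```python
def transfer_cycles(valid, ready):
    """Return the cycle indices at which a transfer occurs (valid and ready both 1)."""
    return [cycle for cycle, (v, r) in enumerate(zip(valid, ready)) if v and r]


def source_holds_valid(valid, ready):
    """Check that, once set, valid stays at 1 until the handshake occurs."""
    pending = False  # True when valid was 1 in the previous cycle without a handshake
    for v, r in zip(valid, ready):
        if pending and not v:
            return False  # the source withdrew a pending transfer
        pending = bool(v and not r)
    return True


# Valid is set first and held until ready arrives: one transfer in cycle 3.
assert transfer_cycles([0, 1, 1, 1, 0], [0, 0, 0, 1, 0]) == [3]
# Back-to-back transfers with no idle cycle.
assert transfer_cycles([1, 1, 1], [1, 1, 1]) == [0, 1, 2]
```

A trace such as valid = [1, 0] with ready = [0, 1] is rejected by `source_holds_valid`, because the source drops valid before the destination accepted the transfer.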

Table 12 valid/ready scenarios

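(a) Ready set before valid (waveform: _images/wave_ready_before_valid.svg)

(b) Valid set before ready (waveform: _images/wave_valid_before_ready.svg)

(c) Ready set when valid (waveform: _images/wave_ready_when_valid.svg)

(d) Back-to-back transfers (waveform: _images/wave_back_to_back.svg)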

CRI Response Interface

The CRI response interfaces are a special case. For these interfaces, the ready signal is assumed to be always set to 1, which is why it is not actually implemented on them. In other words, the requester unconditionally accepts any incoming response.

Address, data and byte enable alignment

Address alignment

The address transferred (addr) in all request interfaces (CRI and CMI) shall be byte-aligned to the value of the corresponding size signal in that interface.

Some examples are illustrated in Figure 2. In the first case, the size value is 2 (which corresponds to \(\scriptsize\mathsf{2^2=4}\) bytes). Thus, the address must be a multiple of 4. In the second case, the size value is 3. Thus, the address must be a multiple of 8. Finally, in the third case, the size value is 0. Thus, there is no constraint on the address alignment.
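The alignment rule can be written as a one-line check. This is a minimal Python sketch (the function name is hypothetical) in which size is the encoded value, i.e. the base-2 logarithm of the number of bytes.

```python
def is_aligned(addr, size):
    """Return True when addr is a multiple of 2**size bytes."""
    return addr % (1 << size) == 0


assert is_aligned(0x1004, 2)       # 4-byte access on a multiple of 4
assert not is_aligned(0x1004, 3)   # 8-byte access needs a multiple of 8
assert is_aligned(0x1003, 0)       # 1-byte access has no constraint
```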

Data alignment

The data must be naturally aligned to the address (addr), and the maximum number of valid bytes in the transfer is \(\scriptsize\mathsf{2^{size}}\). This means that the first valid byte in the data signal must be at the offset indicated by the address. Here, the offset corresponds to the least significant bits of the address, which select a byte within the data word. For example, if the data signal is 128 bits wide (16 bytes), then the offset corresponds to the 4 least significant bits of the addr signal.

Some examples are illustrated in Figure 2. As illustrated, within the data word, only bytes in the range from the offset indicated by the address up to (but not including) that offset plus \(\scriptsize\mathsf{2^{size}}\) can contain valid data. Other bytes must be ignored by the destination.

Additionally, within the range described above, the be signal indicates which bytes within that range are actually valid. Bytes in the data signal where the be signals are set to 0, must be ignored by the destination.

Byte Enable (BE) alignment

The be signal must be naturally aligned to the address (addr), and the number of bits set in this signal must be less than or equal to \(\scriptsize\mathsf{2^\text{size}}\). This means that the first valid bit in the be signal must be at the offset indicated by the address. The offset is the same as the one explained above in the “Data alignment” paragraph.

Some examples are illustrated in Figure 2. As illustrated, within the be word, only bits in the range from the offset indicated by the address up to (but not including) that offset plus \(\scriptsize\mathsf{2^{size}}\) can be set. Bits outside that range must be set to 0.


Fig. 2 Address, Data and Byte Enable Alignment in Requests
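The data and byte-enable rules can be sketched in Python as follows. The function names are hypothetical, and the sketch assumes that the address already satisfies the address alignment rule and that the access is not wider than the data word.

```python
def byte_enable_window(addr, size, data_bytes):
    """Return a bit mask of the byte lanes that may carry valid data.

    Bit i of the result corresponds to byte lane i of the data word.
    """
    offset = addr % data_bytes            # least significant bits of the address
    nbytes = 1 << size                    # 2**size bytes in the access
    return ((1 << nbytes) - 1) << offset


def be_is_legal(be, addr, size, data_bytes):
    """Check that no byte-enable bit is set outside the allowed window."""
    return (be & ~byte_enable_window(addr, size, data_bytes)) == 0


# 4-byte access at offset 4 of a 16-byte (128-bit) data word: lanes 4 to 7.
assert byte_enable_window(0x1004, 2, 16) == 0x00F0
assert be_is_legal(0x0030, 0x1004, 2, 16)       # subset of the window
assert not be_is_legal(0x0100, 0x1004, 2, 16)   # lane 8 is outside the window
```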

Cache-Requesters Interface (CRI) Attributes

Physical or Virtual Indexing

The HPDcache allows the address and physical memory attributes (PMA) to be sent by the requesters in two different (but consecutive) cycles.

This is useful to allow the pipelining of the address translation mechanism (when the core has one). This is illustrated in Figure 3. Doing the translation and directly forwarding to the cache is usually too costly in terms of timing. Instead, the requesters can:

Cycle 0

During the first cycle, forward the least significant bits of the address (addr_offset), which usually do not need to be translated, along with the other fields of the request (operation, identifiers, etc.). In the meantime, the core can perform the translation of the address to compute the most significant bits (addr_tag)

Cycle 1

During the second cycle, forward the previously translated most significant bits of the address (addr_tag), and the corresponding PMAs. PMAs are sent during this second cycle because usually they depend on the target physical address. The requester can abort the request during this cycle as explained in the next section (Request Abortion).


Fig. 3 Pipelining of the Virtual and Physical Part of the Address

This kind of indexing is named Virtually-Indexed Physically-Tagged (VIPT).

The requester shall send the tag and PMAs in the cycle following the one in which the core_req_valid_i and core_req_ready_o signals were set to 1 and the core_req_i.phys_indexed signal was set to 0.

The number of bits of the address offset (addr_offset) depends on the number of cache sets (\(\scriptsize\mathsf{CONF\_HPDCACHE\_SETS}\)) and the size in bytes of the cachelines (\(\scriptsize\mathsf{HPDCACHE\_CL\_WIDTH/8}\)). The address offset is the concatenation of two fields of the address: the byte offset in the cacheline and the set index. Requests can be sent back-to-back with no idle cycle in between.

If requesters do not need virtual indexing, they can send the full address in the first cycle by setting the core_req_i.phys_indexed bit to 1. The address offset and the tag shall be sent through the core_req_i.addr_offset and core_req_i.addr_tag, respectively. A given requester is free to alternate between virtual and physical indexing on different clock cycles. Different requesters can use different indexing schemes (virtual or physical).
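As an illustration of the address split, the following Python sketch separates a full address into addr_offset and addr_tag. The function name and the example geometry (64 sets, 64-byte cachelines) are hypothetical, and both values are assumed to be powers of two.

```python
def split_address(addr, sets, cl_bytes):
    """Split addr into (addr_offset, addr_tag, offset_width).

    addr_offset concatenates the set index and the byte offset in the
    cacheline; addr_tag holds the remaining most significant bits.
    """
    byte_bits = cl_bytes.bit_length() - 1   # log2 of the cacheline size in bytes
    set_bits = sets.bit_length() - 1        # log2 of the number of sets
    offset_width = set_bits + byte_bits
    addr_offset = addr & ((1 << offset_width) - 1)
    addr_tag = addr >> offset_width
    return addr_offset, addr_tag, offset_width


addr_offset, addr_tag, offset_width = split_address(0x80001234, sets=64, cl_bytes=64)
set_index = addr_offset >> 6      # upper 6 bits of the offset
byte_offset = addr_offset & 0x3F  # lower 6 bits of the offset
```

With this example geometry the offset is 12 bits wide, so the address 0x80001234 yields addr_offset = 0x234 (set index 8, byte offset 0x34) and addr_tag = 0x80001.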

Request Abortion

When using virtual indexing, the requester can abort the request during the second cycle of the addressing pipeline. In that case, the requester needs to set the req_abort signal to 1.

When a request is aborted, and the core_req_i.need_rsp field was set to 1, the HPDcache responds to the corresponding requester with the bit core_rsp_o.aborted set to 1.

CRI Type of Operation

A requester indicates the required operation on the 5-bit HPDCACHE_REQ_OP signal. The supported operations are detailed in Table 13.

Table 13 Requesters Operation Types

Mnemonic

Encoding

Type

HPDCACHE_REQ_LOAD

0b00000

Read operation

HPDCACHE_REQ_STORE

0b00001

Write operation

HPDCACHE_REQ_AMO_LR

0b00100

Atomic Load-reserved operation

HPDCACHE_REQ_AMO_SC

0b00101

Atomic Store-conditional operation

HPDCACHE_REQ_AMO_SWAP

0b00110

Atomic SWAP operation

HPDCACHE_REQ_AMO_ADD

0b00111

Atomic integer ADD operation

HPDCACHE_REQ_AMO_AND

0b01000

Atomic bitwise AND operation

HPDCACHE_REQ_AMO_OR

0b01001

Atomic bitwise OR operation

HPDCACHE_REQ_AMO_XOR

0b01010

Atomic bitwise XOR operation

HPDCACHE_REQ_AMO_MAX

0b01011

Atomic integer signed MAX operation

HPDCACHE_REQ_AMO_MAXU

0b01100

Atomic integer unsigned MAX operation

HPDCACHE_REQ_AMO_MIN

0b01101

Atomic integer signed MIN operation

HPDCACHE_REQ_AMO_MINU

0b01110

Atomic integer unsigned MIN operation

HPDCACHE_CMO_FENCE

0b10000

Memory write fence

HPDCACHE_CMO_PREFETCH

0b10001

Prefetch a cacheline given its address

HPDCACHE_CMO_INVAL_NLINE

0b10010

Invalidate a cacheline given its address

HPDCACHE_CMO_INVAL_ALL

0b10011

Invalidate all cachelines

HPDCACHE_CMO_FLUSH_NLINE

0b10100

Flush a cacheline given its address

HPDCACHE_CMO_FLUSH_ALL

0b10101

Flush all cachelines

HPDCACHE_CMO_FLUSH_INVAL_NLINE

0b10110

Flush and invalidate a cacheline given its address

HPDCACHE_CMO_FLUSH_INVAL_ALL

0b10111

Flush and invalidate all cachelines

Load and store operations are normal read and write operations from/to the specified address.

Atomic operations are the ones specified in the Atomic (A) extension of the RISC-V specification [RISCVUP2019]. More details on how the HPDcache implements AMOs are found in section Atomic Memory Operations (AMOs).

CMOs are explained in Cache Management Operations (CMOs).
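For reference, the encodings of Table 13 can be captured in a Python enumeration. The class and helper names are hypothetical. The two helpers rely only on patterns observed in the table (AMO encodings are contiguous, and all CMO encodings have bit 4 set), which should not be read as a guarantee of the specification.

```python
from enum import IntEnum


class ReqOp(IntEnum):
    """Operation encodings copied from Table 13."""
    LOAD = 0b00000
    STORE = 0b00001
    AMO_LR = 0b00100
    AMO_SC = 0b00101
    AMO_SWAP = 0b00110
    AMO_ADD = 0b00111
    AMO_AND = 0b01000
    AMO_OR = 0b01001
    AMO_XOR = 0b01010
    AMO_MAX = 0b01011
    AMO_MAXU = 0b01100
    AMO_MIN = 0b01101
    AMO_MINU = 0b01110
    CMO_FENCE = 0b10000
    CMO_PREFETCH = 0b10001
    CMO_INVAL_NLINE = 0b10010
    CMO_INVAL_ALL = 0b10011
    CMO_FLUSH_NLINE = 0b10100
    CMO_FLUSH_ALL = 0b10101
    CMO_FLUSH_INVAL_NLINE = 0b10110
    CMO_FLUSH_INVAL_ALL = 0b10111


def is_amo(op):
    """True for the atomic operations (contiguous range in Table 13)."""
    return ReqOp.AMO_LR <= op <= ReqOp.AMO_MINU


def is_cmo(op):
    """True for the cache management operations (bit 4 set in Table 13)."""
    return bool(op & 0b10000)


assert is_amo(ReqOp.AMO_ADD) and not is_amo(ReqOp.STORE)
assert is_cmo(ReqOp.CMO_FLUSH_ALL) and not is_cmo(ReqOp.LOAD)
```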

Source identifier

Each request identifies its source through the core_req_i.sid signal. The core_req_i.sid signal shall be decoded when the core_req_valid_i signal is set to 1. The width of this signal is \(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_SRC\_ID\_WIDTH}\) bits. The HPDcache reflects the value of the sid of the request into the corresponding sid of the response.

Each port must have a unique ID that corresponds to its number. Each port is numbered from 0 to N-1. This number shall be constant for a given port (requester). The HPDcache uses this information to route responses to the correct requester.
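The routing role of the sid can be pictured with a minimal Python sketch. The function name and the dictionary-based response format are hypothetical and only model the behavior described above; they do not describe the actual hardware implementation.

```python
def route_responses(responses, n_requesters):
    """Distribute responses to per-requester lists according to their sid."""
    ports = [[] for _ in range(n_requesters)]
    for rsp in responses:
        ports[rsp["sid"]].append(rsp)  # the sid equals the index of the port
    return ports


ports = route_responses(
    [{"sid": 1, "tid": 0}, {"sid": 0, "tid": 3}, {"sid": 1, "tid": 2}],
    n_requesters=2,
)
```

Here port 0 receives the response with tid 3, and port 1 receives the responses with tid 0 and tid 2, in that order.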

Transaction identifier

Each request identifies transactions through the core_req_i.tid signal. The core_req_i.tid signal shall be decoded when the core_req_valid_i signal is set to 1. The width of this signal is \(\scriptsize\mathsf{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH}\) bits.

This signal can contain any value from 0 to \(\scriptsize\mathsf{2^{CONF\_HPDCACHE\_REQ\_TRANS\_ID\_WIDTH} - 1}\). The HPDcache forwards the value of the tid of the request into the tid of the corresponding response.

A requester can issue multiple transactions without waiting for earlier transactions to complete. Because the HPDcache can respond to these transactions in a different order than the requests were issued, the requester can use the tid to match responses to requests.

The ID of transactions is not necessarily unique. A requester may reuse a given transaction ID for different transactions, even when some of these transactions are not yet completed. However, when the requester starts multiple transactions with the same tid, it cannot match responses to requests, because responses can arrive in a different order than the requests.
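A requester that needs to match out-of-order responses can keep its in-flight tids unique. The Python sketch below shows one possible requester-side bookkeeping scheme. The class and its methods are hypothetical and are not part of the HPDcache.

```python
class TidTracker:
    """Allocate unique transaction IDs and match responses to requests."""

    def __init__(self, tid_width):
        self.free = list(range(1 << tid_width))  # tids not currently in flight
        self.pending = {}                        # tid -> request description

    def issue(self, request):
        """Return a free tid for the request, or None if all tids are in flight."""
        if not self.free:
            return None
        tid = self.free.pop(0)
        self.pending[tid] = request
        return tid

    def complete(self, tid):
        """Return the request matching a response and release its tid."""
        request = self.pending.pop(tid)
        self.free.append(tid)
        return request


tracker = TidTracker(tid_width=2)
tid_a = tracker.issue("load A")
tid_b = tracker.issue("load B")
# Responses may arrive in a different order than the requests.
assert tracker.complete(tid_b) == "load B"
assert tracker.complete(tid_a) == "load A"
```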

Cacheability

This cache considers that the memory space is segmented. A segment corresponds to an address range: a base address and an end address. Some segments are cacheable and others not. The HPDcache needs to know which segments are cacheable to determine if for a given read request, it needs to copy the read data into the cache.

The request interface implements an uncacheable bit (core_req_i.pma.uncacheable or core_req_pma_i.uncacheable). When this bit is set, the access is considered uncacheable. The core_req_i.pma.uncacheable signal shall be decoded when the core_req_valid_i signal is set to 1. The core_req_pma_i.uncacheable signal shall be decoded in the cycle after the core_req_valid_i and core_req_ready_o signals were set to 1 and the core_req_i.phys_indexed signal was set to 0.

Caution

For a given address, the uncacheable attribute must be consistent between accesses. The granularity is the cacheline. In the event that the same address is accessed with different values in the uncacheable attribute, the behavior of the cache for that address is unpredictable.
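One way for a requester to keep the attribute consistent is to derive it from a fixed segment table. The Python sketch below is a hypothetical illustration: the function name, the inclusive end addresses, the 64-byte cacheline size and the segment values are all assumptions.

```python
def uncacheable_bit(addr, cacheable_segments, cl_bytes=64):
    """Return 0 if the whole cacheline holding addr is cacheable, 1 otherwise.

    cacheable_segments is a list of (base, end) address pairs, with end
    assumed to be inclusive. The decision is taken per cacheline, so every
    address in the same line gets the same attribute.
    """
    line_base = addr - (addr % cl_bytes)
    line_end = line_base + cl_bytes - 1
    for base, end in cacheable_segments:
        if base <= line_base and line_end <= end:
            return 0
    return 1


segments = [(0x80000000, 0x8FFFFFFF)]  # hypothetical cacheable range
assert uncacheable_bit(0x80001234, segments) == 0
assert uncacheable_bit(0x10000000, segments) == 1
```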

Need response

For any given request, a requester can set the bit core_req_i.need_rsp to 0 to indicate that it does not want a response for that request. The core_req_i.need_rsp signal shall be decoded when the core_req_valid_i signal is set to 1.

When core_req_i.need_rsp is set to 0, the HPDcache processes the request but it does not send an acknowledgement to the corresponding requester when the transaction is completed.

Write-Policy Hint

Requesters may dynamically set the write-policy (write-back or write-through) for the target cacheline. In the request interface, there are specific flags (hints) to indicate the desired policy for a given request.

The request interface drives the hint through the core_req_i.pma.wr_policy_hint or core_req_pma_i.wr_policy_hint signals. The core_req_i.pma.wr_policy_hint signal shall be decoded when the core_req_valid_i signal is set to 1. The core_req_pma_i.wr_policy_hint signal shall be decoded in the cycle after the core_req_valid_i and core_req_ready_o signals were set to 1 and the core_req_i.phys_indexed signal was set to 0.

The supported hints are detailed in Table 14.

Table 14 Requesters Write-Policy Hint

Mnemonic

Encoding

Type

HPDCACHE_WR_POLICY_AUTO

0b001

Request to keep the current write-policy for the target cacheline if there is a copy in the cache, or to use the default policy otherwise.

HPDCACHE_WR_POLICY_WB

0b010

Request a write-back (write allocate) policy for the target cacheline

HPDCACHE_WR_POLICY_WT

0b100

Request a write-through (write non-allocate) policy for the target cacheline
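The hint encodings of Table 14 are one-hot. The Python sketch below records them and models the AUTO rule described in the table. The names are hypothetical, and the way the cache handles a hint that differs from the current policy of a cached line is not modeled here: the sketch simply returns the requested policy in that case.

```python
from enum import IntEnum


class WrPolicy(IntEnum):
    """Write-policy hint encodings copied from Table 14."""
    AUTO = 0b001
    WB = 0b010
    WT = 0b100


def resolve_policy(hint, cached_policy, default_policy):
    """Return the policy requested for the target cacheline.

    cached_policy is the current policy of the line, or None when there is
    no copy in the cache (assumption made for this sketch).
    """
    if hint == WrPolicy.AUTO:
        return cached_policy if cached_policy is not None else default_policy
    return hint


assert resolve_policy(WrPolicy.AUTO, None, WrPolicy.WT) == WrPolicy.WT
assert resolve_policy(WrPolicy.AUTO, WrPolicy.WB, WrPolicy.WT) == WrPolicy.WB
assert resolve_policy(WrPolicy.WB, None, WrPolicy.WT) == WrPolicy.WB
```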

Error response

The response interface contains a single-bit core_rsp_o.error signal. This signal is set to 1 by the HPDcache when some error condition occurred during the processing of the corresponding request. The core_rsp_o.error signal shall be decoded when the core_rsp_valid_o signal is set to 1.

When the core_rsp_o.error signal is set to 1 in the response, the effect of the corresponding request is undefined. If this error signal is set for a LOAD or AMO operation, the core_rsp_o.rdata signal does not contain any valid data.

Cache-Memory Interface (CMI) Attributes

CMI Type of operation

Table 15 Memory request operation types

Mnemonic

Encoding

Type

HPDCACHE_MEM_READ

0b00

Read operation

HPDCACHE_MEM_WRITE

0b01

Write operation

HPDCACHE_MEM_ATOMIC

0b10

Atomic operation

HPDCACHE_MEM_READ and HPDCACHE_MEM_WRITE are respectively normal read and write operations from/to the specified address.

In the case of an atomic operation request (HPDCACHE_MEM_ATOMIC), the specific operation is specified in the MEM_REQ_ATOMIC signal. These operations are listed in Table 16. Note that these operations are compatible with the ones defined in the AMBA AXI protocol.

Table 16 Memory request atomic operation types

Mnemonic

Encoding

Type

HPDCACHE_MEM_ATOMIC_ADD

0b0000

Atomic fetch-and-add operation

HPDCACHE_MEM_ATOMIC_CLR

0b0001

Atomic fetch-and-clear operation

HPDCACHE_MEM_ATOMIC_SET

0b0010

Atomic fetch-and-set operation

HPDCACHE_MEM_ATOMIC_EOR

0b0011

Atomic fetch-and-exclusive-or operation

HPDCACHE_MEM_ATOMIC_SMAX

0b0100

Atomic fetch-and-maximum (signed) operation

HPDCACHE_MEM_ATOMIC_SMIN

0b0101

Atomic fetch-and-minimum (signed) operation

HPDCACHE_MEM_ATOMIC_UMAX

0b0110

Atomic fetch-and-maximum (unsigned) operation

HPDCACHE_MEM_ATOMIC_UMIN

0b0111

Atomic fetch-and-minimum (unsigned) operation

HPDCACHE_MEM_ATOMIC_SWAP

0b1000

Atomic swap operation

HPDCACHE_MEM_ATOMIC_LDEX

0b1100

Load-exclusive operation

HPDCACHE_MEM_ATOMIC_STEX

0b1101

Store-exclusive operation
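The encodings of Table 16 can be captured in a small reference model. The sketch below is an assumption-laden illustration: the names AtomicOp, to_signed and apply_rmw are hypothetical, and the arithmetic follows the usual fetch-and-op definitions (as used by AMBA AXI atomic transactions) rather than any text of this specification. The function returns the value left in memory; the requester receives the old value.

```python
from enum import IntEnum


class AtomicOp(IntEnum):
    """Encodings from Table 16."""
    ADD = 0b0000
    CLR = 0b0001
    SET = 0b0010
    EOR = 0b0011
    SMAX = 0b0100
    SMIN = 0b0101
    UMAX = 0b0110
    UMIN = 0b0111
    SWAP = 0b1000
    LDEX = 0b1100
    STEX = 0b1101


def to_signed(value, width):
    """Interpret a width-bit unsigned value as two's complement."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def apply_rmw(op, old, operand, width=64):
    """Return the new memory value for a read-modify-write operation."""
    mask = (1 << width) - 1
    old &= mask
    operand &= mask
    op = AtomicOp(op)
    if op is AtomicOp.ADD:
        return (old + operand) & mask
    if op is AtomicOp.CLR:  # clear the bits that are set in the operand
        return old & ~operand & mask
    if op is AtomicOp.SET:
        return old | operand
    if op is AtomicOp.EOR:
        return old ^ operand
    if op is AtomicOp.SMAX:
        return old if to_signed(old, width) >= to_signed(operand, width) else operand
    if op is AtomicOp.SMIN:
        return old if to_signed(old, width) <= to_signed(operand, width) else operand
    if op is AtomicOp.UMAX:
        return max(old, operand)
    if op is AtomicOp.UMIN:
        return min(old, operand)
    if op is AtomicOp.SWAP:
        return operand
    raise ValueError(f"{op.name} is not a read-modify-write operation")
```

The signed and unsigned variants differ only in how the operands are compared: with 8-bit values, 0xFF is the largest unsigned value but represents -1 when signed.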

Type of operation per CMI request channel

As a reminder, the HPDcache implements two request channels to the memory:

  1. Memory read request channel

  2. Memory write request channel

Table 17 indicates the type of operations that each of these two request channels can issue.

Table 17 Operation Types Supported by CMI Request Channels

Type

Channels

HPDCACHE_MEM_READ

  • CMI read request

HPDCACHE_MEM_WRITE

  • CMI write request

HPDCACHE_MEM_ATOMIC

  • CMI write request
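The mapping of Table 17 can be written as a simple lookup, for instance in a testbench that checks on which channel a request appears. This is only a sketch: MemOp, CHANNEL_FOR_OP and request_channel are hypothetical names, and the channel labels are plain strings chosen for the example.

```python
from enum import IntEnum


class MemOp(IntEnum):
    """Encodings from Table 15."""
    READ = 0b00
    WRITE = 0b01
    ATOMIC = 0b10


# Channel on which each operation type is issued, as listed in Table 17.
CHANNEL_FOR_OP = {
    MemOp.READ: "CMI read request",
    MemOp.WRITE: "CMI write request",
    MemOp.ATOMIC: "CMI write request",
}


def request_channel(op_encoding):
    """Return the request channel for an operation encoding.

    Raises ValueError for an encoding that Table 15 does not define.
    """
    return CHANNEL_FOR_OP[MemOp(op_encoding)]
```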

Read-Modify-Write Atomic Operations

The following atomic operations behave as read-modify-write operations:

  • HPDCACHE_MEM_ATOMIC_ADD

  • HPDCACHE_MEM_ATOMIC_CLR

  • HPDCACHE_MEM_ATOMIC_SET

  • HPDCACHE_MEM_ATOMIC_EOR

  • HPDCACHE_MEM_ATOMIC_SMAX

  • HPDCACHE_MEM_ATOMIC_SMIN

  • HPDCACHE_MEM_ATOMIC_UMAX

  • HPDCACHE_MEM_ATOMIC_UMIN

  • HPDCACHE_MEM_ATOMIC_SWAP

These requests are forwarded to the memory through the CMI write request interface. A particularity of these requests is that they generate two responses from the memory:

  1. Old data value from memory is returned through the CMI read response interface.

  2. Write acknowledgement is returned through the CMI write response interface.

Both responses may arrive in any given order to the initiating HPDcache.

Regarding errors, if either response has its error signal set to 1 (mem_resp_*_i.mem_resp_r_error or mem_resp_*_i.mem_resp_w_error), the HPDcache considers that the operation was not completed. It waits for both responses and then forwards an error response (core_rsp_o.error = 1) to the corresponding requester on the HPDcache requesters’ side.
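The two-response rule can be modelled as a small tracker that accepts the read and write responses in either order and only produces a result once both have arrived. The class below is a behavioural sketch, not the HPDcache implementation; RmwTransaction and its method names are hypothetical.

```python
class RmwTransaction:
    """Collects the two CMI responses of one read-modify-write request."""

    def __init__(self):
        self.old_data = None
        self.read_seen = False
        self.write_seen = False
        self.error = False

    def on_read_response(self, data, error):
        """Record the old data value returned on the read response channel."""
        self.read_seen = True
        self.old_data = data
        self.error = self.error or bool(error)

    def on_write_response(self, error):
        """Record the write acknowledgement."""
        self.write_seen = True
        self.error = self.error or bool(error)

    def core_response(self):
        """Return None until both responses arrived, then (rdata, error)."""
        if not (self.read_seen and self.write_seen):
            return None
        # On error the data is not valid, so the sketch reports none.
        return (None, True) if self.error else (self.old_data, False)
```

Calling on_write_response before on_read_response yields the same final result as the opposite order, which reflects that the responses may arrive in any order.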

Exclusive Load/Store Atomic Operations

Exclusive load and store operations are issued as normal load and store operations on the CMI read request interface and CMI write request interface, respectively.

However, specific operation types are used for these exclusive requests: HPDCACHE_MEM_ATOMIC_LDEX for loads and HPDCACHE_MEM_ATOMIC_STEX for stores.

These requests behave similarly to normal loads and stores to the memory but provide some additional properties described in Atomic Memory Operations (AMOs).

In the case of the HPDCACHE_MEM_ATOMIC_STEX request, the write acknowledgement carries additional information in the mem_resp_w_is_atomic signal. If this signal is set to 1, the exclusive store was “atomic” and the data was written to memory. If this signal is set to 0, the exclusive store was “non-atomic” and the write operation was abandoned.

The HPDcache uses exclusive stores for SC operations from requesters. Depending on the mem_resp_w_is_atomic value, the HPDcache responds to the requester according to the rules explained in Atomic Memory Operations (AMOs). A “non-atomic” response is considered an SC Failure, and an “atomic” response is considered an SC Success.

CMI Transaction identifier

Each request identifies transactions through the mem_req_*_o.mem_req_id signals. The mem_req_*_o.mem_req_id signal shall be decoded when the mem_req_*_valid_o signal is set to 1. The width of these ID signals is \(\scriptsize\mathsf{CONF\_HPDCACHE\_MEM\_ID\_WIDTH}\) bits.

The target (memory or peripheral) shall respond to a request by setting the mem_resp_*_i.mem_resp_*_id signal to the value of the corresponding mem_req_*_o.mem_req_id.

The mem_req_*_o.mem_req_id signals can contain any value from 0 to \(\scriptsize\mathsf{2^{CONF\_HPDCACHE\_MEM\_ID\_WIDTH} - 1}\).

The HPDcache can issue multiple memory transactions without waiting for earlier transactions to complete. The HPDcache uses a unique ID for each request, meaning that two or more in-flight requests never share the same ID. In-flight requests are those that have been issued by the HPDcache but have not yet received their respective response.

The targets (memory or peripherals) may respond to in-flight CMI requests in any order.
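A memory-side model or testbench could check these rules with a small scoreboard such as the one sketched below. It assumes a single ID space of the given width; the class name InflightIds and its methods are hypothetical and only illustrate the uniqueness and out-of-order properties described above.

```python
class InflightIds:
    """Tracks IDs of issued requests that have not been answered yet."""

    def __init__(self, id_width):
        self.max_id = (1 << id_width) - 1  # largest ID the signal can carry
        self.inflight = set()

    def issue(self, req_id):
        """Register a new request; in-flight IDs must be unique."""
        if not 0 <= req_id <= self.max_id:
            raise ValueError("ID does not fit in the ID width")
        if req_id in self.inflight:
            raise ValueError("ID already used by an in-flight request")
        self.inflight.add(req_id)

    def respond(self, resp_id):
        """Retire a request; responses may arrive in any order."""
        if resp_id not in self.inflight:
            raise ValueError("response ID matches no in-flight request")
        self.inflight.remove(resp_id)
```

Once a response retires an ID, the same ID may be issued again, since only in-flight requests need distinct IDs.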

  • Transaction IDs in the CMI read request channel

The HPDcache can have the following number of in-flight read miss transactions:

\(\scriptsize\mathsf{CONF\_HPDCACHE\_MSHR\_SETS{}\times{}CONF\_HPDCACHE\_MSHR\_WAYS}\)

Each in-flight transaction has a unique transaction ID. This ID is formatted as follows:

  • For cacheable requests:

(mshr_way << log2(CONF_HPDCACHE_MSHR_SETS)) | mshr_set

The ID is the concatenation of two indexes: the MSHR set and the MSHR way occupied by the corresponding request.

  • For uncacheable requests

The HPDcache can issue up to 1 in-flight, uncached, read transaction. Uncached transactions have a unique transaction ID with all bits set to 1.
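The ID format for read requests can be illustrated with a few helper functions. This is a sketch under two assumptions: the number of MSHR sets is a power of two, and the ID width is large enough that the all-ones value is never produced by a cacheable request. The function names are hypothetical.

```python
def read_miss_id(mshr_set, mshr_way, mshr_sets):
    """ID of a cacheable read miss: way index above the set index."""
    set_bits = (mshr_sets - 1).bit_length()  # log2 for a power of two
    return (mshr_way << set_bits) | mshr_set


def decode_read_miss_id(req_id, mshr_sets):
    """Recover (mshr_set, mshr_way) from a cacheable read ID."""
    set_bits = (mshr_sets - 1).bit_length()
    return req_id & ((1 << set_bits) - 1), req_id >> set_bits


def uncached_read_id(id_width):
    """ID of the single in-flight uncached read: all bits set to 1."""
    return (1 << id_width) - 1
```

With 4 MSHR sets and 2 MSHR ways, the request stored in set 3 of way 1 gets the ID (1 << 2) | 3 = 7, and the 8 possible (set, way) pairs map to the 8 distinct IDs 0 to 7.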

  • Transaction IDs in the CMI wbuf write request channel

The HPDcache can have the following number of in-flight write transactions:

\(\scriptsize\mathsf{CONF\_HPDCACHE\_WBUF\_DIR\_ENTRIES}\)

Each in-flight transaction has a unique transaction ID. This ID is formatted as follows:

  • For cacheable requests:

The ID corresponds to the index of the entry in the write-buffer directory.

wbuf_dir_index

  • For uncacheable requests

    The HPDcache can issue up to 1 in-flight, uncached, write transaction. Uncached transactions have a unique transaction ID with all bits set to 1.
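The write-side ID rule can be sketched in the same way. The function below is hypothetical; in particular, the check that a write-buffer index never equals the all-ones value is an assumption made here so that the example keeps cacheable and uncached IDs distinct, not a rule stated by this specification.

```python
def write_request_id(id_width, wbuf_dir_index=None):
    """Return the ID of a write request.

    A cacheable write uses the index of its write-buffer directory entry.
    An uncached write (wbuf_dir_index is None) uses all bits set to 1.
    """
    all_ones = (1 << id_width) - 1
    if wbuf_dir_index is None:
        return all_ones
    # Assumption of this sketch: indexes stay below the all-ones value.
    if not 0 <= wbuf_dir_index < all_ones:
        raise ValueError("index would collide with the uncached ID")
    return wbuf_dir_index
```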

Event signals

In addition to the performance registers explained in Performance counters, the HPDcache provides a set of one-shot signals that indicate when a given event is detected. These signals are set to 1 for one cycle each time the corresponding event is detected. If the same event is detected N cycles in a row, the corresponding event signal will remain set to 1 for N cycles. Table 18 lists these event signals.

These event signals are output-only. They can be either left unconnected, if they are not used, or connected with the remainder of the system. The system can use those signals, for example, for counting those events externally or for triggering some specific actions.

Table 18 Event Signals in the HPDcache

Signal

Source

Description

evt_o.write_req

Cache

Write request accepted

evt_o.read_req

Cache

Read request accepted

evt_o.prefetch_req

Cache

Prefetch request accepted

evt_o.uncached_req

Cache

Uncached request accepted

evt_o.cmo_req

Cache

CMO request accepted

evt_o.accepted_req

Cache

One request accepted (any type)

evt_o.cache_write_miss

Cache

Write miss event

evt_o.cache_read_miss

Cache

Read miss event

evt_o.req_onhold

Cache

Request put on-hold in the RTAB

evt_o.req_onhold_mshr

Cache

Request put on-hold because of an MSHR conflict

evt_o.req_onhold_wbuf

Cache

Request put on-hold because of a WBUF conflict

evt_o.req_onhold_rollback

Cache

Request put on-hold (again) after a rollback

evt_o.stall

Cache

Cache stalls request event
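Counting these events externally amounts to sampling each signal once per clock cycle and incrementing a counter whenever it is 1. The following behavioural sketch illustrates this; the EventCounters class is hypothetical, and the per-cycle samples are passed as a plain dictionary keyed by the signal names of Table 18.

```python
from collections import Counter


class EventCounters:
    """External counters driven by the one-cycle event signals."""

    def __init__(self):
        self.counts = Counter()

    def clock(self, evt):
        """Sample all event signals for one cycle.

        evt maps a signal name (for example "cache_read_miss") to 0 or 1.
        """
        for name, level in evt.items():
            if level:
                self.counts[name] += 1


# A signal that stays at 1 for three consecutive cycles counts three events.
counters = EventCounters()
for level in (1, 1, 1, 0):
    counters.clock({"cache_read_miss": level, "stall": 0})
```

After this trace, counters.counts["cache_read_miss"] is 3 and counters.counts["stall"] is 0.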