Overview

This HPDcache is responsible for serving data accesses issued by a RISC-V core, tightly-coupled accelerators and hardware memory prefetchers. All these “clients” are called requesters.

The HPDcache implements a hardware pipeline capable of serving one request per cycle. An arbiter in the requesters’ interface of the HPDcache guarantees the correct behavior when there are multiple requesters. This is illustrated in Figure 1.

Fig. 1 High-Level View of the HPDcache Sub-System

List of features

Support for multiple outstanding requests per requester.
Support for multiple outstanding read misses and writes to memory.
Processes one request per cycle.
Any given requester can access 1 to 32 bytes of a cacheline per cycle.
Reduced energy consumption by limiting the number of RAMs consulted per request.
Fixed priority arbiter between requesters: the requester port with the lowest index has the highest priority.
Non-allocate, write-through policy or allocate, write-back policy. Either one or both are supported simultaneously at cacheline granularity.
Hardware write-buffer to mask the latency of write acknowledgements from the memory system.
Compliance with RVWMO.
For address-overlapping transactions, the cache guarantees that these are committed in the order in which they are consumed from the requesters.
For non-address-overlapping transactions, the cache may execute them in an out-of-order fashion to improve performance.
Support for CMOs: cache invalidation operations, and memory fences for multi-core synchronisation. Cache invalidation operations support the ones defined in the RISC-V CMO Standard.
Memory-mapped CSRs for runtime configuration of the cache, status and performance monitoring.
Ready-Valid, 5 channels (3 request/2 response), interface to the memory. This interface, cache memory interface (CMI), can be easily adapted to mainstream NoC interfaces like AMBA AXI [AXI2020].
An adapter for interfacing with AXI5 is provided.
External (optional), configurable, hardware, memory-prefetcher that supports up to 4 simultaneous prefetching streams.