Maintaining Cache Coherency

Last modified by Microchip on 2023/11/10 11:08

The MPLAB® XC32 C compiler’s runtime start-up code enables the cache and configures it for the highest performance (write-back with write allocation). This requires you to manage any potential cache coherency issues.

Any time two resources on a device depend on the same block of memory, you have to control access to that memory to ensure it isn’t simultaneously being changed by one and used by the other. A memory “ownership” flag can be created and used for this purpose.
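
One way to picture such an ownership flag is the minimal sketch below. It is plain C that also runs on a host, and every name in it (shared_buffer_t, buffer_give_to_dma(), buffer_return_to_cpu(), buffer_cpu_write()) is hypothetical and chosen for illustration. It assumes the CPU hands the buffer over before starting a transfer and takes it back in a DMA-complete interrupt handler.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names: who currently owns the shared buffer. */
typedef enum { OWNER_CPU, OWNER_DMA } buffer_owner_t;

typedef struct {
    volatile buffer_owner_t owner;  /* volatile: also written from an ISR */
    uint8_t data[64];
} shared_buffer_t;

/* CPU hands the buffer to the DMA before starting a transfer.
 * Returns false if the DMA already owns it. */
bool buffer_give_to_dma(shared_buffer_t *b)
{
    if (b->owner != OWNER_CPU) {
        return false;
    }
    b->owner = OWNER_DMA;
    return true;
}

/* Intended to be called from a (hypothetical) DMA-complete handler. */
void buffer_return_to_cpu(shared_buffer_t *b)
{
    b->owner = OWNER_CPU;
}

/* CPU writes only while it owns the buffer. Returns bytes written. */
size_t buffer_cpu_write(shared_buffer_t *b, const uint8_t *src, size_t n)
{
    if (b->owner != OWNER_CPU) {
        return 0;                   /* DMA is using the buffer */
    }
    if (n > sizeof b->data) {
        n = sizeof b->data;         /* clamp to the buffer size */
    }
    for (size_t i = 0; i < n; i++) {
        b->data[i] = src[i];
    }
    return n;
}
```

In this sketch only the CPU reads and writes the flag, so the flag itself raises no coherency question. The buffer contents are a different matter.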

In addition to controlling access to shared memory, you are also responsible for maintaining cache coherency.

Methods to Maintain Cache Coherency

Completely Disable Cache

You probably don’t want to do this: performance drops by roughly a factor of 10 and power consumption increases. That said, this is how you do it:

The MPLAB XC32 C compiler provides an easy-to-use cache configuration file, pic32_init_cache.S. If you copy this file into your local project directory, it will override the default runtime setup code used to initialize the cache.

Change the __PIC32_CACHE_MODE from _CACHE_WRITEBACK_WRITEALLOCATE to _CACHE_DISABLE.

You can find this file in the following XC32 install directory, …xc32/vx.xx/pic32-libs/libpic32/stubs.

File: pic32_init_cache.S

/* Cache Coherency Attributes */
#define _CACHE_WRITEBACK_WRITEALLOCATE      3
#define _CACHE_WRITETHROUGH_WRITEALLOCATE   1
#define _CACHE_WRITETHROUGH_NOWRITEALLOCATE 0
#define _CACHE_DISABLE                      2

/* Set __PIC32_CACHE_MODE to the desired coherency attribute */
//#define __PIC32_CACHE_MODE      _CACHE_WRITEBACK_WRITEALLOCATE    //default runtime setup policy
#define __PIC32_CACHE_MODE      _CACHE_DISABLE

Note: __PIC32_CACHE_MODE is used to define the KSEG0 cache coherency algorithm bits (K0<2:0>) found in the PIC32MZ’s CP0 CONFIG register. Please refer to the device data sheet for details.

Enable Cache but Disable Ability to Cache Shared Data

The default cache policy is write-back with write allocation. This policy is the easiest for the hardware to implement, and consumes the least system bus resources and power. It is also the least useful for keeping shared (CPU and DMA) data coherent. Combining this cache policy with using uncached memory for shared data is the simplest cache management approach.

Use KSEG1 for shared memory

You can force specific variables or buffers to be uncached by allocating shared data to the uncached memory segment (KSEG1).

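The core uses virtual addresses to access main memory. The KSEG0 and KSEG1 virtual memory segments map to the same physical addresses, so the same memory can be reached through either a cached (KSEG0) or an uncached (KSEG1) address. DMA always uses physical addresses.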

Visit the "Virtual vs Physical Memory" page for more details.
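
The relationship between the two segments and the physical address is simple bit arithmetic under the fixed MIPS32 mapping: KSEG0 starts at 0x80000000, KSEG1 starts at 0xA0000000, and both map to the physical range that starts at 0. The sketch below works on plain 32-bit integers so it can run on a host. The function names are illustrative and are not the XC32 macros. On the device you would normally use the macros the compiler provides, such as KVA0_TO_KVA1.

```c
#include <stdint.h>

#define KSEG0_BASE 0x80000000u  /* cached window into physical memory   */
#define KSEG1_BASE 0xA0000000u  /* uncached window into the same memory */
#define PHYS_MASK  0x1FFFFFFFu  /* low 29 bits carry the physical address */

/* Kernel virtual address (KSEG0 or KSEG1) to physical address. */
uint32_t kva_to_pa(uint32_t kva)
{
    return kva & PHYS_MASK;
}

/* Physical address to its cached (KSEG0) virtual address. */
uint32_t pa_to_kva0(uint32_t pa)
{
    return pa | KSEG0_BASE;
}

/* Physical address to its uncached (KSEG1) virtual address. */
uint32_t pa_to_kva1(uint32_t pa)
{
    return pa | KSEG1_BASE;
}

/* Cached virtual address to the uncached alias of the same location. */
uint32_t kva0_to_kva1(uint32_t kva0)
{
    return kva0 | 0x20000000u;
}
```

For example, the cached address 0x80001000 and the uncached address 0xA0001000 both translate to physical address 0x00001000, which is the address a DMA descriptor would need.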

Static Variables

Static variables can be created using the coherent attribute. This assigns a variable or array to the uncached KSEG1 memory segment.

unsigned int __attribute__((coherent)) buffer[1024];

The coherent variable attribute causes the compiler/linker to place the variable into a unique section that is allocated to the KSEG1 region, rather than the KSEG0 region (which is the default on L1 cached devices). This means that the variable is accessed through the uncached address. 

Automatic Variables

The stack is implemented in cacheable memory (KSEG0) by default. If you don’t want to cache automatic (local) variables, use the __pic32_alloc_coherent() and __pic32_free_coherent() functions as shown in this example.

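#include <xc.h>

void myFunction(void)
{
    /* Allocate 1024 bytes from uncached (KSEG1) memory */
    char *buffer = __pic32_alloc_coherent(1024);

    if (buffer)
    {
        /* do something with the uncached buffer */

        __pic32_free_coherent(buffer);   /* release it when done */
    }
    else
    {
        /* handle allocation failure */
    }
}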

The __pic32_alloc_coherent(size_t) and __pic32_free_coherent(void*) functions are XC32 C compiler utility functions that can be used to allocate and free memory from the uncached kseg1_data_mem region. You can use these functions to allocate an uncached buffer for local variables shared with DMA. These functions call the standard malloc()/free() functions, but the pointers that they use are translated from kseg0 to kseg1. 

Enable Cache and Manage Cache Coherency for Shared Data

Manage Stale Cache

If DMA changes main memory, there is no hardware mechanism to automatically update associated cache. Your software will have to ensure the cache is coherent (is not stale) when the core reads data from it. You can force the cache to reload specific lines (CACHE and PREF instructions), or force the core to read directly from main memory (KSEG1).

The following code example demonstrates how to read a variable defined in cached memory (KSEG0) from uncached memory (KSEG1). This uses the KVA0_TO_KVA1 macro (Kernel Virtual Address Segment 0 to Segment 1 XC32 C compiler macro) to translate a variable's cached address to its uncached address.

// Declare a variable. By default it is defined in KSEG0
int Var1InKseg0 = 5;

// Declare a pointer and use the KVA0_TO_KVA1 macro to assign it the uncached address of Var1InKseg0
int *pVar1InKseg1 = KVA0_TO_KVA1(&Var1InKseg0);

y = *pVar1InKseg1;    // Assign y (an int declared elsewhere) the uncached value of Var1InKseg0
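
To see why a stale cache matters, it can help to step through a toy model on a host. The sketch below is not device code. It models a single memory word and a single cached copy of it, and all of its names (toy_system_t, cpu_read_cached(), dma_write(), and so on) are made up for illustration. The invalidate function stands in for what a CACHE instruction would do on hardware.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint32_t memory;   /* the word in main memory             */
    uint32_t cached;   /* copy of that word held in the cache */
    bool     valid;    /* true once the cache holds a copy    */
} toy_system_t;

/* CPU read through the cached (KSEG0-like) path: fill on a miss. */
uint32_t cpu_read_cached(toy_system_t *s)
{
    if (!s->valid) {
        s->cached = s->memory;   /* read allocation */
        s->valid = true;
    }
    return s->cached;
}

/* CPU read through the uncached (KSEG1-like) path: always main memory. */
uint32_t cpu_read_uncached(const toy_system_t *s)
{
    return s->memory;
}

/* DMA writes main memory directly and never touches the cache. */
void dma_write(toy_system_t *s, uint32_t value)
{
    s->memory = value;
}

/* Software invalidation: the next cached read must refetch. */
void cache_invalidate(toy_system_t *s)
{
    s->valid = false;
}
```

In this model, a cached read that follows a DMA write still returns the old value until the line is invalidated, while an uncached read returns the new value immediately.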

Manage Dirty Cache

Hardware will automatically update main memory if the core writes to cache. You can configure the hardware with one of the following options:

Write-back With Write Allocation (Default Configuration)

If the core writes to cache, the hardware will allow cache to remain dirty until data is evicted from the cache. The hardware keeps track of the dirty cache lines, and will "write-back" these lines to main memory if and when the data is evicted from the cache.

When the core reads from cached memory (KSEG0), it will first search cache, reading main memory only if the desired data does not reside in cache. The hardware automatically fills cache on a cache miss. This is called read allocation.

The core also has the ability to allocate cache lines for writes to cached memory even if the data isn't already in cache. This is called "write allocation".

This policy is the easiest for the hardware to implement, and consumes the least system bus resources and power. It is also the least useful for keeping shared data coherent. Combining this cache policy with using uncached (KSEG1) memory for shared data is the simplest cache management approach, and is recommended for getting your project up and running.

Write-through With Write Allocation

If the core writes to cache, the hardware will also automatically update main memory. This "write-through" configuration ensures data in the cache is never dirty.

When the core reads from cached memory (KSEG0), it will first search cache, reading main memory only if the desired data does not reside in cache. The hardware automatically fills cache on a cache miss.

The core also has the ability to allocate cache lines for writes to cached memory even if the data isn't already in cache. This is called "write allocation".

This policy ensures shared data is never dirty, but you will still need to ensure shared data is not stale.

Write-through With No Write Allocation

If the core writes to cache, the hardware will also automatically update main memory. This "write-through" configuration ensures data in the cache is never dirty.

When the core reads from cached memory (KSEG0), it will first search cache, reading main memory only if the desired data does not reside in cache. The hardware automatically fills cache on a cache miss.

The core does not have the ability to write to cache unless the data to be modified is already in cache. This is called "no write allocation".

  • If the core writes to cacheable memory, and the data is not in the cache, it will write to main memory only.
  • If the core writes to cacheable memory, and the data is in the cache, it will write to cache and also update main memory.

This policy ensures shared data is never dirty, but you will still need to ensure shared data is not stale.
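
The three write policies described above differ only in what a CPU write does to the cache line and to main memory. The following host-side toy model, with hypothetical names throughout, tracks one memory word and one cache line so the differences can be checked directly. It is a sketch of the behavior as described in this article, not a model of the actual cache hardware.

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum {
    WRITE_BACK_WRITE_ALLOC,
    WRITE_THROUGH_WRITE_ALLOC,
    WRITE_THROUGH_NO_WRITE_ALLOC
} toy_policy_t;

typedef struct {
    toy_policy_t policy;
    uint32_t memory;   /* word in main memory                  */
    uint32_t cached;   /* cached copy of that word             */
    bool     valid;    /* the line is present in the cache     */
    bool     dirty;    /* the cache is newer than main memory  */
} toy_line_t;

/* CPU write to cacheable memory under the selected policy. */
void cpu_write(toy_line_t *l, uint32_t value)
{
    bool hit           = l->valid;
    bool write_through = (l->policy != WRITE_BACK_WRITE_ALLOC);
    bool write_alloc   = (l->policy != WRITE_THROUGH_NO_WRITE_ALLOC);

    if (hit || write_alloc) {
        l->cached = value;           /* update or allocate the line */
        l->valid  = true;
        l->dirty  = !write_through;  /* only write-back leaves it dirty */
    }
    if (write_through) {
        l->memory = value;           /* memory is updated on every write */
    }
}

/* Eviction: a dirty line is written back before it is dropped. */
void evict(toy_line_t *l)
{
    if (l->valid && l->dirty) {
        l->memory = l->cached;
    }
    l->valid = false;
    l->dirty = false;
}
```

With write-back, main memory keeps the old value until the line is evicted, which is exactly the window in which a DMA read would see outdated data. With either write-through policy, main memory is current after every write.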

Cache Policy Recommendations

The recommended approach is to start with the default cache policy (write-back with write allocation) and use KSEG1 when accessing any memory used by a DMA peripheral. This is the simplest approach and in most cases it will provide acceptable performance.

Once the project is running and debugged, performance can be improved by changing the access of DMA memory to KSEG0 and employing the CACHE and PREF instructions to manage coherency. In systems employing multiple DMA bus masters, software management of the cache can be used only where necessary and implemented on one DMA peripheral at a time, which simplifies the debug process.
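
Cache maintenance works on whole cache lines, so managing coherency for a DMA buffer usually means visiting every line the buffer touches. The sketch below shows only that address arithmetic, in portable C that runs on a host. The 16-byte line size is an assumption to verify against the device data sheet, and for_each_cache_line() and record_line() are hypothetical names. On hardware, the per-line callback would be replaced by the appropriate CACHE instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 16u  /* assumption: confirm in the device data sheet */

/* Per-line operation; on hardware this would issue a CACHE instruction. */
typedef void (*line_op_t)(uint32_t line_addr, void *ctx);

/* Apply op to every cache line touched by [addr, addr + len).
 * Returns the number of lines visited. */
size_t for_each_cache_line(uint32_t addr, size_t len, line_op_t op, void *ctx)
{
    if (len == 0) {
        return 0;
    }

    uint32_t mask  = ~(CACHE_LINE_SIZE - 1u);
    uint32_t first = addr & mask;                        /* line of first byte */
    uint32_t last  = (addr + (uint32_t)len - 1u) & mask; /* line of last byte  */
    size_t   count = 0;

    for (uint32_t line = first; ; line += CACHE_LINE_SIZE) {
        op(line, ctx);
        count++;
        if (line == last) {
            break;
        }
    }
    return count;
}

/* Stand-in operation for host testing: remember the last line visited. */
void record_line(uint32_t line_addr, void *ctx)
{
    *(uint32_t *)ctx = line_addr;
}
```

A 16-byte buffer that starts 8 bytes into a line spans two lines, and those lines also hold neighboring data. Aligning DMA buffers to the line size and sizing them in whole lines avoids invalidating or writing back unrelated variables.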

The following table summarizes some of the pros and cons of each cache policy. The default cache policy for the PIC32MZ family, as set by the start-up code supplied with the development tools, is write-back with write allocation.

| Policy | Pros | Cons |
|---|---|---|
| Uncached | No cache coherency issues. | Greatly impaired performance since every memory access must account for bus transfer time and memory wait states. |
| Write-back with Write Allocation | Best performance is achieved with this policy. All transactions are done using the cache with memory accesses performed only when needed. | Application must address coherency on both reads and writes to memory. |
| Write-through with no Write Allocation | Cache coherency issues are eliminated for writes as memory is always updated. | Results in the CPU taking a larger percentage of the memory bus bandwidth since every CPU write results in a bus transaction. Even back-to-back writes are written to memory. Cache coherency for CPU reads must still be addressed. |
| Write-through with Write Allocation | Cache coherency issues are eliminated for writes as memory is always updated. Writes to memory also fill cache so the data written is immediately available for a CPU read. | Results in the CPU taking a larger percentage of the memory bus bandwidth since every CPU write results in a bus transaction. Even back-to-back writes are written to memory. Cache coherency for CPU reads must still be addressed. Writes to memory also fill the cache, which can result in needed data being evicted from cache. |

Changing Cache Policy

The MPLAB® XC32 C compiler provides an easy-to-use cache configuration file, pic32_init_cache.S. If you copy this file into your local project directory, it will override the default runtime setup code used to initialize the cache. The default runtime setup code initializes the cache policy to write-back with write allocation.

You can find this file in the following XC32 install directory, …xc32/vx.xx/pic32-libs/libpic32/stubs.

File: pic32_init_cache.S

/* Cache Coherency Attributes */
#define _CACHE_WRITEBACK_WRITEALLOCATE      3
#define _CACHE_WRITETHROUGH_WRITEALLOCATE   1
#define _CACHE_WRITETHROUGH_NOWRITEALLOCATE 0
#define _CACHE_DISABLE                      2

/* Set __PIC32_CACHE_MODE to the desired coherency attribute */
#define __PIC32_CACHE_MODE      _CACHE_WRITEBACK_WRITEALLOCATE    //default runtime setup policy

Note: __PIC32_CACHE_MODE is used to define the KSEG0 cache coherency algorithm bits (K0<2:0>) found in the PIC32MZ’s CP0 CONFIG register. Please refer to the device data sheet for details.

Changing Cache Policy During Code Development and Debug

Microchip recommends that the cache policy be configured by the runtime start-up code and not changed on the fly. For code development and debug, you may want to use the example code shown below to see how a given cache policy affects your code's performance. Place the call to set_cache_policy() at the top of the main() function.

Exercise caution when changing the cache policy of KSEG0 using the CP0 Configuration register. If you disable the cache, the existing entries are effectively invalidated and their contents are lost. If you enable the cache, it could contain stale or erroneous data and require initialization. Cache initialization and KSEG0 configuration are performed in the start-up code.

#include <xc.h>   // provides _mfc0() and _mtc0()

/* Cache Coherency Configuration Options (KSEG0) */
#define UNCACHED 0x02   // uncached
#define WB_WA    0x03   // write-back, write allocate
#define WT_WA    0x01   // write-through, write allocate
#define WT_NWA   0x00   // write-through, no write allocate

void set_cache_policy(unsigned int cc)
{
    unsigned int cp0;

    cp0 = _mfc0(16, 0);   // read the CP0 Config register (register 16, select 0)
    cp0 &= ~0x07u;        // clear all three bits of the K0 field (K0<2:0>)
    cp0 |= (cc & 0x07u);  // update K0 with the new value
    _mtc0(16, 0, cp0);    // write the CP0 Config register
}

int main(void)
{
    set_cache_policy(UNCACHED);   // try WB_WA, WT_WA, or WT_NWA to compare policies

    /* ... application code ... */
}

Note: The K0 field in the code above refers to the cache coherency algorithm bits (K0<2:0>) found in the PIC32MZ’s CP0 CONFIG register. Please refer to the device data sheet for details. 
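
Because K0 is a three-bit field (K0<2:0>), the read-modify-write has to clear and set all three bits. One way to make that step easy to check is to isolate it in a pure function that takes and returns a register value, as in the sketch below. The names config_set_k0() and config_get_k0() are hypothetical. On the device, the value passed in would come from reading the CP0 CONFIG register and the result would be written back to it.

```c
#include <stdint.h>

#define K0_MASK 0x07u   /* K0<2:0>: three least significant bits */

/* Return config with its K0 field replaced by policy; other bits unchanged. */
uint32_t config_set_k0(uint32_t config, uint32_t policy)
{
    return (config & ~K0_MASK) | (policy & K0_MASK);
}

/* Extract the current K0 field from a CONFIG register value. */
uint32_t config_get_k0(uint32_t config)
{
    return config & K0_MASK;
}
```

Keeping the bit manipulation separate from the register access means it can be unit tested on a host, with the register read and write left as thin wrappers on the target.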
