The dsPIC® Central Processing Unit, or CPU, seamlessly integrates the best features of a 16-bit microcontroller (MCU) and digital signal processor (DSP). Single instruction thread execution simplifies application debug and ensures deterministic operation. The dsPIC architecture is a modified Harvard Bus Architecture. This means that the program and data memories are accessed by separate buses. However, there are mechanisms to store and access constant data from the program memory space. This enables more efficient use of the available on-chip memory for some applications. Some instructions, specifically the dual-operand DSP instructions, allow dual accesses from the data random access memory (RAM) during the same instruction cycle. This is of tremendous benefit for DSP applications such as signal filtering.
The dsPIC® provides a large number of addressing modes to ease code development and enhance C compiler efficiency. Most addressing modes operate orthogonally on a set of sixteen 16-bit general purpose registers, which means that most instructions can use most addressing modes. Individually vectored interrupt sources may be programmed to one of seven priority levels. A fixed five-cycle latency, from Interrupt Request to Interrupt Service Routine entry, provides fast, deterministic application operation. The interrupt stack is part of the on-chip RAM, and provides automatic bound checking to prevent underflow or overflow.
- Single CPU integrating MCU and DSP functions
- Modified Harvard architecture
- Supports single cycle, three operand instructions (for example, A = B + C)
- Sixteen 16-bit general purpose registers (WREG0:WREG15)
- Fast, deterministic interrupt response with multiple priorities and vectors
- Flexible software stack with overflow detection
- Dual address generation units (AGUs) allow simultaneous access to two data memory locations in a single cycle
dsPIC® DSC Overview
The dsPIC®'s CPU has extensive mathematical processing capability provided by the DSP engine, dual 40-bit accumulators, hardware support for division operations, a barrel shifter, a 17 x 17 multiplier, a large array of 16-bit working registers and a wide variety of data addressing modes. Direct Memory Access (DMA) enables overhead-free transfer of data between several peripherals and a dedicated DMA RAM. The following block diagram is divided into three parts: program memory, data memory, and core, each of which will be explained in further detail throughout the class.
Simplified Block Diagram
Just like our 8-bit family, the 16-bit family is based on the Harvard Architecture. Ordinarily, the Harvard Architecture doesn't provide any mechanism for transferring data between the program and data spaces, but the PIC24 and dsPIC® devices have a feature known as Program Space Visibility (or PSV for short) that allows us to view a 32 Kbyte segment of flash program memory through RAM addresses. By default, the compiler makes extensive use of this feature for reading constants stored in flash.
As we can see from the figure below, we have two data buses, X and Y. To access two locations at the same time, one operand must come from data RAM (X memory) and the other from the subset of data RAM set aside as Y memory. This type of data RAM is called dual port. Only one set of instructions uses both the X and Y data buses: the MAC class instructions.
Instruction Set Overview
The dsPIC® instruction set consists of 84 different instructions. If we consider the various addressing modes possible, there are nearly 250 possible opcodes. Most of the instructions are encoded in one instruction word. However, four instructions require an additional instruction word; these are instructions, such as GOTO, that involve specifying a 24-bit program memory address.
Most instructions execute in one instruction cycle. The exceptions are program flow changes, table instructions, double-word data moves, and DO instructions, which execute in two cycles due to the pipeline. The divide instruction is an interruptible, single-cycle iterative instruction that needs to be repeated 18 times.
The 24-bit instruction word enables three operand instructions. This allows two source operands to be operated on, with the result stored in a third location in a single cycle. This maximizes code efficiency in both assembly and C language. The instructions this class focuses on are the DSP instructions (highlighted in red).
|Ten Instruction Categories|
|5) Bit Manipulation|
Program Memory Map
The Program Memory map of the dsPIC® includes the reset vector, interrupt vector table, an alternate vector table, user Program Flash memory, Data EEPROM and configuration memory space. The Reset Vector is a two-word GOTO instruction, and is therefore the only vector that occupies two words. All interrupt vector locations are filled with their respective interrupt service routine (ISR) addresses, if an ISR is defined. There is an alternate vector table that can optionally be enabled by the user. It provides a complete set of all interrupt vectors, and is a useful aid during system debug.
The executable 24-bit wide Program Flash memory starts at 0x200 in all dsPIC® devices and progresses linearly through the program address space. The 16-bit wide Data EEPROM block resides in the upper end of the user program memory space, and is accessible using table instructions or Program Space Visibility. The Data EEPROM can be used for storing application constants and other parametric data such as look-up tables and sensor calibration constants.
Finally, the Program Memory map includes configuration memory space, which can only be accessed using the table instructions. The Device Configuration Registers, which are used to configure basic parameters of device operation such as system clock source, are located in this address space.
Data Memory Map
Data memory is composed of two main regions: Near and Far. Data stored in near space may be accessed using direct addressing modes, while data in far space may only be accessed indirectly through pointers. Understanding this will allow us to arrange our variables in the most efficient way for our applications. While you won't frequently have to refer to a memory location's specific address, it may also be helpful to know that RAM is byte addressable. The low and high bytes of each 16-bit location have their own address. The word address is always associated with the low byte (even numbered) address.
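To make the byte and word addressing concrete, here is a minimal host-side sketch in C. It models a small slice of data RAM as an array of bytes; the names `RAM_BYTES`, `read_word` and `write_word` are hypothetical helpers invented for illustration, and the alignment check is an assumption about how word accesses are expected to behave rather than a model of any specific device.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a small slice of data RAM seen as raw bytes. */
#define RAM_BYTES 16u

/* Store a 16-bit word at an even byte address.
   The low byte goes to the even (word) address, the high byte to the
   following odd address. */
static void write_word(uint8_t *ram, uint16_t addr, uint16_t value)
{
    assert((addr & 1u) == 0u);          /* assumed: word accesses are aligned */
    assert(addr + 1u < RAM_BYTES);
    ram[addr]      = (uint8_t)(value & 0xFFu);  /* low byte, even address */
    ram[addr + 1u] = (uint8_t)(value >> 8);     /* high byte, odd address */
}

/* Read back the 16-bit word that starts at an even byte address. */
static uint16_t read_word(const uint8_t *ram, uint16_t addr)
{
    assert((addr & 1u) == 0u);
    assert(addr + 1u < RAM_BYTES);
    return (uint16_t)(ram[addr] | ((uint16_t)ram[addr + 1u] << 8));
}
```

Writing 0x1234 to address 4 with this model leaves 0x34 at byte address 4 and 0x12 at byte address 5, which is the layout the paragraph above describes.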
The core has two data spaces, X and Y. These data spaces can be considered either separate (for some DSP instructions), or as one unified linear address range (for MCU instructions). The data spaces are accessed using two Address Generation Units (AGUs) and separate data paths. This feature allows certain instructions to concurrently fetch two words from RAM, thereby enabling efficient execution of DSP algorithms. Remember that if you are working with one of the dsPIC® devices, you will have to consider the separate X and Y memory regions when using any of the DSP class instructions. For example, if you are computing the output of a digital filter, your input data could be in Y data memory and your coefficients in X data memory so that they can both be fetched simultaneously.
The data space includes 2 Kbytes of DMA RAM, which is primarily used for DMA data transfers, but may be used as general purpose RAM. Lastly, the upper 32 Kbytes of the data space memory map can optionally be mapped into program space at any 16K program word boundary defined by the 8-bit Program Space Visibility Page (PSVPAG) register. The program-to-data space mapping feature lets any instruction access program space as if it were data space.
The programmer’s model consists of 16 x 16-bit working registers (W0 through W15), Status Register (SR), 2 x 40-bit accumulators (ACCA and ACCB), Program Counter (PC), Program Space Visibility Page register (PSVPAG), Data Table Page register (TBLPAG), REPEAT and DO registers (DOSTART, DOEND, DCOUNT and RCOUNT).
The working registers can act as data, address or offset registers. All registers are memory mapped. W0 is the W register for all instructions that perform file register addressing. Some of these registers have a shadow register associated with them. The shadow register is used as a temporary holding register and can transfer its contents to or from its host register upon some event occurring in a single cycle. None of the shadow registers are accessible directly. When a byte operation is performed on a working register, only the Least Significant Byte of the target register is affected. However, a benefit of memory mapped working registers is that both the Least and Most Significant Bytes can be manipulated through byte-wide data memory space accesses.
W15 is the dedicated software Stack Pointer (SP). It is automatically modified by exception processing and subroutine calls and returns. However, W15 can be referenced by any instruction in the same manner as all other W registers. This simplifies the reading, writing and manipulation of the Stack Pointer (e.g., creating stack frames). W14 has been dedicated as a Stack Frame Pointer, as defined by the LNK and ULNK instructions. However, W14 can be referenced by any instruction in the same manner as all other W registers. The Stack Pointer always points to the first available free word and grows from lower addresses towards higher addresses. It pre-decrements for stack pops (reads) and post-increments for stack pushes (writes).
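The push and pop behavior described above can be sketched on a host machine in C. This is only an illustrative model: `soft_stack`, `STACK_BASE` and `STACK_WORDS` are hypothetical names and values, and the assertions stand in for whatever bound checking the hardware performs.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE  0x0800u   /* hypothetical start address of the stack   */
#define STACK_WORDS 8u        /* hypothetical stack size, in 16-bit words  */

typedef struct {
    uint16_t mem[STACK_WORDS];
    uint16_t sp;              /* models W15: byte address of the first free word */
} soft_stack;

static void stack_init(soft_stack *s)
{
    s->sp = STACK_BASE;
}

/* Push: write at the current SP, then post-increment by one word. */
static void stack_push(soft_stack *s, uint16_t value)
{
    assert(s->sp < STACK_BASE + 2u * STACK_WORDS);  /* overflow check  */
    s->mem[(s->sp - STACK_BASE) / 2u] = value;
    s->sp += 2u;
}

/* Pop: pre-decrement SP by one word, then read. */
static uint16_t stack_pop(soft_stack *s)
{
    assert(s->sp > STACK_BASE);                     /* underflow check */
    s->sp -= 2u;
    return s->mem[(s->sp - STACK_BASE) / 2u];
}
```

After two pushes the modeled SP sits four bytes above `STACK_BASE`, showing the stack growing toward higher addresses, and the values come back in last-in, first-out order.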
In the first figure, W14 and W15 are highlighted in red because they have dedicated roles as the Stack Frame Pointer and the Stack Pointer, so they should not be treated as ordinary general purpose registers. The remaining registers, shown in black, are only tied up while an instruction that specifically requires them is executing; the rest of the time they are free.
Our status register is divided into two parts: MCU status and DSP status. The MCU status bits are the ones we are accustomed to seeing: the MCU ALU Carry/Borrow bit (C), MCU ALU Zero bit (Z), MCU ALU Overflow bit (OV), MCU ALU Negative bit (N), REPEAT Loop Active bit (RA), and the CPU Interrupt Priority Level Status bits (IPL<2:0>).
The accumulators, as we can see in the figure below, are 40-bit registers that take the 32-bit result of multiplying two 16-bit numbers and provide us with an extra 8 'guard bits' for overflow. The second half of our status register is where we can find all the bits related to overflow and saturation. DC is the MCU ALU Half Carry/Borrow bit, DA is the DO Loop Active bit, SAB is the SA or SB Combined Accumulator Saturation Status bit, OAB is the OA or OB Combined Accumulator Overflow Status bit, SB is the Accumulator B Saturation Status bit, SA is the Accumulator A Saturation Status bit, OB is the Accumulator B Overflow Status bit, and lastly, OA is the Accumulator A Overflow Status bit.
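As a rough illustration of how the MCU status bits might be picked apart in C, the sketch below defines masks for the low byte of SR. The bit positions are an assumption based on the common dsPIC30F/33F layout (C in bit 0 up to IPL<2:0> in bits 7:5) and should be checked against your device data sheet; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions for the low byte of SR (verify for your device). */
#define SR_C         0x0001u   /* Carry/Borrow        */
#define SR_Z         0x0002u   /* Zero                */
#define SR_OV        0x0004u   /* Overflow            */
#define SR_N         0x0008u   /* Negative            */
#define SR_RA        0x0010u   /* REPEAT Loop Active  */
#define SR_IPL_MASK  0x00E0u   /* IPL<2:0>            */
#define SR_IPL_SHIFT 5u

/* Return 1 if the flag selected by mask is set in the SR image. */
static int sr_flag(uint16_t sr, uint16_t mask)
{
    return (sr & mask) != 0u;
}

/* Extract the 3-bit CPU interrupt priority level. */
static unsigned sr_get_ipl(uint16_t sr)
{
    return (sr & SR_IPL_MASK) >> SR_IPL_SHIFT;
}

/* Return a copy of the SR image with the priority level replaced. */
static uint16_t sr_set_ipl(uint16_t sr, unsigned level)
{
    assert(level <= 7u);                /* IPL is a 3-bit field */
    return (uint16_t)((sr & ~SR_IPL_MASK) | (level << SR_IPL_SHIFT));
}
```

These helpers operate on a plain 16-bit value, so they can be tried on a host; on a real device the SR image would come from the memory mapped register rather than a variable.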
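The role of the guard bits can be illustrated with a small host-side model in C. This is a sketch under simplifying assumptions, not a description of the hardware: it uses integer (not fractional) multiplication, keeps the 40-bit value in an `int64_t`, and uses two sticky flags that loosely mirror the overflow (OA/OB) and saturation (SA/SB) status bits. The type `acc40` and function `acc40_mac` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define ACC_MAX ((INT64_C(1) << 39) - 1)   /* largest 40-bit signed value  */
#define ACC_MIN (-(INT64_C(1) << 39))      /* smallest 40-bit signed value */

typedef struct {
    int64_t value;      /* holds a 40-bit signed quantity                    */
    int     overflow;   /* sticky: result no longer fits in 32 bits          */
    int     saturated;  /* sticky: result was clamped to the 40-bit range    */
} acc40;

/* Multiply two 16-bit values and add the 32-bit product to the accumulator. */
static void acc40_mac(acc40 *acc, int16_t a, int16_t b)
{
    assert(acc);
    int64_t sum = acc->value + (int32_t)a * (int32_t)b;

    if (sum > ACC_MAX) { sum = ACC_MAX; acc->saturated = 1; }
    if (sum < ACC_MIN) { sum = ACC_MIN; acc->saturated = 1; }

    /* Anything outside the 32-bit range is being held by the guard bits. */
    if (sum > INT32_MAX || sum < INT32_MIN) {
        acc->overflow = 1;
    }
    acc->value = sum;
}
```

In this model, three accumulations of 32767 x 32767 already exceed the 32-bit range, yet the running sum is still exact because the extra 8 bits absorb the growth.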
PC, PSV, DO and REPEAT
The program counter is 23 bits; note that the least significant bit is zero because the program counter always increases by two. Next, we have the Data Table Page Address and the Program Space Visibility Page Address, 8 bits each. Since the address ranges for the data and program spaces are 16 and 24 bits, respectively, a method is needed to create a 23-bit or 24-bit program address from 16-bit data registers. The solution depends on the interface method to be used. For table operations, the Table Page register (TBLPAG) is used to define a 32K word region within the program space. For remapping operations, the Program Space Visibility register (PSVPAG) is used to define a 16K word page in the program space.
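The two address-formation schemes can be sketched as simple bit operations in C. The sketch assumes the usual concatenation (PSVPAG followed by the low 15 bits of the data address for remapping, TBLPAG followed by a 16-bit address for table operations); the function names are hypothetical, and the exact bit layout should be confirmed in the family reference manual for your device.

```c
#include <assert.h>
#include <stdint.h>

/* PSV remapping: an 8-bit page plus the low 15 bits of the data address
   gives a 23-bit program address, so each page spans 32 Kbytes of
   addresses (16K program words). */
static uint32_t psv_program_address(uint8_t psvpag, uint16_t data_ea)
{
    assert(data_ea >= 0x8000u);   /* the PSV window is the upper 32 Kbytes */
    return ((uint32_t)psvpag << 15) | (data_ea & 0x7FFFu);
}

/* Table operations: an 8-bit page plus a 16-bit address gives a 24-bit
   program address, so each page spans 32K program words. */
static uint32_t table_program_address(uint8_t tblpag, uint16_t ea)
{
    return ((uint32_t)tblpag << 16) | ea;
}
```

For example, with PSVPAG set to 1, a read from data address 0x8000 corresponds to program address 0x008000 in this model.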
RCOUNT is the REPEAT Loop Counter; this is where you specify how many times you want the REPEAT loop to be performed. DCOUNT is the DO Loop Counter, DOSTART is the DO Loop Start Address and DOEND is the DO Loop End Address. Lastly, we have CORCON, the Core Control Register, where we configure the different options for the DSP engine.
Program Space Visibility Window
Any 32 Kbyte segment of (Flash) Program Memory may be mapped into Data Memory (RAM). Once mapped, it is read as if it were truly there. This mode of operation is called Program Space Visibility (PSV) and provides transparent access to constant data stored in Program Memory space without the need to use special instructions (i.e., the TBLRD and TBLWT instructions).
Why multiple address buses?
Equation 1 represents a typical DSP operation, where several operations need to be performed in a single cycle on each pass through a loop.

$$y(n) = \sum_{k} h[k]\, x[n-k] \tag{1}$$
- In a single cycle, a typical DSP instruction requires:
  - One program memory fetch
  - Two data memory reads: h[k] and x[n-k]
  - One data memory write: y(n)
- X and Y address generation units (AGUs) allow
  - Two simultaneous reads from data memory
- PSV allows
  - 32 KB of program memory to be mapped into X data memory so reads can be from plentiful program memory but treated like scarce data memory
  - Ideal for look-up tables such as digital filter coefficients h[k]
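The loop behind these requirements can be sketched in portable C. This is a host-side illustration only, assuming integer arithmetic and a hypothetical function `fir_sample`; it computes one output y(n) as the sum of h[k] multiplied by x[n-k]. On the dsPIC®, each pass through the loop would correspond to a MAC class instruction that fetches both operands in the same cycle, whereas here they are two ordinary loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compute one output sample y(n) = sum over k of h[k] * x[n-k].
   x_hist[0] holds the newest sample x[n], and x_hist[k] holds x[n-k]. */
static int64_t fir_sample(const int16_t *h, const int16_t *x_hist, size_t taps)
{
    assert(taps > 0);

    int64_t acc = 0;   /* stands in for one of the 40-bit accumulators */

    for (size_t k = 0; k < taps; ++k) {
        /* Two data reads (h[k] and x[n-k]) and one multiply-accumulate. */
        acc += (int32_t)h[k] * (int32_t)x_hist[k];
    }
    return acc;        /* the caller stores this as y(n) */
}
```

Declaring the coefficient array `const` is a natural fit here, since constant tables like h[k] are the kind of data that can live in program memory and be read through the PSV window.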