
STM32 UART DMA Receive Example – Normal and Circular Mode Explained

When working with STM32 microcontrollers, UART is one of the most common ways to send and receive data. However, using simple blocking or interrupt methods often puts extra load on the CPU, especially when large or continuous data is involved. This is where UART DMA receive becomes very useful. By combining UART with DMA in STM32, you can receive data efficiently in the background, while the CPU is free to handle other tasks.

In this tutorial, we will learn how to set up STM32 UART DMA receive using STM32CubeMX and HAL. We will explore both Normal mode and Circular mode, understand the difference between them, and see how to manage buffers and callbacks properly. With clear examples and step-by-step explanation, you will understand when to use each mode and how to apply it in real projects.

This is the 4th tutorial in the STM32 UART Peripheral Series. Today we will use the DMA in Normal mode and in Circular mode. I will demonstrate scenarios in which each mode is used to receive a large amount of data via the UART.

Recommended Resources:

You must check out the previous tutorials to understand the data reception via the UART.

STM32 UART DMA Receive Video Tutorial

This guide explains how to configure and use UART DMA receive in STM32, including both Normal and Circular modes. You’ll learn the CubeMX setup, HAL functions, and complete example code to handle incoming data efficiently. To make things even clearer, I’ve also created a step-by-step video walkthrough that shows the configuration, coding, and live testing of UART DMA reception. Follow the written tutorial while watching the video to understand each step better and avoid common mistakes.



What is UART in STM32?

UART (Universal Asynchronous Receiver/Transmitter) is one of the most common communication peripherals in STM32 microcontrollers. It provides a simple, reliable way to send and receive data between the STM32 and other devices, such as sensors, modules, or another microcontroller. Unlike protocols that need a clock signal, UART works asynchronously, which makes it easy to use with many external devices.

In STM32, the UART peripheral is supported by the HAL library and can be configured easily in STM32CubeMX. Developers often combine UART with DMA to improve efficiency, especially when handling continuous or large amounts of data.

Basics of UART communication

UART communication works on the principle of transmitting data in the form of bits, organized into frames. Each frame usually contains a start bit, the actual data bits, an optional parity bit for error checking, and one or more stop bits to mark the end.

UART frame diagram showing start bit, 8 data bits, parity bit, and stop bit in a waveform format for STM32 UART communication.

Since UART is asynchronous, both devices must agree on the same baud rate (data transfer speed). If the baud rate does not match, errors or data corruption can occur. This makes choosing the correct configuration very important.
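To put numbers on the frame and the baud rate, here is a small sketch in plain C. The function names are my own for illustration, and it assumes whole stop bits only (no 1.5) and that one frame carries one byte of payload.

```c
#include <stdint.h>

/* Bits on the wire for one UART frame:
 * 1 start bit + data bits + optional parity bit + stop bits. */
uint32_t uart_frame_bits(uint32_t data_bits, int parity_enabled, uint32_t stop_bits)
{
    return 1u + data_bits + (parity_enabled ? 1u : 0u) + stop_bits;
}

/* Upper bound on frames (bytes) per second at a given baud rate. */
uint32_t uart_bytes_per_second(uint32_t baud, uint32_t frame_bits)
{
    return baud / frame_bits;
}
```

With 8 data bits, no parity and 1 stop bit, a frame is 10 bits long, so 115200 baud carries at most 11520 bytes per second.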


UART vs USART

UART stands for Universal Asynchronous Receiver Transmitter, whereas USART stands for Universal Synchronous Asynchronous Receiver Transmitter. In synchronous mode, the USART sends an additional clock signal to the receiving device, and the data is sampled at a predefined edge (rising or falling) of that clock. Synchronous mode therefore uses 3 pins (clock, Tx and Rx) compared to the 2 pins (Tx and Rx) used by the UART.

The most widely used synchronous communications are I2C, SPI, etc. The USART, in a way, can act as the SPI master to send the data to the slave. The synchronous communication is also used when communicating with a smartcard.

The table below compares UART and USART:

| Feature | UART | USART |
|---|---|---|
| Full Form | Universal Asynchronous Receiver Transmitter | Universal Synchronous/Asynchronous Receiver Transmitter |
| Communication Type | Asynchronous only | Supports both Synchronous and Asynchronous |
| Clock Signal | Not used | Uses an additional clock line in synchronous mode |
| Number of Pins Used | 2 (TX and RX) | 3 in synchronous mode (TX, RX, and CLK) |
| Data Sampling | Based on start and stop bits (internal clock) | Data is sampled on the rising or falling edge of external clock |
| Application Example | Basic serial communication with PCs, Bluetooth modules | Smartcard communication, SPI-like data transfers |
| Acts Like SPI Master? | No | Yes, in synchronous mode |
| Use of Start/Stop Bits | Required for data framing | Optional in synchronous mode |

We will use the UART for the major part of this series as it has wider applications compared to the USART.


Key UART configuration parameters (baud rate, parity, stop bits, word length)

When setting up UART in STM32, several parameters must be configured correctly:

  • Baud rate – Defines how fast data is transmitted. Common values are 9600, 115200, or higher for faster communication. Both devices must use the same baud rate.
  • Parity – Provides simple error detection. Options are None, Even, or Odd parity. Many applications disable parity to simplify communication.
  • Stop bits – Mark the end of a frame. You can choose 1 stop bit, 1.5, or 2 depending on reliability needs.
  • Word length (data bits) – Defines how many data bits are transmitted in each frame. Common choices are 8 bits or 9 bits.

Choosing the right combination of these parameters ensures smooth and error-free data transfer between STM32 and connected devices.
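To make the parity option concrete, the sketch below computes the parity bit for one data byte in plain C. On the STM32 the UART hardware generates and checks this bit by itself once parity is enabled, so this is only an illustration; the function names are hypothetical.

```c
#include <stdint.h>

/* Even parity: the parity bit is 1 when the data byte holds an odd
 * number of 1 bits, so that the total number of 1 bits becomes even. */
uint8_t even_parity_bit(uint8_t data)
{
    uint8_t ones = 0;
    while (data) {
        ones = (uint8_t)(ones + (data & 1u));
        data >>= 1;
    }
    return (uint8_t)(ones & 1u);
}

/* Odd parity is the complement of even parity. */
uint8_t odd_parity_bit(uint8_t data)
{
    return (uint8_t)(even_parity_bit(data) ^ 1u);
}
```

For example, 0x07 has three 1 bits, so its even parity bit is 1, while 0x03 has two and gets a parity bit of 0.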

Why Use DMA with UART in STM32?

When receiving data through UART in STM32, you can use different methods like blocking mode, interrupt mode, or DMA mode. While blocking and interrupts work fine for small or occasional data, they quickly become inefficient when handling continuous streams or large amounts of data. This is where UART with DMA becomes a better choice.

Limitations of blocking and interrupt modes

  • Blocking mode: In this method, the CPU waits until the data transfer is complete. This wastes processing time because the CPU cannot perform other tasks while waiting. For large data or continuous reception, blocking mode severely reduces system performance.
  • Interrupt mode: Interrupts improve efficiency by notifying the CPU only when data is available. However, if the incoming data is very frequent or in large bursts, the CPU must handle too many interrupts. This can cause overhead, reduce responsiveness, and even lead to missed data if the system cannot keep up.

Neither of these methods is ideal when you need high-speed data reception or continuous communication.


Advantages of DMA (reduced CPU load, continuous streaming, faster throughput)

Using DMA (Direct Memory Access) with UART solves these problems:

  • Reduced CPU load: DMA transfers data directly between UART and memory without constant CPU involvement. The CPU is free to handle other tasks.
  • Continuous streaming: With DMA in Circular mode, data keeps arriving in the buffer without the DMA being restarted, and no bytes are missed as long as the application processes each half of the buffer before it is overwritten. This is perfect for applications like sensor logging, GPS data, or wireless communication.
  • Faster throughput: Since DMA works independently of the CPU, the receiver can keep up with the full line rate set by the baud rate, even while the CPU is busy. This ensures reliable communication even in demanding applications.

UART DMA receive in STM32 makes communication more efficient, reduces processing overhead, and enables smooth handling of real-time data streams.


Blocking vs Interrupt vs DMA in STM32 UART

| Feature | Blocking Mode | Interrupt Mode | DMA Mode |
|---|---|---|---|
| CPU Usage | Very high (CPU waits until data transfer finishes) | Medium (CPU wakes up frequently for each interrupt) | Very low (DMA handles transfers independently) |
| Efficiency | Poor for large or continuous data | Better than blocking but still limited at high data rates | Excellent, suitable for continuous or high-speed data |
| Throughput | Limited by CPU speed | Higher than blocking, but may drop with heavy load | Very high, as DMA can transfer data in the background |
| Complexity | Simple to implement | Moderate, requires ISR handling | Slightly more complex setup, but handled easily with HAL functions |
| Best Use Case | Small, occasional data | Medium data rates, when CPU load is light | Continuous streaming, large data transfers, high-speed communication |
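A rough way to see the difference in CPU load is to count how often the CPU gets interrupted. The sketch below is only an estimate under my own assumptions: the UART has no hardware FIFO, so interrupt mode fires once per received byte, and circular DMA raises one callback per completed half of the buffer. The function names are hypothetical.

```c
#include <stdint.h>

/* Interrupt mode: one receive interrupt per byte (no FIFO assumed). */
uint32_t irq_count_interrupt_mode(uint32_t total_bytes)
{
    return total_bytes;
}

/* Circular DMA: one callback each time a half of the buffer fills up. */
uint32_t callback_count_circular_dma(uint32_t total_bytes, uint32_t buffer_len)
{
    uint32_t half = buffer_len / 2u;
    return total_bytes / half;
}
```

For a 4096-byte transfer, that is 4096 interrupts in interrupt mode against 32 callbacks with a 256-byte circular DMA buffer.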

UART DMA in Normal mode

Let’s assume a case where we want to receive a large amount of data, and our MCU has enough RAM to store that data into a buffer. We can use the DMA in NORMAL mode to receive this data over the UART and then store the data into the buffer.

This approach works for a few kilobytes of data, since many STM32 MCUs have only kilobytes of RAM to spare. But if you want to store an audio file or a video file, a single buffer is not enough.

When to Use It?

DMA Normal mode is perfect when:

  • The incoming data has a known size
  • You’re expecting the full buffer to fill once
  • You can afford to pause data reception while processing

It’s simple and fast but not ideal for continuous streaming. I will demonstrate how to use a 4KB buffer with HAL_UART_Receive_DMA() and switch between the size phase and the data phase using HAL_UART_RxCpltCallback().

STM32CubeMX Configuration for UART DMA

The image below shows the CubeMX configuration to enable the UART DMA in Normal mode.

STM32 UART DMA Normal Mode Configuration

The DMA request is set for USART2_RX as we are receiving the data via the DMA. The data width is Byte as the UART transfers the data in bytes. The DMA mode is set as Normal.

The rest of the UART configuration is the same as in the previous tutorials: a baud rate of 115200 with 8 data bits, 1 stop bit and no parity.


STM32 UART DMA Receive Code (with Callbacks and Buffer Management)

We need to know the size of the incoming data, so the sender should first send 4 bytes that hold the size, followed by the data itself. If you expect larger transfers, you can increase the number of size bytes.

In the main function, we will set the DMA to receive 4 data bytes for the size.

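uint8_t RxData[4096];  // 4KB receive buffer

int main(void)
{
  ....
  HAL_UART_Receive_DMA(&huart2, RxData, 4);  // start by receiving the 4 size bytes
  while (1)
  {
    // The CPU is free for other work while the DMA receives in the background
    HAL_GPIO_TogglePin(GPIOA, GPIO_PIN_5);
    HAL_Delay(1000);
  }
}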

The function HAL_UART_Receive_DMA will receive 4 data bytes. Once all 4 bytes have been received, the interrupt will trigger and the UART Receive Complete Callback will be called.

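volatile int isSizeRxed = 0;  // 0: waiting for the size bytes, 1: waiting for the data
volatile uint16_t size = 0;

void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
	if (isSizeRxed == 0)
	{
		// The size arrives as 4 ASCII digits, so subtract '0' (48) from each one
		size = ((RxData[0]-'0')*1000)+((RxData[1]-'0')*100)+((RxData[2]-'0')*10)+((RxData[3]-'0'));
		if ((size == 0) || (size > sizeof(RxData)))
		{
			// Invalid size: wait for a new size field instead of overrunning the buffer
			HAL_UART_Receive_DMA(&huart2, RxData, 4);
			return;
		}
		isSizeRxed = 1;
		HAL_UART_Receive_DMA(&huart2, RxData, size);  // now receive the data itself
	}
	else if (isSizeRxed == 1)
	{
		isSizeRxed = 0;
		HAL_UART_Receive_DMA(&huart2, RxData, 4);  // go back to waiting for the next size
	}
}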

The callback is first called when the 4 size bytes are received. The isSizeRxed variable is 0 at the beginning, so we calculate the size from the first 4 bytes of the RxData buffer.

The size bytes are transferred in ASCII form, so we subtract 48 (the ASCII code of '0') from each byte to get its integer value.
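The ASCII conversion can be tried on a PC with a few lines of plain C. This is a minimal sketch and not part of the project code: the helper name parse_size_field is hypothetical, and the digit check is an extra safety step that I added.

```c
#include <stdint.h>

/* Converts 4 ASCII digits (for example "2176") to an integer.
 * Returns -1 if any of the 4 bytes is not a digit. */
int32_t parse_size_field(const uint8_t field[4])
{
    int32_t value = 0;
    for (int i = 0; i < 4; i++) {
        if (field[i] < '0' || field[i] > '9') {
            return -1;  /* not a digit: reject the whole field */
        }
        value = value * 10 + (field[i] - '0');
    }
    return value;
}
```

Rejecting non-digit bytes keeps a stray character on the line from being turned into a bogus size.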

After calculating the size, we set the variable isSizeRxed to 1 so that we don’t enter this branch again. Then we call the function HAL_UART_Receive_DMA to receive the number of data bytes given by the size variable.

Once all the data bytes have been received, the callback is called again. This time the variable isSizeRxed is 1, so the else branch executes. Here we reset isSizeRxed to 0 and receive the 4 size bytes again, so the cycle repeats for every new transfer.

Now we can receive data of any size up to 4KB, the size of the buffer we defined to store it. The sender must always send the size first, followed by the data itself.
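On the sender side, the size has to be written as exactly 4 ASCII digits, which means zero padding for small values. The sketch below shows one way to build that field in plain C; the function name and the capacity parameter are hypothetical and only illustrate the limits discussed above.

```c
#include <stdint.h>

/* Writes `size` as 4 zero-padded ASCII digits into out[0..3].
 * Returns 0 on success, or -1 if the size is 0, does not fit in
 * 4 digits, or is larger than the receiver's buffer capacity. */
int format_size_field(uint32_t size, uint32_t capacity, uint8_t out[4])
{
    if (size == 0u || size > 9999u || size > capacity) {
        return -1;
    }
    for (int i = 3; i >= 0; i--) {
        out[i] = (uint8_t)('0' + (size % 10u));  /* least significant digit last */
        size /= 10u;
    }
    return 0;
}
```

A 42-byte payload is announced as "0042", and a 5000-byte payload is refused for a 4096-byte receive buffer.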


STM32 UART DMA Result

The images below show the data sent by the serial console and the data stored in the RxData buffer in the CubeIDE debugger.

Data sent via UART DMA is stored in the RxBuffer

The numbers marked on the image are explained below.

  1. I am going to send a file of size 2176 bytes, so I need to first send the size.
  2. Send the size data (“2176”).
  3. Select the file that contains the data.
  4. Send the file.
  5. The MCU has extracted the size data, and it is expecting 2176 bytes to be received.

The RxData buffer has the data. To make sure we received all of it, we will cross-check the start and end of the buffer against the actual data. The images below show the comparison between the actual data and the data stored in the RxData buffer.

UART File that I sent to STM32
UART RX Buffer received the entire data

You can see that the file and the RxData buffer have the same content at the beginning and at the end. This means we have received the entire file.

UART DMA in Circular Mode

Let’s assume another case where we want to receive an audio or a video file over the UART and then store it on an SD card or a flash memory connected to the MCU. These files can be a few megabytes in size, so we can’t store them in a buffer. Instead, we can receive a portion of the file and write it to the SD card, then receive the next portion and write it. This way we can transfer the entire file to the SD card without ever holding all of it in the MCU RAM.

I don’t want to involve the SD card functions in this tutorial, so I will just use a buffer to store the data. If you are using an actual SD card or flash storage, you can use the same code and write the data to the SD card instead of the buffer. The process remains the same, so there aren’t many changes from the writing perspective.

STM32CubeMX Configuration for UART DMA

The image below shows the CubeMX configuration for the UART DMA in Circular mode.

STM32 UART DMA Circular Mode Configuration

The DMA request is set for USART2_RX as we are receiving the data via the DMA. The data width is Byte as the UART transfers the data in bytes. The DMA mode is set as Circular.

The rest of the UART configuration is the same as in the previous tutorials: a baud rate of 115200 with 8 data bits, 1 stop bit and no parity.

In Circular mode, the DMA never stops automatically; it is always receiving. Once the requested number of bytes has been received, it automatically reloads the transfer counter and starts filling the buffer from the beginning again.

We still need to know the size of the incoming data, so the sender should first send 4 bytes that hold the size, followed by the data itself. If you expect larger transfers, you can increase the number of size bytes and adjust the rest of the code accordingly.


STM32 UART DMA Receive Code (with Callbacks and Buffer Management)

In the main function, we will set the DMA to receive 256 bytes of data. This data will contain the size bytes as well as the actual data itself.

uint8_t RxData[256];
uint8_t FinalBuf[4096];
int main()
{
  ....
  HAL_UART_Receive_DMA(&huart2, RxData, 256);
  ....
}

The 256 bytes we requested contain the size bytes as well as the actual data. Once 128 bytes are received, the half-receive complete callback is called. We can handle the received data inside this callback while the DMA continues to receive the second half. Once all 256 bytes are received, the receive complete callback is called. Here we process the data in the second half of the buffer while the DMA wraps around and starts filling the first half again. This process keeps going until the sender stops sending data.
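Before moving to the HAL callbacks, the half/complete sequence can be tried on a PC. The sketch below is a plain C simulation, not HAL code: the rx_sim_t type and both functions are hypothetical names, and feeding bytes one at a time stands in for what the DMA does in hardware.

```c
#include <stdint.h>
#include <string.h>

#define RX_LEN   256u
#define HALF_LEN (RX_LEN / 2u)
#define OUT_LEN  1024u

typedef struct {
    uint8_t  rx[RX_LEN];    /* stands in for the circular DMA buffer */
    uint32_t pos;           /* next write position, wraps at RX_LEN */
    uint8_t  out[OUT_LEN];  /* stands in for the final buffer/file */
    uint32_t out_len;
    uint32_t half_events;   /* how often the "half" handler ran */
    uint32_t full_events;   /* how often the "complete" handler ran */
} rx_sim_t;

/* Copies one finished half into the output, if there is room for it. */
void rx_sim_copy_half(rx_sim_t *s, uint32_t offset)
{
    if (s->out_len + HALF_LEN <= OUT_LEN) {
        memcpy(s->out + s->out_len, s->rx + offset, HALF_LEN);
        s->out_len += HALF_LEN;
    }
}

/* Stores one byte the way the DMA would and runs the matching
 * handler whenever a half of the buffer has just been filled. */
void rx_sim_feed(rx_sim_t *s, uint8_t byte)
{
    s->rx[s->pos++] = byte;
    if (s->pos == HALF_LEN) {          /* first half is full */
        rx_sim_copy_half(s, 0u);
        s->half_events++;
    } else if (s->pos == RX_LEN) {     /* second half is full */
        rx_sim_copy_half(s, HALF_LEN);
        s->full_events++;
        s->pos = 0u;                   /* circular mode wraps around */
    }
}
```

Feeding 512 bytes runs each handler twice and leaves all 512 bytes in the output, in order. The HAL callbacks that follow do the same job on the target.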

#include <string.h>  // memcpy, memset

// Shared between the callbacks (interrupt context) and the main loop
volatile int HTC = 0, FTC = 0;
volatile uint32_t indx = 0;

volatile int isSizeRxed = 0;
volatile uint32_t size = 0;

void HAL_UART_RxHalfCpltCallback(UART_HandleTypeDef *huart)
{
	if (isSizeRxed == 0)
	{
		size = ((RxData[0]-48)*1000)+((RxData[1]-48)*100)+((RxData[2]-48)*10)+((RxData[3]-48));  // extract the size
		indx = 0;
		memcpy(FinalBuf+indx, RxData+4, 124);  // copy the data that follows the 4 size bytes
		memset(RxData, '\0', 128);  // clear the first half of the RxData buffer
		indx += 124;  // update the indx variable
		isSizeRxed = 1;  // set the variable to 1 so that this branch is not entered again
	}
	else
	{
		memcpy(FinalBuf+indx, RxData, 128);  // first half holds data only
		memset(RxData, '\0', 128);
		indx += 128;
	}
	HTC=1;  // half transfer complete callback was called
	FTC=0;
}

The size bytes are sent first, so they arrive in the first half of the buffer. Inside the half-receive complete callback, we extract the size and write the rest of the data to the buffer/file.

Since we use 4 bytes for the size field, the remaining 124 bytes (128 - 4) are copied to the buffer or file. At the same time, we update the indx variable to keep track of how many data bytes have already been written.

This callback is called several times during the transfer, depending on how large the data is. We only need to extract the size in the first call; in the following calls, we simply copy the 128 bytes to the buffer/file.

Similarly, the receive complete callback is called whenever all 256 bytes are received.

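void HAL_UART_RxCpltCallback(UART_HandleTypeDef *huart)
{
	memcpy(FinalBuf+indx, RxData+128, 128);  // copy the second half of the buffer
	memset(RxData+128, '\0', 128);  // clear the second half
	indx+=128;
	HTC=0;
	FTC=1;  // receive complete callback was called
}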

Here we simply copy the 128 bytes from the second half of the buffer into the final buffer/file. Then we clear that half of the RxData buffer and update the indx variable.

Copying data in the half and complete callbacks works as long as the total number of received bytes (size bytes included) is a multiple of 128. If not, we have an issue. For example, if we receive 260 bytes, the half-receive callback triggers first, followed by the receive-complete callback. The remaining 4 bytes are then stored at the beginning of the RxData buffer. Since these 4 bytes do not reach the 128-byte threshold, the half-receive callback does not trigger, and we would lose those 4 bytes.
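The 260-byte example is simple arithmetic, and the sketch below works it out in plain C. The type and function names are hypothetical; total_bytes counts everything that arrives on the line, including the 4 size bytes.

```c
#include <stdint.h>

#define HALF_LEN 128u

typedef struct {
    uint32_t callbacks;  /* half + complete callbacks that fire */
    uint32_t leftover;   /* trailing bytes that trigger no callback */
    uint32_t offset;     /* where the leftover starts in RxData */
} rx_tail_t;

rx_tail_t rx_tail_for(uint32_t total_bytes)
{
    rx_tail_t t;
    t.callbacks = total_bytes / HALF_LEN;
    t.leftover  = total_bytes % HALF_LEN;
    /* An odd number of callbacks ends on a half-complete callback,
     * so the tail lands in the second half; otherwise in the first. */
    t.offset = (t.callbacks % 2u) ? HALF_LEN : 0u;
    return t;
}
```

For 260 bytes this gives 2 callbacks and 4 leftover bytes at offset 0. For 2204 bytes (a 2200-byte file plus the size field) it gives 17 callbacks and 28 leftover bytes at offset 128.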

To avoid this, we manually compare the number of bytes received so far with the size announced by the sender. If they are not equal, we find where the remaining data is stored and copy it to our buffer/file.

The code below does this inside the while loop.

while (1)
{
  // A tail shorter than 128 bytes never triggers a callback, so handle it here.
  // This assumes the sender has finished, so the whole tail is already in RxData.
  if (((size-indx)>0) && ((size-indx)<128))
  {
    if (HTC==1)  // last callback was half-complete: the tail is in the second half
    {
      memcpy(FinalBuf+indx, RxData+128, (size-indx));
      indx = size;
      isSizeRxed = 0;
      HTC = 0;
      HAL_UART_DMAStop(&huart2);  // restart so the next transfer begins at RxData[0]
      HAL_UART_Receive_DMA(&huart2, RxData, 256);
    }
    else if (FTC==1)  // last callback was complete: the tail is in the first half
    {
      memcpy(FinalBuf+indx, RxData, (size-indx));
      indx = size;
      isSizeRxed = 0;
      FTC = 0;
      HAL_UART_DMAStop(&huart2);
      HAL_UART_Receive_DMA(&huart2, RxData, 256);
    }
  }

We check whether the difference between the size and indx variables is more than 0 and less than 128. The size variable is calculated at the beginning of the transfer, so this difference starts out large and shrinks as the indx variable grows with every callback. We choose the value 128 because if 128 or more bytes remain, either the half-complete or the complete callback will eventually trigger.

So if we enter this condition, neither of the callbacks will be called for the remaining bytes. Once the sender has stopped sending, these extra bytes sit in either the first half or the second half of the RxData buffer.

We will verify which half contains the data by checking the HTC and FTC variables. If the HTC variable is set, it indicates that the half-receive callback has been triggered, which means the data is stored in the second half of the RxData buffer. In the same way, if the FTC variable is set, it indicates that the receive-complete callback has been triggered, and the data is stored in the first half of the RxData buffer.

We simply copy the remaining data (size-indx) from the RxData buffer into the final buffer/file. Then we update the indx variable and reset the HTC/FTC variable. We also reset the isSizeRxed variable so the system can correctly process the size of the next incoming transfer.

Now we need to start storing the received data from the beginning of the RxData buffer. But the DMA in circular mode will just store the data at the very next position. So we need to manually stop the DMA and call the function again to receive 256 bytes of data.
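One detail worth checking before copying the tail is whether all of its bytes have actually arrived, since the main loop can reach the check while the last bytes are still on the line. A possible approach, sketched below in plain C, is to derive the DMA write position from its remaining-transfer counter. On the target that counter would come from the HAL macro __HAL_DMA_GET_COUNTER(huart2.hdmarx); treat that as an assumption to verify for your device family. The two helper names are hypothetical.

```c
#include <stdint.h>

/* Bytes written since the start of the buffer, derived from the
 * "remaining transfers" counter that counts down from buf_len. */
uint32_t dma_write_pos(uint32_t buf_len, uint32_t remaining_counter)
{
    return (buf_len - remaining_counter) % buf_len;
}

/* Non-zero once `needed` bytes have been written starting at `offset`. */
int tail_has_arrived(uint32_t write_pos, uint32_t offset, uint32_t needed)
{
    return write_pos >= offset + needed;
}
```

With a 256-byte buffer and a counter value of 100, the write position is 156, so a 28-byte tail that starts at offset 128 is complete. With a counter value of 110 it is not, and the copy should wait.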

We discussed the case where extra bytes arrived and got stored in either the first half or the second half of the RxData buffer. But the total can also be an exact multiple of 128 bytes, in which case there are no extra bytes at all.

We also need to handle this scenario.

  else if ((indx == size) && ((HTC==1)||(FTC==1)))  // everything was copied by the callbacks
  {
    isSizeRxed = 0;
    HTC = 0;
    FTC = 0;
    HAL_UART_DMAStop(&huart2);  // restart so the next transfer begins at RxData[0]
    HAL_UART_Receive_DMA(&huart2, RxData, 256);
  }
}

Here we check whether the size variable is equal to the indx variable. This can also be true at the very beginning, when both values are 0, or right after the previous if block runs. To rule those cases out, we add one more check and verify that either the HTC or the FTC variable is set. This confirms that indx and size are equal because all the data has been received.

Inside this condition, we don’t need to copy any data since all the data has already been handled. We simply reset the variables and start the DMA again.

Now we can receive data that is larger than the 256-byte DMA buffer and store it in the buffer/file. The sender must always send the size first, followed by the data itself.


STM32 UART DMA Circular Mode Result

The images below show the data sent by the serial console and the data stored in the FinalBuf buffer.

STM32 UART DMA Receives data in circular Mode

The numbers marked on the image are explained below.

  1. I am going to send a file of size 2200 bytes, therefore, I need to send the size first.
  2. Send the size data (“2200”).
  3. Select the file that contains the data.
  4. Send the file.
  5. The MCU has extracted the size data, and it is expecting 2200 bytes to be received.
  6. The indx variable is 2200, which means that the MCU has received 2200 bytes.

Comparisons & Best Practices

When working with UART DMA in STM32, it’s important to know when to use Normal mode and when to use Circular mode. Choosing the right buffer size and handling data framing correctly ensures reliable communication.

Normal vs Circular: pros and cons

Normal Mode

  • ✔️ Easy to implement
  • ✔️ Good for fixed-length data packets
  • 🅧 Callback only after the entire buffer is filled
  • 🅧 Not suitable for continuous or unknown-length data

Circular Mode

  • ✔️ Perfect for continuous data reception (GPS, sensors, streaming)
  • ✔️ Buffer is reused automatically, no need to restart DMA
  • 🅧 More complex to handle (need half-complete and complete callbacks)
  • 🅧 Risk of overwriting data if buffer not processed in time

How to choose buffer sizes

  • Match the buffer size to the maximum expected data length.
  • For Normal mode, use a fixed buffer equal to the packet size.
  • For Circular mode, choose a larger buffer to store continuous streams without overflow.
  • Consider RAM limits—using very large buffers may waste memory on smaller STM32 devices.
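One way to reason about the buffer size in Circular mode is to work out how long the sender needs to fill one half of the buffer, because that is the time the CPU has to process the other half before it is overwritten. The sketch below does this calculation in plain C; the function name is hypothetical, and frame_bits is the number of bits per frame on the wire (10 for 8 data bits, no parity, 1 stop bit).

```c
#include <stdint.h>

/* Time in microseconds needed to receive `bytes` bytes, which is how
 * long the CPU has to process one half before the DMA reuses it. */
uint32_t fill_time_us(uint32_t bytes, uint32_t baud, uint32_t frame_bits)
{
    uint64_t bits = (uint64_t)bytes * frame_bits;
    return (uint32_t)((bits * 1000000u) / baud);
}
```

At 115200 baud, a 128-byte half fills in about 11 ms (11111 µs). Doubling the buffer doubles that deadline.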

Handling IDs, size fields, and framing

In many communication protocols, each packet has a structure:

  • ID field – identifies the type of message.
  • Size field – tells how many bytes follow.
  • Payload data – the actual information.
  • Checksum / CRC – optional error detection.

When using DMA receive, you must parse these fields correctly in your application code. For example, once the full packet is received in Normal mode, check the size field before processing. In Circular mode, use framing (start/stop markers) to separate one message from another.
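As an illustration of parsing such a packet after the DMA has placed it in memory, here is a sketch in plain C. The layout is purely hypothetical and chosen for the example: 1 ID byte, 1 size byte, the payload, then 1 checksum byte that is the XOR of all preceding bytes. Real protocols define their own layout and checksum.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t        id;
    uint8_t        size;
    const uint8_t *payload;  /* points into the receive buffer */
} packet_t;

/* Returns 0 and fills `out` if buf holds one complete, valid packet.
 * Returns -1 if the packet is truncated or the checksum is wrong. */
int packet_parse(const uint8_t *buf, size_t len, packet_t *out)
{
    if (len < 3u) {
        return -1;                      /* ID + size + checksum at least */
    }
    uint8_t size = buf[1];
    if (len < (size_t)size + 3u) {
        return -1;                      /* payload not complete yet */
    }
    uint8_t sum = 0;
    for (size_t i = 0; i < (size_t)size + 2u; i++) {
        sum ^= buf[i];                  /* XOR over ID, size and payload */
    }
    if (sum != buf[(size_t)size + 2u]) {
        return -1;                      /* checksum mismatch */
    }
    out->id = buf[0];
    out->size = size;
    out->payload = &buf[2];
    return 0;
}
```

Checking the size field against the number of bytes actually received, before touching the payload, is what keeps a corrupted size from sending the parser past the end of the buffer.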


Tips to avoid data loss / overflow

  • Always process the buffer quickly in callbacks.
  • Use half-transfer callbacks in Circular mode for real-time processing.
  • Ensure the sending device respects delays between packets.
  • If high data rates are expected, increase the buffer size so the CPU has more time to process each half.
  • Debug with a logic analyzer or serial terminal to confirm data integrity.

Common Errors & Troubleshooting

Even with DMA, errors can occur. Here are some frequent problems and fixes:

Missing interrupt / callback issues

  • Make sure DMA interrupts are enabled in STM32CubeMX.
  • Check that HAL_UART_RxCpltCallback and HAL_UART_RxHalfCpltCallback are properly implemented.
  • Verify that the huart->Instance matches the correct USART peripheral.

Incorrect DMA configuration

  • Ensure the DMA channel/stream matches the correct USART RX request.
  • Use the correct data width (usually byte/8-bit).
  • Confirm the DMA direction is Peripheral to Memory for UART RX.

Buffer overrun or incomplete transfers

  • Overrun happens when the buffer is too small or processed too slowly.
  • In Normal mode, if fewer bytes arrive than expected, the callback won’t trigger.
  • In Circular mode, if the CPU doesn’t read data before DMA overwrites it, bytes will be lost.

Serial sender-side mismatches

  • Mismatched baud rate, parity, or stop bits cause framing errors.
  • Ensure both sender and STM32 UART settings match.
  • If using modules (e.g., ESP8266, GPS), confirm their serial configuration first.

Real-World Use Cases

Using UART DMA receive in STM32 makes sense in many real applications:

Receiving files or logging data

UART DMA can receive a large chunk of data (e.g., configuration files or logs) without blocking the CPU, making it ideal for file transfer applications.


Sensor or telemetry streams

For continuous sensor data (GPS coordinates, accelerometers, telemetry packets), Circular DMA ensures no data is missed while freeing the CPU for processing.


Audio / video or continuous streaming applications

High-speed communication, like audio sampling or video data transfer, benefits most from DMA because interrupt or blocking modes may not keep up with the data rate.

Conclusion

Using UART with DMA in STM32 is one of the most efficient ways to handle data reception. Unlike blocking or interrupt methods, DMA offloads the transfer process from the CPU, allowing smooth and continuous communication even at high speeds. In this guide, we explored both Normal mode and Circular mode, understood their pros and cons, and learned how to manage buffers and callbacks properly.

By applying these techniques, you can make your STM32 projects more reliable and scalable—whether you’re working with fixed-size packets, continuous sensor streams, or high-speed data logging. With careful buffer planning and correct DMA configuration, you’ll avoid common pitfalls like overflows and missed data.

If you’re ready to go further, try combining UART DMA with IDLE line detection or explore UART transmit with Interrupt for complete, non-blocking communication in your STM32 applications.
