Why do we need to consider reliability in firmware? Firmware is a binary program that runs on a microcontroller, and a microcontroller is a hardware device that is expected to function consistently over time, without errors. In practice, however, hardware is not perfect. Its behavior can vary with factors such as noise and temperature. Firmware usually executes from the microcontroller's Flash memory. If reliability is not considered, the firmware cannot detect or recover from bit-flip errors in Flash memory at run time. If the firmware runs in an aircraft, the worst case could be a crash!
Because hardware is imperfect, firmware developers need to know techniques such as regression testing, code reviews, and error-handling mechanisms to prevent unexpected behavior.
Regression testing verifies that the most recent commit has no unintended negative effects on the existing functionality of the device. A mature development environment has a CI pipeline that runs these tests automatically.
Code review is common practice when you develop firmware with other developers. Essentially, a co-worker reviews your code against the company's coding guidelines to find coding errors, syntax mistakes, logical flaws, and performance issues.
An error-handling mechanism detects, handles, and recovers from errors and exceptions. One common example is the try-catch block found in many programming languages. That example is a general one. The next example is more specific to firmware.
When your firmware deals with serial communication such as UART, several techniques improve reliability. First, enable parity checking in the firmware's UART configuration so that an extra bit is added to each data byte, which lets the receiver detect transmission errors. Parity only catches an odd number of flipped bits per byte, so it is a first line of defense rather than a complete one. Second, developers can define their application-specific commands as multi-byte frames that include a CRC. Good protocol design helps as well: when CRC verification fails on the receiver side, the receiver requests the latest frame again so the sender can retry.
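The framing and retry idea above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a definitive protocol: the frame layout (one length byte, the payload, then a 2-byte CRC-16/CCITT-FALSE in big-endian order), the 32-byte payload limit, and every name here (crc16_ccitt, frame_encode, frame_is_valid, frame_receive_with_retry, and the rx/nak callback types) are hypothetical and do not come from any particular UART driver or vendor library.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FRAME_MAX_PAYLOAD 32u /* assumed limit for this sketch */
#define FRAME_OVERHEAD    3u  /* 1 length byte + 2 CRC bytes */

/* CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection. */
uint16_t crc16_ccitt(const uint8_t *data, size_t len)
{
    uint16_t crc = 0xFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= (uint16_t)((uint16_t)data[i] << 8);
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x8000u) {
                crc = (uint16_t)((crc << 1) ^ 0x1021u);
            } else {
                crc = (uint16_t)(crc << 1);
            }
        }
    }
    return crc;
}

/* Builds [len][payload...][crc_hi][crc_lo]; the CRC covers len + payload.
 * Returns the total frame size, or 0 if the payload or buffer size is invalid. */
size_t frame_encode(const uint8_t *payload, uint8_t len,
                    uint8_t *out, size_t out_size)
{
    assert(out != NULL && (payload != NULL || len == 0));
    if (len > FRAME_MAX_PAYLOAD || out_size < (size_t)len + FRAME_OVERHEAD) {
        return 0;
    }
    out[0] = len;
    for (uint8_t i = 0; i < len; i++) {
        out[1u + i] = payload[i];
    }
    uint16_t crc = crc16_ccitt(out, (size_t)len + 1u);
    out[1u + len] = (uint8_t)(crc >> 8);
    out[2u + len] = (uint8_t)(crc & 0xFFu);
    return (size_t)len + FRAME_OVERHEAD;
}

/* Returns true only if the length field is consistent and the CRC matches. */
bool frame_is_valid(const uint8_t *frame, size_t frame_len)
{
    if (frame == NULL || frame_len < FRAME_OVERHEAD) {
        return false;
    }
    uint8_t len = frame[0];
    if (len > FRAME_MAX_PAYLOAD || frame_len != (size_t)len + FRAME_OVERHEAD) {
        return false;
    }
    uint16_t received = (uint16_t)(((uint16_t)frame[1u + len] << 8) | frame[2u + len]);
    return crc16_ccitt(frame, (size_t)len + 1u) == received;
}

/* Hypothetical transport hooks: rx reads one frame and returns its size,
 * nak asks the sender to transmit the latest frame again. */
typedef size_t (*frame_rx_fn)(uint8_t *buf, size_t buf_size, void *ctx);
typedef void (*frame_nak_fn)(void *ctx);

/* Receives a frame, requesting a retransmission after each failed CRC check
 * until max_attempts is used up. Returns true once a valid frame arrives. */
bool frame_receive_with_retry(frame_rx_fn rx, frame_nak_fn nak, void *ctx,
                              uint8_t *buf, size_t buf_size,
                              size_t *frame_len, unsigned max_attempts)
{
    for (unsigned attempt = 0; attempt < max_attempts; attempt++) {
        size_t n = rx(buf, buf_size, ctx);
        if (frame_is_valid(buf, n)) {
            *frame_len = n;
            return true;
        }
        if (attempt + 1u < max_attempts) {
            nak(ctx); /* no point asking again after the final attempt */
        }
    }
    return false;
}
```

Bounding the number of retries is a deliberate choice in this sketch: on a persistently noisy link the caller gets a clear failure to handle instead of a loop that never returns.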
Another example is filling unused Flash with a known pattern and appending a CRC at the end of the Flash image. After a firmware update completes, the bootloader can verify the received application data before jumping to the application. Because the bootloader's only job is to download the application binary and verify it, it acts as a fail-safe program: when verification fails, the device stays in the bootloader instead of running a corrupted application. If you are developing with IAR Embedded Workbench, this can be done easily through the IAR ELF Tool settings. Here is the link.
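To make the bootloader check concrete, here is a minimal C sketch. It assumes a layout chosen purely for illustration: the application region has a fixed size, unused bytes are filled with 0xFF, and a CRC-32 (the common reflected variant with polynomial 0xEDB88320) is stored little-endian in the last 4 bytes. The names crc32_ieee, app_image_seal, app_image_is_valid, and boot_decide are hypothetical. The checksum algorithm, fill pattern, and placement produced by the IAR tool depend on how it is configured, so the CRC routine must be matched to those settings rather than taken from this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define APP_FILL_BYTE 0xFFu /* assumed fill pattern for unused Flash */
#define APP_CRC_SIZE  4u    /* CRC-32 stored in the last 4 bytes */

/* CRC-32 (reflected, polynomial 0xEDB88320, init and final XOR 0xFFFFFFFF). */
uint32_t crc32_ieee(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 1u) ? ((crc >> 1) ^ 0xEDB88320u) : (crc >> 1);
        }
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Host-side stand-in for the build step: fills the unused part of the
 * region with the known pattern and writes the CRC at the very end. */
void app_image_seal(uint8_t *region, size_t region_size, size_t used)
{
    assert(region != NULL && region_size >= APP_CRC_SIZE &&
           used <= region_size - APP_CRC_SIZE);
    size_t crc_offset = region_size - APP_CRC_SIZE;
    memset(region + used, APP_FILL_BYTE, crc_offset - used);
    uint32_t crc = crc32_ieee(region, crc_offset);
    for (size_t i = 0; i < APP_CRC_SIZE; i++) {
        region[crc_offset + i] = (uint8_t)(crc >> (8u * i)); /* little-endian */
    }
}

/* Bootloader-side check: recompute the CRC over everything except the
 * stored CRC itself and compare the two values. */
bool app_image_is_valid(const uint8_t *region, size_t region_size)
{
    if (region == NULL || region_size <= APP_CRC_SIZE) {
        return false;
    }
    size_t crc_offset = region_size - APP_CRC_SIZE;
    uint32_t stored = 0;
    for (size_t i = 0; i < APP_CRC_SIZE; i++) {
        stored |= (uint32_t)region[crc_offset + i] << (8u * i);
    }
    return crc32_ieee(region, crc_offset) == stored;
}

typedef enum { BOOT_STAY_IN_BOOTLOADER, BOOT_JUMP_TO_APP } boot_action_t;

/* Fail-safe decision: only a verified image is allowed to run. */
boot_action_t boot_decide(const uint8_t *region, size_t region_size)
{
    return app_image_is_valid(region, region_size) ? BOOT_JUMP_TO_APP
                                                   : BOOT_STAY_IN_BOOTLOADER;
}
```

Filling the unused area matters because the CRC then covers the entire region with a known value at every address, so a flipped bit anywhere in the region, including the padding, changes the computed CRC and keeps the device in the bootloader.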