Tuesday, October 16, 2012

Critical Bugs and Quality Assurance

Sebastian Huber recently posted a nasty RTEMS bug and fix. While simple, the bug manifested in their application as an increase in one task's latency from 20us to 170us! The cause of the problem was that RTEMS has two kinds of critical sections—dispatch and interrupt—and new SMP-aware code added a dispatch critical section to the thread dispatch code. The problem happens with overlapping a traditional interrupt disable critical section. The SMP dispatch code looks like:
void _Thread_Dispatch(void) {
  Thread_Disable_dispatch();
  SMP_Dispatch_other cores();
  Interrupt_Disable();
  ...
  ... do things, including context switch if needed
  ...
  Interrupt_Enable();
  // problem here!
  Thread_Unnest_dispatch(); // enable
  ...
}
The problem is that an interrupt can occur between Interrupt_Enable and Thread_Unnest_dispatch. Suppose a low-priority task (L) is executing _Thread_Dispatch, and the interrupt enables a high-priority task (H), but
H will not be dispatched because dispatching is disabled. Instead L enables dispatching and resumes executing, which is a priority inversion!

The fix reverts the changes made for SMP. For the SMP code, the priority inversion still exists and is unresolved. (RTEMS currently does not make real-time guarantees for the SMP support, so no one cares yet.)

In the broader picture, the bug seems like it should be easy to detect. The issue with free open-source software (FOSS) is that quality assurance (QA) is almost non-existent: the "many eyeballs" philosophy argues against QA. But what else can FOSS do? No one is going to pay for extensive testing, and if they do they have no incentive to share.

FOSS communities (and corporate developers) need better tools for software QA. This summer RTEMS had a GSOC student who was looking at testing. Testing is probably the first tool in the QA toolbox, and the only one most developers have a clue about; how about static analysis, path coverage, standards conformance, or certification? Some interesting work modeling, proving, and certifying systems is out there: Where is the undergraduate textbook and course on QA?

No comments:

Post a Comment