Making C less dangerous

The C language is very powerful, widely used—particularly in the Linux kernel—and very dangerous. One of the Linux engineers outlines how developers can cope with the programming language's security weaknesses.

You can do almost anything with C, but that doesn't mean you should. C code runs quickly, but it has no safety belt. Even if you're a C expert, as are most of the Linux kernel developers, you can still make killer blunders.

Besides the pitfalls of, say, misusing pointer aliasing, the C language itself has fundamental, unfixed bugs that await the unwary. It's those weaknesses that Kees "Case" Cook, Google Linux kernel security engineer, addressed in a seminar at the Linux Security Summit in Vancouver, Canada.

"C is a fancy assembler. It's almost machine code," said Cook, speaking to an audience of several hundred peers, who understood and appreciated the application speed resulting from C. The bad news, however, is that "C comes with some worrisome baggage, undefined behaviors, and other weaknesses that lead to security flaws and vulnerable infrastructure."

Even if you are using C for other development projects, it’s worthwhile to pay attention to its security challenges.

Protecting the Linux kernel

Over time, Cook and the people he worked with discovered numerous native C problems. To deal with these weaknesses, the Kernel Self Protection Project has worked slowly and steadily on protecting the Linux kernel from attack. In the process, it has worked to remove troublesome code from Linux.

That's tricky, Cook said, because "the kernel wants to do very architecture-specific things for memory management, interrupt handling, scheduling, and so on." A lot of code does finicky work, which must be checked carefully. For example, "there's no C API for setting up page tables or switching to 64-bit mode," he said.

With its operational baggage and weak standard libraries, C contains a great deal of undefined behavior. Cook cited—and agreed with—Raph Levien’s blog post "With Undefined Behavior, Anything Is Possible."

Cook gave concrete examples. "What are the contents of ‘uninitialized’ variables? Whatever was in memory from before! Void pointers have no type, yet we can call typed functions through them? Sure! Assembly doesn’t care: Everything can be an address to call! Why does memcpy() have no ‘max destination length’ argument? Just do what I say; memory areas are all the same!"
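Cook's memcpy() complaint is easy to make concrete. The following is only a minimal sketch: copy_bounded() is a hypothetical helper invented for this illustration, not a libc or kernel function. It shows what a copy routine might look like if the caller had to state the destination's capacity.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libc or the kernel): like memcpy(),
 * but the caller must also say how large the destination buffer is. */
int copy_bounded(void *dst, size_t dst_size, const void *src, size_t len)
{
    if (dst == NULL || src == NULL || len > dst_size)
        return -1;              /* refuse rather than overflow dst */

    memcpy(dst, src, len);
    return 0;
}
```

A call such as copy_bounded(buf, sizeof buf, data, data_len) fails cleanly when data_len exceeds the buffer, where a bare memcpy() would write past the end.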

Ignoring warnings…and not

Some of these idiosyncrasies are relatively easy to deal with. Cook commented, "Linus [Torvalds] likes the idea of always initializing local variables. So, you should 'just do it.'"

This advice comes with a caveat. If you declare and initialize a local variable inside a switch block before the first case label, you get the warning, "Statement will never be executed [-Wswitch-unreachable]," because control jumps straight to a case label and never runs the initializer. You can ignore this warning.
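For readers who would rather not see the warning at all, here is a small sketch of the shape that triggers it and one way to sidestep it. The function apply_op() and its cases are hypothetical, made up for this illustration. The problematic form is shown only in a comment so that the example compiles cleanly.

```c
/* Shape that draws -Wswitch-unreachable (kept in a comment on purpose):
 *
 *     switch (op) {
 *         int scratch = 0;    // control jumps past this initializer
 *     case 1:
 *         ...
 *     }
 */

/* Hypothetical example: give each case its own braces and initialize
 * the variable there, so the initializer always runs. */
int apply_op(int op, int value)
{
    switch (op) {
    case 1: {
        int scratch = 0;        /* runs whenever case 1 is entered */
        scratch += value * 2;
        return scratch;
    }
    case 2: {
        int scratch = 0;        /* separate variable, separate scope */
        scratch += value + 10;
        return scratch;
    }
    default:
        return value;
    }
}
```

Scoping the variable to the case that uses it keeps the "always initialize" habit without leaving dead code in front of the first label.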

But don’t ignore every warning. "Variable-length arrays are always bad," Cook said. The short list of problems includes stack exhaustion, linear overflow, and jumps over guard pages. Plus, Cook discovered, VLAs are slow. Removing the VLAs improved performance by 13 percent. Both faster and safer—now that's a win.

While VLAs have nearly been eradicated from the kernel, Cook said, they're still present in some code. Fortunately, VLAs are easy to find by using the -Wvla compiler flag.

So, in a word, when it comes to using VLAs, stop!
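As a rough illustration of what removing a VLA can look like, the sketch below swaps a runtime-sized array for a fixed-size buffer plus an explicit bounds check. The function name sum_squares() and the limit MAX_ITEMS are assumptions made for this example and do not come from the kernel.

```c
#include <stddef.h>

#define MAX_ITEMS 64    /* assumed upper bound for this illustration */

/* VLA version (avoid): the stack frame grows with the caller-supplied n.
 *
 *     long sum_squares(const int *in, size_t n)
 *     {
 *         long tmp[n];
 *         ...
 *     }
 */

/* Fixed-size version: stack usage is constant, and oversized requests
 * are rejected up front. Returns -1 when the request is rejected. */
long sum_squares(const int *in, size_t n)
{
    long tmp[MAX_ITEMS];
    long total = 0;

    if (in == NULL || n > MAX_ITEMS)
        return -1;

    for (size_t i = 0; i < n; i++)
        tmp[i] = (long)in[i] * in[i];   /* stage the squares */
    for (size_t i = 0; i < n; i++)
        total += tmp[i];                /* then add them up */

    return total;
}
```

Compiled with -Wvla, the commented-out version would be flagged, while the fixed-size version passes silently.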

A different kind of problem is hidden within C's semantics. When a case in a switch statement has no "break," did the programmer mean to leave it out? Without a break, execution continues into the next case, so code for several conditions can run; it is a well-known problem.

If you're hunting down missing breaks in existing code, you can compile with -Wimplicit-fallthrough, which makes the compiler warn wherever one case falls through into the next without being marked. Intentional fall-throughs can be marked with a “fallthrough” annotation. This is really a comment, but modern compilers parse it.
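Here is a minimal sketch of marking a deliberate fall-through. It uses the fallthrough statement attribute that recent GCC and Clang releases accept, and falls back to an empty statement with a comment elsewhere. The macro name FALLTHROUGH and the function privileges() are hypothetical, chosen for this illustration.

```c
/* Use the compiler's fallthrough attribute when it is available. */
#if defined(__has_attribute)
# if __has_attribute(fallthrough)
#  define FALLTHROUGH __attribute__((fallthrough))
# endif
#endif
#ifndef FALLTHROUGH
# define FALLTHROUGH do { } while (0)   /* fall through */
#endif

/* Hypothetical example: higher levels inherit the flags of lower ones. */
int privileges(int level)
{
    int flags = 0;

    switch (level) {
    case 2:
        flags |= 4;
        FALLTHROUGH;        /* intentional: level 2 includes level 1 */
    case 1:
        flags |= 2;
        FALLTHROUGH;        /* intentional: level 1 includes level 0 */
    case 0:
        flags |= 1;
        break;
    default:
        break;              /* unknown level: no flags */
    }
    return flags;
}
```

With -Wimplicit-fallthrough enabled, deleting either marker should produce a warning at the following case label, which is exactly the signal a reviewer wants.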

Cook also found that bounds-checking for slab memory allocation is slow. For instance, strcpy()-family checking is accompanied by a 2 percent performance hit. Alternatives such as strncpy() come with their own problems: strncpy(), you see, doesn’t always NUL-terminate the destination. Cook plaintively asked, "Can we get better APIs?"
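The strncpy() pitfall is that strncpy(dst, src, n) writes no terminator when src holds n or more characters, leaving dst unterminated. The sketch below shows one possible shape for a friendlier call. copy_string() is a hypothetical helper written for this illustration, not an existing libc or kernel API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies at most dst_size - 1 characters and always
 * terminates dst. Returns 0 if src fit, -1 if it was truncated or if
 * dst_size is 0 (in which case nothing is written). */
int copy_string(char *dst, size_t dst_size, const char *src)
{
    size_t len;

    if (dst_size == 0)
        return -1;

    len = strlen(src);
    if (len >= dst_size) {
        memcpy(dst, src, dst_size - 1);
        dst[dst_size - 1] = '\0';       /* terminate even when truncating */
        return -1;
    }

    memcpy(dst, src, len + 1);          /* includes the terminator */
    return 0;
}
```

The caller learns about truncation from the return value instead of discovering it later as an over-read.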

During the Q&A session, one Linux developer asked, "Can we get rid of old, bad APIs?" Cook replied that for a while, Linux supported the notion of deprecating APIs. However, Torvalds dumped the idea, reasoning that if any API needed to be deprecated, it should be thrown out entirely. But getting rid of APIs for good is a "political hot potato," Cook added. So, for now, we're stuck with them.

The long-term solution? More security-savvy open source developers

Cook sees a long, hard road ahead. While the idea of coming up with a Linux C dialect has at times been attractive, that's not going to happen. The real issue behind the problem of dangerous code is "people don't want to do the work to clean up code, not just bad code, but C itself," he said. As with all open source projects, "we need more dedicated developers, reviewers, testers, and backporters."

Dangerous C: Lessons for leaders

  • C is mature and powerful—and that brings technical and security challenges.
  • Linux developers pay special attention to making C safer (but no less powerful) because so much of the operating system is written in it.
  • The Google Linux kernel security engineer identified specific language weaknesses and explained how developers can address them.

This article/content was written by the individual writer identified and does not necessarily reflect the view of Hewlett Packard Enterprise Company.