This might be old news for learned folks, but it was news to me. Yesterday lwn.net posted an article on the LinuxDNA project, reporting that they have been able to successfully compile the Linux kernel using the Intel C compiler. While this did not excite me much, one comment did catch my attention.
Apparently the legendary Ken Thompson admitted to a fiendishly clever backdoor while accepting the 1983 Turing Award. In the early versions of Unix, the C compiler contained code that would recognize when the login command was being recompiled and insert some code recognizing a password chosen by Thompson, giving him entry to the system whether or not an account had been created for him.
But this is not the best part. In a source code review of the compiler, such a back door could easily be spotted and eliminated by deleting it from the source and recompiling the compiler. But you need a compiler to recompile the compiler. So Thompson also arranged that the compiler would recognize when it was compiling a version of itself, and insert into the recompiled compiler the code to insert into the recompiled login and the code to recognize itself and do the whole thing again in future recompilations of the compiler. Having done this once, he was then able to recompile the compiler from the original sources; the hack perpetuated itself invisibly, leaving the back door in place and active but with no trace in the sources.
(Text taken from the Jargon File)
He also made sure that the subverted compiler also subverted the disassembler, so that anyone who examined the binaries in the usual way would not actually see the real code that was running, but something else instead.
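A tiny toy model in Python may help picture how the hack survives a rebuild from clean sources. This is only a sketch under loud assumptions: a "binary" is just a dict, the "payload" tags stand in for machine code that exists in the binary but in no source file, and every name here (LOGIN_SRC, COMPILER_SRC, run_compiler) is made up for illustration. It is not Thompson's actual code.

```python
# Toy model only: "payload" tags stand in for injected machine code that
# appears in a binary but not in any source file.
LOGIN_SRC = "login source (clean)"        # hypothetical stand-in
COMPILER_SRC = "compiler source (clean)"  # hypothetical stand-in


def run_compiler(compiler_binary, source):
    """Run a compiler binary on some source and return the resulting binary."""
    output = {"source": source, "payload": set()}
    if "trojan" in compiler_binary["payload"]:
        if source == LOGIN_SRC:
            # Compiling login: slip in the back door.
            output["payload"].add("backdoor")
        if source == COMPILER_SRC:
            # Compiling the compiler: re-insert the trojan itself.
            output["payload"].add("trojan")
    return output


# A subverted compiler binary whose source is nevertheless clean.
subverted = {"source": COMPILER_SRC, "payload": {"trojan"}}
honest = {"source": COMPILER_SRC, "payload": set()}

# Rebuild the compiler twice from clean source, then build login.
rebuilt = run_compiler(subverted, COMPILER_SRC)
rebuilt_again = run_compiler(rebuilt, COMPILER_SRC)
login = run_compiler(rebuilt_again, LOGIN_SRC)

print(rebuilt_again["payload"])  # {'trojan'}
print(login["payload"])          # {'backdoor'}

# The same steps starting from an honest binary leave nothing behind.
honest_login = run_compiler(run_compiler(honest, COMPILER_SRC), LOGIN_SRC)
print(honest_login["payload"])   # set()
```

Both login binaries were built from the same clean source text; only the history of the compiler binary differs, which is why reading the sources tells you nothing.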
A subscriber, JoeBuck, on the lwn thread then explained how this is checked in GCC. Below are his words. I hope there is no copyright violation.
GCC is built with a three-stage bootstrap procedure. First the compiler is built with some C compiler, that might be an older GCC, or might be a different compiler entirely. The result is “stage 1”. Next, GCC is built again, by the “stage 1” compiler, to produce “stage 2”. Finally, GCC is built with “stage 2” and the result is “stage 3”. We then check to see whether “stage 2” is bit-for-bit identical (other than date stamps in object files) with “stage 3”. If it isn’t, we report a failure. The key is that this process is designed to remove any dependence in the final compiler from the initial compiler. This check is run every time gcc is built from source, and every developer must run this check before any patch is acceptable (plus all the other regressions, of course).
Now, suppose that you suspect that your GCC has a version of the Thompson hack installed. The check is simple: just do the three-stage bootstrap starting with a different compiler, and verify that you get an identical result. You’ve either proven that there’s no hack, or that the other compiler has the hack too. You can repeat the process using cross-compilation. If you carry this out, you’ll be forced to conclude that either there is no Thompson hack, or else that every C compiler you tried has the identical hack.
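The procedure described above can be sketched with another toy model in Python. Again, this is an illustration under assumptions, not how GCC's build system is implemented: the Binary type, GCC_SRC, run_compiler and three_stage_bootstrap are hypothetical names, the "codegen" field stands in for the style of machine code a compiler emits, and "payload" stands in for anything injected by a Thompson-style hack. The real check compares actual object files.

```python
from typing import FrozenSet, NamedTuple, Tuple

GCC_SRC = "gcc source (clean)"  # hypothetical stand-in for the source tree


class Binary(NamedTuple):
    source: str              # source text this binary was built from
    codegen: str             # which compiler's code generator emitted it
    payload: FrozenSet[str]  # hidden extras injected at build time


def run_compiler(cc: Binary, source: str) -> Binary:
    """Run compiler binary `cc` on `source` and return the output binary."""
    payload: FrozenSet[str] = frozenset()
    if "trojan" in cc.payload and source == GCC_SRC:
        payload = frozenset({"trojan"})  # the hack re-inserts itself
    # Emitted code style depends on what the compiler is (its source),
    # not on how that compiler binary was itself built.
    return Binary(source, codegen=cc.source, payload=payload)


def three_stage_bootstrap(initial: Binary) -> Tuple[Binary, Binary, Binary]:
    stage1 = run_compiler(initial, GCC_SRC)
    stage2 = run_compiler(stage1, GCC_SRC)
    stage3 = run_compiler(stage2, GCC_SRC)
    return stage1, stage2, stage3


hacked_gcc = Binary(GCC_SRC, GCC_SRC, frozenset({"trojan"}))
other_cc = Binary("other compiler source", "other compiler source", frozenset())

h1, h2, h3 = three_stage_bootstrap(hacked_gcc)
o1, o2, o3 = three_stage_bootstrap(other_cc)

print(o1 == o2)  # False: stage 1 still carries the other compiler's codegen
print(o2 == o3)  # True: dependence on the initial compiler is gone
print(h2 == h3)  # True: the hack reproduces itself, so this check alone passes
print(h3 == o3)  # False: bootstrapping from a different compiler exposes it
```

In this toy, the stage 2 versus stage 3 comparison on its own does not catch the hack, because a self-reproducing trojan is perfectly stable. It is the second bootstrap from an unrelated compiler that produces a mismatch, unless that compiler carries the identical hack.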
Thompson’s view on this can be read here
Oh, what you learn from reading footnotes, comments, and trivia!