Microsoft releases open source bug-bomb in the rambling house of C

The zombie bugs in programs and libraries at the heart of the Internet's infrastructure often have the C programming language in common. Microsoft Research now wants to add the kind of bounds-checking seen in C# to C, to help splat bugs like “buffer overruns, out-of-bounds memory accesses, and incorrect type casts,” in an add- …

  1. martinusher Silver badge

    C is not an applications programming language

    C is a systems programming language, a type of language that's not designed for writing end user applications. You're supposed to use it to write operating system components and language environments, software that runs in predictable ways which allow for complete testing. Applications languages are designed for a different world, a world where the language system implementers can't predict everything that users will do with it, so these languages incorporate a lot more checking, trading speed and light footprint for reliability.

    Unfortunately all this got screwed up starting in the 80s in the rush to build and sell PC applications -- companies (with MSFT leading the charge) were in such a hurry to get stuff out the door that they cut corners on languages, promoting 'C' as an applications language. They tend to fix inherent problems by patches on patches so we got the whole C++ house of cards as a way to make C more reliable (it didn't). Now they just want to modify the run-time environment 'to make it more reliable'.

    I'd suggest that they don't bother. They have applications class languages with things like C#. They would be better off fixing the other popular applications languages (including doing something about Javascript -- that's a travesty, pretty much everything you shouldn't do in a language wrapped up in one package).

    (In case anyone thinks I'm a mainframe retread, no, not at all. I was an early adopter of PCs -- CP/M PCs -- which despite their limitations had a full software ecosystem available to them. Most of the time I write embedded code -- it's a different world to the one that apps people live in.)

    1. Charles 9

      Re: C is not an applications programming language

      I don't think it was that per se. One thing people were clamoring for, especially in the 80s when things were a lot slower, was raw performance. Speed sold, and since C ran "close to the metal", it produced FAST code. That's the big problem with bounds-checking: it necessarily draws a performance penalty in a world where speed mattered. Even now programs are expected to do more, so speed still matters. Who cares about security if you can't make the deadline?

      As for all the other languages, your only solution is to ban them, but given so much relies on them (just like with Flash), getting them out of the ecosystem is going to be a slog, especially since it's in an official spec AND there's little in the way of a substitute, especially for pages that need to be updated for current events quickly.

      1. MacroRodent
        Boffin

        Re: C is not an applications programming language

        That's the big problem with bounds-checking: it necessarily draws a performance penalty in a world where speed mattered.

        Yes, if done naïvely, but a good compiler can actually eliminate most of the overhead (for example, deduce that looping over an array needs to check the bounds only once). Of course, the early compilers for microcomputers were limited in this department.
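
The check elimination described here can be imitated by hand. The sketch below contrasts a loop that tests the index on every access with one that validates the whole range once before the loop starts. The function names are hypothetical, and a real compiler would do this transformation internally rather than in source code.

```c
#include <stddef.h>

/* Naive version: tests the index against the bound on every access. */
int sum_checked_each(const int *a, size_t len, size_t n, long *out)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i >= len)
            return -1;          /* out-of-bounds access detected */
        total += a[i];
    }
    *out = total;
    return 0;
}

/* Hoisted version: one range check covers every iteration of the loop. */
int sum_checked_once(const int *a, size_t len, size_t n, long *out)
{
    if (n > len)
        return -1;              /* the loop would run past the array */
    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += a[i];
    *out = total;
    return 0;
}
```

Both functions reject the same requests; the second simply pays for the comparison once instead of `n` times.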

        1. Charles 9

          Re: C is not an applications programming language

          That only works for STATIC bounds-checking, but a lot of the overruns come from DYNAMIC buffers with bounds only known at runtime (if at all, if the buffer comes from elsewhere). Only a runtime bounds-checker can detect these, and these come with performance penalties: not desirable if you have a speed demand.
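
A runtime check of the kind mentioned here needs the bound to travel with the buffer. The following is a minimal sketch under that assumption: `struct checked_buf` and its helper functions are hypothetical names invented for illustration, not part of any existing library or of Microsoft's extension.

```c
#include <stddef.h>
#include <stdlib.h>

/* A buffer that carries its own length, so the bound is known at run time. */
struct checked_buf {
    int   *data;
    size_t len;
};

/* Allocates a zero-filled buffer of len elements; returns 0 on success. */
int checked_buf_init(struct checked_buf *b, size_t len)
{
    b->data = calloc(len, sizeof *b->data);
    b->len = (b->data != NULL) ? len : 0;
    return (b->data != NULL) ? 0 : -1;
}

/* Reads element i into *out, or returns -1 if i is out of bounds. */
int checked_get(const struct checked_buf *b, size_t i, int *out)
{
    if (i >= b->len)
        return -1;
    *out = b->data[i];
    return 0;
}

/* Writes value to element i, or returns -1 if i is out of bounds. */
int checked_set(struct checked_buf *b, size_t i, int value)
{
    if (i >= b->len)
        return -1;
    b->data[i] = value;
    return 0;
}

void checked_buf_free(struct checked_buf *b)
{
    free(b->data);
    b->data = NULL;
    b->len = 0;
}
```

Every access costs one comparison and a branch, which is the performance penalty being argued about in this thread.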

          1. MacroRodent

            Re: C is not an applications programming language

            That only works for STATIC bounds-checking, but a lot of the overruns come from DYNAMIC buffers with bounds only known at runtime

            This gets language-dependent. If you have a language where the compiler knows how the size of a dynamic array can be determined (for example Java), it can optimize bounds checking also in those cases. I agree this is hard to make work in C, and we might not even want to, if we just use C as a close-to-the-metal language, and use something else for higher-level applications.

            About that Java, which always has array bounds checking enabled: Last summer I spent some idle time trying to see how well various current languages do on the classic Eratosthenes Sieve benchmark (which mainly loops through an integer array). The test was on CentOS7 Linux, and the "contestants" included C++ (GCC 4.8.3), Java (1.7), Python (2.7.5) and JavaScript (Node.js 0.12.7). The clear winner? Java. C++ was close, of course. Of the two dynamic languages, JavaScript beat Python handily: it was about 10 times as fast, and achieved about half of the C++ or Java performance (which I find impressive).
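
For readers unfamiliar with the benchmark, here is a minimal C sketch of the sieve. It is not the code used for the measurements quoted above; `count_primes` is a hypothetical name, and the byte-per-flag layout is just one possible choice.

```c
#include <stddef.h>
#include <stdlib.h>

/* Counts the primes <= limit with the Sieve of Eratosthenes.
   Returns -1 if the flag array cannot be allocated. */
long count_primes(size_t limit)
{
    if (limit < 2)
        return 0;

    unsigned char *composite = calloc(limit + 1, 1);
    if (composite == NULL)
        return -1;

    long count = 0;
    for (size_t i = 2; i <= limit; i++) {
        if (composite[i])
            continue;           /* already crossed out by a smaller prime */
        count++;
        /* Cross out multiples from i*i upward; smaller multiples were
           handled by smaller primes. The division avoids overflow. */
        if (i <= limit / i)
            for (size_t j = i * i; j <= limit; j += i)
                composite[j] = 1;
    }

    free(composite);
    return count;
}
```

The inner loop is exactly the kind of array walk where a bounds-checking language either pays per access or proves the check away.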

            1. Anonymous Coward
              Anonymous Coward

              Re: C is not an applications programming language

              I'd be wary of drawing conclusions from implementing half a page of code in various languages and running it.

              I also find it hard to believe that you'll outperform C or C++ in an integer focused task, using a JVM language. I'd be very interested to replicate your results, if you provide some details on your methodology.

              1. MacroRodent

                Re: C is not an applications programming language

                I'd be wary of drawing conclusions from implementing half a page of code in various languages and running it.

                I fully agree one should not draw too many conclusions from microbenchmarks like this, but it helps get a feel of how various features behave in different languages or compilers.

                I also find it hard to believe that you'll outperform C or C++ in an integer focused task, using a JVM language. I'd be very interested to replicate your results, if you provide some details on your methodology.

                After thinking about it, I did not find it hard to understand. Java is a statically typed language, and modern JVMs do JIT, where they can apply all the same optimizations as the C++ compiler (at least for algorithms like this that do not require using run-time type information). So it gets down to which compiler has the better code generator. If you want to check for yourself, see macrorodent.blogspot.fi, where I just copied the benchmarks. If you get interesting results, please post comments there.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: C is not an applications programming language

                  If you want to check for yourself, see macrorodent.blogspot.fi, where I just copied the benchmarks. If you get interesting results, please post comments there.

                  Cheers, I'll check it out.

                2. Anonymous Coward
                  Anonymous Coward

                  Re: C is not an applications programming language

                  I'm not sure you can really measure in this manner. In the code as posted you have an integrated timer, which starts to explain some of your differences. For example, the Python slowdown relative to JavaScript is likely related to the explicit delay loop in the Python code.

                  Your other difference is explained by not using the stack in C++, so it's not exactly idiomatic C++ code. I would expect the rest of the difference comes from actually freeing the memory in C++, which incurs overhead that Java avoids by just exiting and allowing the O/S to reclaim the memory.

                  I would suggest using a high-precision external timer based on the monotonic clock, and measuring with the same timer across all the candidates. Essentially you are looking at needing a few hundred thousand iterations in order to converge on something like a reasonable approximation.
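
As a rough illustration of timing against the monotonic clock, the sketch below averages many calls of a workload using POSIX `clock_gettime`. It runs in-process, so it is not the external harness suggested above, and `mean_seconds_per_call` and `sample_work` are hypothetical names. It assumes a POSIX system.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose clock_gettime on POSIX systems */
#endif
#include <time.h>

/* Seconds on the monotonic clock, which is not affected by wall-clock
   adjustments. */
static double monotonic_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Runs work() `iterations` times (iterations must be > 0) and returns
   the mean number of seconds per call. */
double mean_seconds_per_call(void (*work)(void), long iterations)
{
    double start = monotonic_seconds();
    for (long i = 0; i < iterations; i++)
        work();
    return (monotonic_seconds() - start) / (double)iterations;
}

/* Example workload (hypothetical): the volatile sink keeps the
   optimiser from deleting the loop. */
void sample_work(void)
{
    static volatile unsigned long sink;
    for (unsigned long i = 0; i < 1000; i++)
        sink += i;
}
```

A call such as `mean_seconds_per_call(sample_work, 200000)` spreads the cost of the two clock reads over the whole run, which is the point of using many iterations.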

                  Thank you for posting your methods, and opening the debate.

                  1. MacroRodent
                    Boffin

                    @sed gawk Re: C is not an applications programming language

                    Thanks for your comments. Some replies: The delay loop at the start of some versions is meant to bring a low-resolution (one second) clock function to the next tick, so the actual measured code starts just after a second has flipped over. This reduces jitter a bit. However, I'm not sure how much it mattered. For example, the difference between Python 2.7 and JavaScript on Node.js was very large; any clocking method would have detected it. But I agree that using the time libraries of each language is one potential source of error in close cases, because they may be implemented more or less efficiently. This can be mitigated by doing a lot of computation between peeks at the clock, as the test programs in fact try to do.

                    About the dynamically allocated array in C++: I did it that way to keep the versions in different languages closer, and believe it should not have any effect. Firstly, the allocation and deallocation of the array occurs outside the measurement loop, so that overhead is not included. Secondly, any C or C++ compiler worth its salt will keep the base address of the allocated array in a CPU register during a tight loop like this, so there is no difference between accessing it and a stack-allocated array (which would in fact also be accessed indirectly via a register).

          2. Whitter
            Boffin

            Re: C is not an applications programming language

            It also applies to dynamic arrays allocated by malloc: so long as the operation(s) in question can be seen not to resize a local array, then only two boundary checks are required (e.g. a for loop over a const * const pointer)

        2. Zakhar

          Re: C is not an applications programming language

          No, it is not necessarily such a big penalty!

          I worked on an O.S. unfortunately long dead now (CTOS) where the equivalent of malloc() was sitting on an x86 segment fitted to the size of memory you alloc'ed.

          Then if you were overrunning, the processor triggered a segment-fault and the O.S. just caught that.

          The overhead is minimal (add a segment to the LDT) and you can trap any overrun from any language. Sure when you DO trap, there is a huge overhead... but you are debugging then!

          Unfortunately, "segments" is a concept that is quite specific to x86, whereas "pages" is much more common. So, for the sake of portability, modern O.S.es like Linux do not use segments at all (at least not as explained above) because they would need complete rewrites on, let's say, ARM, which does not have "segments".

          But M$ being almost x86 only (seeing how that worked out for W$-RT), I believe they could have done that long time ago. In the CTOS age, we were already trapping bugs that the early versions of W$ didn't catch... but that was understandable since it was just a Windows manager on top of the good old M$-DOS!.. I see that didn't change since the 80's: Bravo!..

          1. MacroRodent

            Re: C is not an applications programming language

            The overhead is minimal (add a segment to the LDT) and you can trap any overrun from any language. Sure when you DO trap, there is a huge overhead... but you are debugging then!

            Actually there is quite a bit of overhead with this method, because access to such far data requires generating a more complex code sequence than for data in the "default data segment". You need to load a segment register (a compiler can sometimes optimize this away, but usually not, and there are not many of these registers, only ES, FS and GS are free for general use). Loading the segment register is expensive in protected mode in the 386 architecture (it loads the descriptor data and checks protections), and the overhead has even got worse in succeeding generations of the Intel architecture, because it is seen by Intel as a legacy feature that almost nobody uses. It is kept around for compatibility, but they don't care about its performance.

            Yes, I too have worked with an embedded system that uses the Intel segmentation feature for fine-grained memory protection (still occasionally do), and I can assure you it is a bad idea!

          2. patrickstar

            Re: C is not an applications programming language

            The 9x line of Windows (which is unrelated to what's known as Windows today) had this. OS/2, which was at one point what MS planned to be the next big OS, had it as well.

            NT, which is what modern Windows is built on, wasn't even written for x86 originally, had portability as a major goal, and wasn't ported to x86 until relatively late in the development of the first version, so it had to go with the lowest common denominator.

            At some point it has existed for (at least) i860, Alpha, PPC, ARM and probably some archs I've forgotten. AFAIK none of these have segmentation comparable to x86.

            1. Richard Plinston

              Re: C is not an applications programming language

              > NT, which is what modern Windows is built on, wasn't even written for x86 originally,

              Exactly. It was developed initially on i860 but then moved to MIPS. These were much more powerful (and expensive) than the contemporary 80486. The Pentium wasn't available until late in the development.

      2. Steve Channell
        Pint

        C is an applications programming language

        The whole of C/UNIX started from a requirement for a typesetting system, it's just that the app language C appeared so fast and efficient that they didn't need assembler for the OS..

        Fast forward ten years, and everyone noticed that C was better than Intel's PL/M.

        It is good to see MS returning to an Engineering outfit (they introduced far* for 8086 segments) with ptr<> array_ptr<> and span<> formalising the CRT convention that the WORD prior to the malloc pointer contains the length of the allocated buffer.

        1. oldcoder

          Re: C is an applications programming language

          Nope.

          Goes back to the original allocator in K&R C runtime. The location below the returned pointer contains a structure holding the size and the address of the next block.
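
The idea of keeping bookkeeping just below the returned pointer can be sketched as a thin wrapper over `malloc`. This is an illustration only: it stores just the size, it is not the layout of any real C runtime, and `sized_alloc`, `sized_len` and `sized_free` are hypothetical names. It assumes a C11 compiler for `max_align_t`.

```c
#include <stddef.h>
#include <stdlib.h>

/* Header placed immediately before the memory handed to the caller.
   The union pads it so the user area keeps malloc's alignment. */
union block_header {
    size_t      size;
    max_align_t align;
};

/* Allocates size bytes and records the size in a hidden header. */
void *sized_alloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(union block_header))
        return NULL;                    /* header + size would overflow */
    union block_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                       /* user area starts after the header */
}

/* Returns the size recorded for a pointer obtained from sized_alloc. */
size_t sized_len(const void *p)
{
    return ((const union block_header *)p - 1)->size;
}

/* Releases a pointer obtained from sized_alloc. */
void sized_free(void *p)
{
    if (p != NULL)
        free((union block_header *)p - 1);
}
```

Note that `sized_len` only works for pointers to the start of a block; an interior pointer has no header in front of it, which is one reason typed bounds are more useful than this convention.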

        2. nijam Silver badge

          Re: C is an applications programming language

          > The whole of C/UNIX started from a requirement for a typesetting system

          That's a complete misinterpretation of history, I'm afraid.

    2. Mark 85

      Re: C is not an applications programming language

      It may not be one but back in the 80's, C and Assembly was the way to go for performance programming like games. I did a bit of that and learned a lot. C++ had some advantages but still.. it could be tripped up by users (depending on the application).

      As a sidenote, I'm wondering if a lot of the Windows/IE issues stem from this. The OS, etc. has really become too large to re-code but if they can fix it in the library and recompile.... Yeah.. distribution would be a problem.

      1. CheesyTheClown

        Re: C is not an applications programming language

        C and Assembler was the way to go for everything back then. Assembler was actually used as an application language by many people. When a CPU could realistically process 75,000 instructions per second, we counted cycles even when we were drawing text on the screen. When a language like C, coded properly, reached levels of performance where we needed less assembler, we mixed the two. It wasn't that better languages for apps didn't exist. It was that they were too slow to be useful.

      2. Ken Hagan Gold badge

        Re: C is not an applications programming language

        "As a sidenote, I'm wondering of a lot of the Windows/IE issues stem from this. "

        Unlikely, since Windows and IE are almost certainly written in C++ and whilst you /can/ push old-school C code through a C++ compiler (*), you don't have to because bounds-checked and non-leaky alternatives exist.

        (* Bootnote: MSVC is a C++ compiler and, much to the annoyance of C fans, MS don't actually *do* a C compiler, so it is slightly odd that MS Research are issuing tools aimed at C code.)

        1. Mark 85

          Re: C is not an applications programming language

          Unlikely, since Windows and IE are almost certainly written in C++

          You're probably right although there was the story/rumor going that one version of Windows was written in Visual Basic so I guess only the coders really know.

          1. oldcoder

            Re: C is not an applications programming language

            Windows is written in C, not C++.

            1. Bronek Kozicki

              Re: C is not an applications programming language

              Windows is written in C, not C++.

              Nope, it is written in both. Kernel is in C, huge majority of userspace code in C++. I signed NDA but this much I can reveal, and I do not think much has changed since the time I saw these sources.

          2. patrickstar

            Re: C is not an applications programming language

            The NT kernel, which powers all of the NT line of Windows (NT, 2000, XP, Vista, 7, 8, 8.1, 10, corresponding server versions, etc) is pure C with some custom extensions like exception handling. And a little bit of assembler for platform specific stuff.

            The rest is a mix of C and C++. Most of the latter is more like 'C with classes', but there's some use of things like templates and smart pointers as well, mostly for COM related stuff.

            I've read much of it, it's not bad (for the most part).

            Some of the bundled applications are C# nowadays, but it's not used for the core OS or libs.

        2. Richard Plinston

          Re: C is not an applications programming language

          > Unlikely, since Windows and IE are almost certainly written in C++

          """Cutler set three main goals for Windows NT. The first goal was portability: in contrast to previous operating systems, which were strongly tied to one architecture, Windows NT should be able to operate on multiple architectures.[60] To meet this goal, most of the operating systems, including the operating system core, had to be written in the C programming language.[61] """

          The graphics system was in C++.

          > MSVC is a C++ compiler and, much to the annoyance of C fans, MS don't actually *do* a C compiler

          """Visual C++ 2015 [MSVC 14] further improves the C99 support, with full support of the C99 Standard Library"""

        3. oldcoder

          Re: C is not an applications programming language

          Last I read, Microsoft C++ wasn't standard, even though MS claimed it was.

          It may be better, but I doubt very much that it is really standard.

    3. bombastic bob Silver badge

      Re: C is not an applications programming language

      I disagree with that subject line... C is a perfectly good application programming language. the thing is, coders need to self-enforce a few simple rules, and use methods that aren't inherently problematic.

      you know, like 'strcpy(buffer, string)' --- should be 'strncpy(buffer, string, maxlen)'

      Point is: learn to FREAKING CODE. Don't code like a script kiddie. Don't allow script kiddies to commit code that doesn't check buffer lengths. that kind of thing.

      And DO! NOT! RELY! ON! THE! COMPILER! TO! PROTECT! YOU!! Protect YOURSELF.

      then again, Micro-shaft designed C-pound and ".Not" for the INEXPERIENCED coder, so that senior people wouldn't be "senior" any more...

      1. Will 30

        Re: C is not an applications programming language

        You appear to misunderstand strncpy. It is not a 'safe' version of strcpy, it's something completely different.

        I'm not sure whether that helps or hinders your argument that people should learn to code, but it's certainly further evidence that 'c' is a very easy language to make mistakes in.

      2. Anonymous Coward
        Anonymous Coward

        Re: C is not an applications programming language

        "strncpy(bufer, string, maxlen)"

        Isn't much safer, as if the length of the source is longer than maxlen, the resulting destination isn't null terminated.
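
The missing terminator is easy to demonstrate. The helper below is a hypothetical test function, not a recommended API: it performs the `strncpy` call from the quoted comment and then reports whether the destination actually contains a NUL.

```c
#include <stddef.h>
#include <string.h>

/* Performs strncpy(dst, src, dst_size) and returns 1 if dst then holds a
   NUL-terminated string, or 0 if the copy filled dst without a NUL. */
int strncpy_terminated(char *dst, size_t dst_size, const char *src)
{
    strncpy(dst, src, dst_size);
    return memchr(dst, '\0', dst_size) != NULL;
}
```

With a 4-byte destination, copying `"abc"` leaves a terminated string, while copying `"abcd"` or anything longer fills all four bytes and leaves no NUL, so a later `strlen` or `printf("%s")` on that buffer would read past its end.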

      3. Phil O'Sophical Silver badge

        Re: C is not an applications programming language

        you know, like 'strcpy(buffer, string)' --- should be 'strncpy(buffer, string, maxlen)'

        No, it should be 'strlcpy(buffer, string, maxlen)' which prevents overflow and guarantees null-termination. The "l" form of strcat is even more useful, since you don't need to mess around with strlen calculating how much is left in the buffer. Errors there lead to so many off-by-one mistakes.

        If you must use strncpy, then at least use 'strncpy(buffer, string, maxlen-1)' to make room for the null.
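
Since `strlcpy` originated on the BSDs and is not present in every C library, the sketch below is a portable stand-in that follows the same contract: truncate if necessary, always terminate when the destination has room for at least one byte, and return the length of the source. The name `bounded_copy` is hypothetical. Note also that `strncpy` with `maxlen-1` never writes the final byte of the buffer, so the caller still has to store the terminator there.

```c
#include <stddef.h>
#include <string.h>

/* Copies src into dst (dst_size bytes), truncating if needed and always
   NUL-terminating when dst_size > 0. Returns strlen(src), so a result
   >= dst_size tells the caller that truncation happened. */
size_t bounded_copy(char *dst, size_t dst_size, const char *src)
{
    size_t src_len = strlen(src);
    if (dst_size > 0) {
        size_t n = (src_len < dst_size - 1) ? src_len : dst_size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return src_len;
}
```

A caller would typically write `if (bounded_copy(buf, sizeof buf, s) >= sizeof buf) { /* handle truncation */ }`.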

        1. MacroRodent

          Re: C is not an applications programming language

          If you must use strncpy, then at least use 'strncpy(buffer, string, maxlen-1)' to make room for the null.

          Reasons for that include having to take into account old C libraries. The strl* functions are newfangled inventions. I recall reading somewhere the reason for the dangerous behaviour of strncpy when the target size is exceeded comes from its usage in the original Unix file system, where file name components were limited to 14 characters. They were stored in fixed-size directory entries with 14 bytes reserved for the name, and only names shorter than 14 were nul-terminated. So strncpy with size 14 writing to the file name field did the right thing...
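
The fixed-width name field described here can be sketched as follows. The 14-byte width comes from the comment above; the struct layout, the `inode` member and the function names are assumptions made for illustration rather than a copy of any historical source.

```c
#include <string.h>

#define NAME_LEN 14   /* field width described in the comment above */

struct dir_entry {
    unsigned short inode;      /* included only for illustration */
    char name[NAME_LEN];       /* NUL-padded, but NOT terminated when full */
};

/* Stores a name in the fixed-width field. strncpy pads short names with
   NULs and stops after NAME_LEN bytes, so longer names are truncated. */
void entry_set_name(struct dir_entry *e, const char *name)
{
    strncpy(e->name, name, NAME_LEN);
}

/* Compares at most NAME_LEN bytes, so it never reads past the field. */
int entry_name_equals(const struct dir_entry *e, const char *name)
{
    return strncmp(e->name, name, NAME_LEN) == 0;
}
```

Used this way, `strncpy` behaves sensibly: the field is a padded record, not a C string, and every reader must treat it with an explicit length.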

        2. dajames

          Re: C is not an applications programming language

          If you must use strncpy, then at least use 'strncpy(buffer, string, maxlen-1)' to make room for the null.

          That doesn't really help. The problem is not that the string is too long for the buffer, but that the buffer is too short for the string. Throwing away some of the data to make the rest fit in the program is NOT the right answer.
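
One way to act on this objection is to size the buffer from the data instead of cutting the data to fit the buffer. The sketch below does that with a heap allocation; `copy_to_fit` is a hypothetical name, and the function behaves much like POSIX `strdup`.

```c
#include <stdlib.h>
#include <string.h>

/* Returns a heap copy sized to hold all of src, including its terminator,
   or NULL if the allocation fails. The caller must free() the result. */
char *copy_to_fit(const char *src)
{
    size_t n = strlen(src) + 1;
    char *dst = malloc(n);
    if (dst != NULL)
        memcpy(dst, src, n);
    return dst;
}
```

Where a fixed buffer is unavoidable, the alternative is to report an error when the data does not fit rather than silently dropping part of it.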

          1. Anonymous Coward
            Anonymous Coward

            Re: C is not an applications programming language

            It's a shame that null-terminated strings became the standard in C and most low-level APIs. Their slight space/speed advantage goes out the window when you do length checks.

            C was a decent language that could've used an overhaul in the 1990s to address a few issues like this. Instead, we got the frankenstein monster C++.

      4. Uplink

        Re: C is not an applications programming language

        <quote> Point is: learn to FREAKING CODE. Don't code like a script kiddie. Don't allow script kiddies to commit code that don't check buffer lengths. that kind of thing.</quote>

        And what is one supposed to do before becoming master of the code universe? Most people aren't born "senior coder", and to most of those people coding is just a job - a thing that gives them money; a thing they're looking to get away from every day and not looking forward to returning the next day if it were not for the money.

        Luckily for me, I found out about strcpy vs strncpy while still in school, but that's not mentioned in any classes. You learn strcpy and then move to the next lesson. strncpy is not mentioned.

        Applications also have this property: "we need it yesterday!". Even the most seasoned programmer can easily introduce an off-by-one error. I recently had a go on hackerrank at some C issues and while my algorithm was sane, I made a typo: I sized an array using the wrong variable, so it ended up shorter than intended. That meant that, with all the memory smashing, my code passed 10 out of 12 tests. The two that failed segfaulted. It took me forever before I saw the error and facepalmed.

      5. energystar
        IT Angle

        Bombaa!

        "then again, Micro-shaft designed C-pound and ".Not" for the INEXPERIENCED coder, so that senior people wouldn't be "senior" any more..."

        By now EVEN MS knows that production code can't go out without thorough senior supervision.

      6. Anonymous Coward
        Anonymous Coward

        Re: C is not an applications programming language

        "Micro-shaft designed C-pound"

        They wrote another version of C?! Is C£ any good ?

        1. Richard Plinston

          Re: C is not an applications programming language

          > Is C£ any good ?

          The octothorpe is an approximation of several distinct graphics. In this case he was using it in one of the common approximations, that of 'lb' the symbol for pound weight.

          Personally I refer to C# as 'making a hash of C'.

          You may also note that the US keyboard has 'hash' above the 3 while you have 'UKP'.

    4. Paul Shirley

      Re: C is not an applications programming language

      I'll let you in on a secret: system components, OSes and whatever you think 'language environments' means also need bounds checks. They deal with unpredictable client requests like any app, even if written by super coders magically able to make their own code 100% deterministic and somehow not needing checks. Even godlike coders make mistakes anyway.
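
A typical check of this kind validates an offset and length supplied by a client before touching memory. The sketch below is a minimal example under assumed names (`read_region` and its parameters are hypothetical); the comparison is arranged so that `offset + len` is never computed and therefore cannot wrap around.

```c
#include <stddef.h>
#include <string.h>

/* Copies len bytes starting at offset from a src_size-byte region into
   dst, refusing any request that falls outside the source or does not
   fit in the destination. Returns 0 on success, -1 on rejection. */
int read_region(const unsigned char *src, size_t src_size,
                size_t offset, size_t len,
                unsigned char *dst, size_t dst_size)
{
    if (offset > src_size || len > src_size - offset)
        return -1;              /* request reaches outside the source */
    if (len > dst_size)
        return -1;              /* destination is too small */
    memcpy(dst, src + offset, len);
    return 0;
}
```

Writing the test as `offset + len > src_size` instead would accept a huge `offset` whose sum wraps to a small value, which is a classic way for such checks to fail.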

      Thinking like that keeps security researchers employed.

      1. energystar
        Boffin

        "Thinking like that keeps security researchers employed."

        and CPUs vigilant at another CPUs steps. Was this at the original script?

      2. oldcoder

        Re: C is not an applications programming language

        guess what? even in system components it cannot be checked by the compiler... Since each may be compiled separately, there is NO sharing... only the system call parameter checking. And that cannot be done by a compiler.

    5. energystar
      Linux

      Re: C is not an applications programming language

      Agree. A tool like this still needed. Tool making tools still needing 'tooling', at [critical] times.

      Barking at the License. But a great contribution to All computing world. Expecting a little good will from FOSS toward standardization.

    6. Aodhhan

      Re: C is not an applications programming language

      Spoken like a computer end user who is only aware of the "programming for grandparent" languages like Visual Basic.

      C or rather C++ is still widely used as the basis for many applications, especially those requiring speed and high end calculations. Many applications used to build the console and online games (you apparently spend too much time on), are written in C++.

      Applications used to conduct bank transactions are written in C++ and others use FORTRAN... yeah I know you don't know what this is.

      Just because you see a front-end GUI in Windows doesn't mean it's mostly written in C#.

      So.. shut off your gaming console, burn your nasty collection of 4 year old t-shirts, and leave your mother's basement. You just might learn a bit more about programming languages. At least you might do a bit more research before posting.

      1. Richard Plinston

        Re: C is not an applications programming language

        > Applications used to conduct bank transactions are written in COBOL

        FTFY

  2. Anonymous Coward
    Anonymous Coward

    Bounds checking for C and C++

    Bounds checking for C and C++ Nov 2004

    1. Anonymous Coward
      Anonymous Coward

      Re: Bounds checking for C and C++

      I always wondered why, when there have been multiple GCC patchsets to add bounds checking to gcc, it never went anywhere. Sure, it hurts performance, and sure, it can't be a perfect solution without creating new types with bounds information built in. But it would be a big improvement over unchecked C, and there's tons of code where either performance doesn't matter because it mostly waits on other stuff like the end user, network, storage etc. or where you are willing to sacrifice performance in critical sections of code where security really matters (i.e. network facing code or stuff that runs as "root", etc.)

      Maybe the GCC folks don't want the patches unless they are perfect, but if so they'll wait forever.

      1. Anonymous Coward
        Anonymous Coward

        Re: Bounds checking for C and C++

        I think the problem is that the C standard doesn't allow for bounds-checking, so anything of the sort can only be inserted unofficially.

        1. Anonymous Coward
          Anonymous Coward

          Re: Bounds checking for C and C++

          I think the problem is that the C standard doesn't allow for bounds-checking, so anything of the sort can only be inserted unofficially.

          Well, bounds-checking is not mentioned in the Fortran standard either, but it has been an (optional) part of every Fortran compiler I ever used, with a possible exception of some rather archaic Fortran-IV dialects. It is present in gfortran as well - so at least the gcc backend has no problem dealing with it.

          1. Paul Shirley

            Re: Bounds checking for C and C++

            I don't believe Fortran defines pointers at the low level C does so there's more freedom to modify Fortran. Ç is little more than high level assembler and deliberately so, that's why C++ exists.

            1. energystar
              Megaphone

              That should hurt, really...

              "Ç is little more than high level assembler and deliberately..."
