Microsoft releases open source bug-bomb in the rambling house of C

The zombie bugs in programs and libraries at the heart of the Internet's infrastructure often have the C programming language in common. Microsoft Research now wants to add the kind of bounds-checking seen in C# to C, to help splat bugs like “buffer overruns, out-of-bounds memory accesses, and incorrect type casts,” in an add- …

  1. martinusher Silver badge

    C is not an applications programming language

    C is a systems programming language, a type of language that's not designed for writing end-user applications. You're supposed to use it to write operating system components and language environments, software that runs in predictable ways that allow for complete testing. Applications languages are designed for a different world, a world where the language implementers can't predict everything that users will do, so these languages incorporate a lot more checking, trading speed and a light footprint for reliability.

    Unfortunately all this got screwed up starting in the 80s in the rush to build and sell PC applications -- companies (with MSFT leading the charge) were in such a hurry to get stuff out the door that they cut corners on languages, promoting 'C' as an applications language. The fix for inherent problems tended to be patches on patches, so we got the whole C++ house of cards as a way to make C more reliable (it didn't). Now they just want to modify the run-time environment 'to make it more reliable'.

    I'd suggest that they don't bother. They already have applications-class languages, such as C#. They would be better off fixing the other popular applications languages (including doing something about Javascript -- that's a travesty, pretty much everything you shouldn't do in a language wrapped up in one package).

    (In case anyone thinks I'm a mainframe retread, no, not at all. I was an early adopter of PCs -- CP/M PCs -- which despite their limitations had a full software ecosystem available to them. Most of the time I write embedded code -- it's a different world from the one that apps people live in.)

    1. Charles 9

      Re: C is not an applications programming language

      I don't think it was that, per se. One thing people were clamoring for, especially in the 80s when things were a lot slower, was raw performance. Speed sold, and since C ran "close to the metal", it produced FAST code. That's the big problem with bounds-checking: it necessarily draws a performance penalty in a world where speed mattered. Even now programs are expected to do more, so speed still matters. Who cares about security if you can't make the deadline?

      As for all the other languages, your only solution is to ban them. But given how much relies on them (just like with Flash), getting them out of the ecosystem is going to be a slog, especially since they're in official specs AND there's little in the way of a substitute, especially for pages that need to be updated for current events quickly.

      1. MacroRodent
        Boffin

        Re: C is not an applications programming language

        That's the big problem with bounds-checking: it necessarily draws a performance penalty in a world where speed mattered.

        Yes, if done naïvely, but a good compiler can actually eliminate most of the overhead (for example, deduce that looping over an array needs to check the bounds only once). Of course, the early compilers for microcomputers were limited in this department.
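        A minimal C sketch of that hoisting (sum_checked and sum_hoisted are made-up names): instead of validating every access inside the loop, a single up-front check covers the whole range.

        ```c
        #include <assert.h>
        #include <stddef.h>

        /* Naive form: one bounds check per access. */
        long sum_checked(const int *a, size_t len, size_t from, size_t to) {
            long s = 0;
            for (size_t i = from; i < to; i++) {
                assert(i < len);              /* checked on every iteration */
                s += a[i];
            }
            return s;
        }

        /* Hoisted form: the compiler (or the programmer) proves the whole
           range is in bounds once, then runs an unchecked loop. */
        long sum_hoisted(const int *a, size_t len, size_t from, size_t to) {
            assert(from <= to && to <= len);  /* one check covers the loop */
            long s = 0;
            for (size_t i = from; i < to; i++)
                s += a[i];
            return s;
        }
        ```

        A good optimiser performs the second transformation automatically whenever it can prove the loop bounds, which is why the cost of checking is often far below one comparison per access.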

        1. Charles 9

          Re: C is not an applications programming language

          That only works for STATIC bounds-checking, but a lot of the overruns come from DYNAMIC buffers with bounds only known at runtime (if at all, if the buffer comes from elsewhere). Only a runtime bounds-checker can detect these, and these come with performance penalties: not desirable if you have a speed demand.

          1. MacroRodent

            Re: C is not an applications programming language

            That only works for STATIC bounds-checking, but a lot of the overruns come from DYNAMIC buffers with bounds only known at runtime

            This is language-dependent. If the language lets the compiler determine the size of a dynamic array (Java, for example), it can optimise bounds checking in those cases too. I agree this is hard to make work in C, and we might not even want to, if we just use C as a close-to-the-metal language and use something else for higher-level applications.

            About that Java, which always has array bounds checking enabled: last summer I spent some idle time seeing how well various current languages do on the classic Eratosthenes Sieve benchmark (which mainly loops through an integer array). The test was on CentOS 7 Linux, and the "contestants" included C++ (GCC 4.8.3), Java (1.7), Python (2.7.5) and JavaScript (Node.js 0.12.7). The clear winner? Java. C++ was close, of course. Of the two dynamic languages, JavaScript beat Python handily -- it was about 10 times as fast -- and achieved about half of the C++ or Java performance (which I find impressive).
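            For reference, the kernel of such a sieve benchmark, mostly tight loops over a flat array, can be sketched in C like this (sieve_count is a made-up name; the versions on the blog may differ in detail):

            ```c
            #include <stdlib.h>

            /* Count the primes below n with the classic Sieve of Eratosthenes:
               the work is dominated by sequential indexed stores into a byte
               array, exactly the access pattern bounds checking would guard. */
            int sieve_count(int n) {
                char *composite = calloc((size_t)n, 1);
                int count = 0;
                for (int i = 2; i < n; i++) {
                    if (composite[i])
                        continue;             /* i has no smaller factor: prime */
                    count++;
                    for (int j = i + i; j < n; j += i)
                        composite[j] = 1;     /* mark every multiple of i */
                }
                free(composite);
                return count;
            }
            ```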

            1. Anonymous Coward
              Anonymous Coward

              Re: C is not an applications programming language

              I'd be wary of drawing conclusions from implementing half a page of code in various languages and running it.

              I also find it hard to believe that you'll outperform C or C++ in an integer focused task, using a JVM language. I'd be very interested to replicate your results, if you provide some details on your methodology.

              1. MacroRodent

                Re: C is not an applications programming language

                I'd be wary of drawing conclusions from implementing half a page of code in various languages and running it.

                I fully agree one should not draw too many conclusions from microbenchmarks like this, but they help you get a feel for how various features behave in different languages or compilers.

                I also find it hard to believe that you'll outperform C or C++ in an integer focused task, using a JVM language. I'd be very interested to replicate your results, if you provide some details on your methodology.

                After thinking about it, I did not find it hard to understand. Java is a statically typed language, and modern JVMs do JIT compilation, where they can apply all the same optimisations as the C++ compiler (at least for algorithms like this that do not require run-time type information). So it comes down to which compiler has the better code generator. If you want to check for yourself, see macrorodent.blogspot.fi, where I copied the benchmarks. If you get interesting results, please post comments there.

                1. Anonymous Coward
                  Anonymous Coward

                  Re: C is not an applications programming language

                  If you want to check for yourself, see macrorodent.blogspot.fi, where I just copied the benchmarks. If you get interesting results, please post comments there.

                  Cheers, I'll check it out.

                2. Anonymous Coward
                  Anonymous Coward

                  Re: C is not an applications programming language

                  I'm not sure you can really measure it in this manner. In the code as posted, you have an integrated timer, which starts to explain some of your differences; for example, the Python slowdown relative to JavaScript is likely related to the explicit delay loop in the Python code.

                  Your other difference is explained by not using the stack in C++, so it's not exactly idiomatic C++ code. I would expect the rest of the difference comes from actually freeing the memory in C++, thereby incurring the overhead that Java avoids by just exiting and letting the OS reclaim the memory.

                  I would suggest using a high-precision external timer based on the monotonic clock, measuring with the same timer across all the candidates. Essentially you are looking at a few hundred thousand iterations in order to converge on something like a reasonable approximation.

                  Thank you for posting your methods, and opening the debate.
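                  A monotonic-clock harness of the kind suggested might start from a helper like this (elapsed_seconds is a made-up name; POSIX clock_gettime with CLOCK_MONOTONIC supplies the timespec readings):

                  ```c
                  #include <time.h>

                  /* Difference between two monotonic-clock readings, in seconds.
                     CLOCK_MONOTONIC never jumps backwards, unlike the wall clock. */
                  double elapsed_seconds(const struct timespec *start,
                                         const struct timespec *end) {
                      return (double)(end->tv_sec - start->tv_sec)
                           + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
                  }
                  ```

                  Each contestant would read the clock once before and once after the full batch of iterations, so the efficiency of each language's own timing library drops out of the comparison.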

                  1. MacroRodent
                    Boffin

                    @sed gawk Re: C is not an applications programming language

                    Thanks for your comments. Some replies: the delay loop at the start of some versions is meant to bring a low-resolution (one-second) clock function to the next tick, so the actual measured code starts just after a second has flipped over. This reduces jitter a bit, though I'm not sure how much it mattered. For example, the difference between Python 2.7 and JavaScript on Node.js was so large that any clocking method would have detected it. But I agree that using the time libraries of each language is one potential source of error in close cases, because they may be implemented more or less efficiently. This can be mitigated by doing a lot of computation between peeks at the clock, as the test programs in fact try to do.

                    About the dynamically allocated array in C++: I did it that way to keep the versions in different languages closer, and believe it should not have any effect. Firstly, the allocation and deallocation of the array occur outside the measurement loop, so that overhead is not included. Secondly, any C or C++ compiler worth its salt will keep the base address of the allocated array in a CPU register during a tight loop like this, so there is no difference between accessing it and a stack-allocated array (which would in fact also be accessed indirectly via a register).

          2. Whitter
            Boffin

            Re: C is not an applications programming language

            It also applies to dynamic arrays allocated by malloc: so long as the operation(s) in question can be seen not to resize a local array, only two boundary checks are required (e.g. a for loop over a const * const pointer).

        2. Zakhar

          Re: C is not an applications programming language

          No, it does not necessarily carry such a big penalty!

          I worked on an O.S., unfortunately long dead now (CTOS), where the equivalent of malloc() sat on an x86 segment fitted to the size of the memory you alloc'ed.

          Then if you overran, the processor triggered a segment fault and the O.S. simply caught it.

          The overhead is minimal (add a segment to the LDT) and you can trap any overrun from any language. Sure when you DO trap, there is a huge overhead... but you are debugging then!

          Unfortunately, "segments" is a concept quite specific to x86, whereas "pages" is much more common. So, for the sake of portability, modern O.S.es like Linux do not use segments at all (at least not as described above), because they would need complete rewrites on, say, ARM, which does not have "segments".

          But M$ being almost x86-only (seeing how that worked out for W$-RT), I believe they could have done this a long time ago. In the CTOS age, we were already trapping bugs that the early versions of W$ didn't catch... but that was understandable, since Windows was then just a window manager on top of the good old M$-DOS!.. I see that hasn't changed since the 80s: Bravo!..

          1. MacroRodent

            Re: C is not an applications programming language

            The overhead is minimal (add a segment to the LDT) and you can trap any overrun from any language. Sure when you DO trap, there is a huge overhead... but you are debugging then!

            Actually there is quite a bit of overhead with this method, because access to such far data requires generating a more complex code sequence than for data in the "default data segment". You need to load a segment register (a compiler can sometimes optimize this away, but usually not, and there are not many of these registers, only ES, FS and GS are free for general use). Loading the segment register is expensive in protected mode in the 386 architecture (it loads the descriptor data and checks protections), and the overhead has even got worse in succeeding generations of the Intel architecture, because it is seen by Intel as a legacy feature that almost nobody uses. It is kept around for compatibility, but they don't care about its performance.

            Yes, I too have worked with an embedded system that uses the Intel segmentation feature for fine-grained memory protection (still occasionally do), and I can assure you it is a bad idea!

          2. patrickstar

            Re: C is not an applications programming language

            The 9x line of Windows (which is unrelated to what's known as Windows today) had this. OS/2, which was at one point what MS planned to be the next big OS, had it as well.

            NT, which is what modern Windows is built on, wasn't even written for x86 originally, had portability as a major goal, and wasn't ported to x86 until relatively late in the development of the first version, so it had to go with the lowest common denominator.

            At some point it has existed for (at least) i860, Alpha, PPC, ARM and probably some archs I've forgotten. AFAIK none of these have segmentation comparable to x86.

            1. Richard Plinston

              Re: C is not an applications programming language

              > NT, which is what modern Windows is built on, wasn't even written for x86 originally,

              Exactly. It was developed initially on the i860 but then moved to MIPS. These were much more powerful (and expensive) than the contemporary 80486. The Pentium wasn't available until late in the development.

      2. Steve Channell
        Pint

        C is an applications programming language

        The whole of C/UNIX started from a requirement for a typesetting system; it's just that C, the application language, turned out so fast and efficient that they didn't need assembler for the OS.

        Fast forward ten years, and everyone noticed that C was better than Intel's PL/M.

        It is good to see MS returning to being an engineering outfit (they introduced far* for 8086 segments), with ptr<>, array_ptr<> and span<> formalising the CRT convention that the WORD prior to the malloc pointer contains the length of the allocated buffer.

        1. oldcoder

          Re: C is an applications programming language

          Nope.

          Goes back to the original allocator in the K&R C runtime. The location below the returned pointer contains a structure holding the size and the address of the next block.
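          The shape of that header, as described for the storage allocator in K&R (2nd edition, section 8.7), can be sketched as follows (field names and the alignment type are my approximation, not the exact K&R source):

          ```c
          #include <stddef.h>

          /* K&R-style allocator header: every block handed out by malloc is
             preceded by a header recording its size and, while on the free
             list, a pointer to the next free block. The union pads the
             header to a worst-case alignment boundary. */
          typedef union header {
              struct {
                  union header *next;   /* next block when on the free list */
                  size_t size;          /* size of this block, in header units */
              } s;
              long double align;        /* force maximal alignment */
          } Header;
          ```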

        2. nijam Silver badge

          Re: C is an applications programming language

          > The whole of C/UNIX started from a requirement for a typesetting system

          That's a complete misinterpretation of history, I'm afraid.

    2. Mark 85

      Re: C is not an applications programming language

      It may not be one, but back in the 80s, C and assembly were the way to go for performance programming like games. I did a bit of that and learned a lot. C++ had some advantages, but still... it could be tripped up by users (depending on the application).

      As a sidenote, I'm wondering if a lot of the Windows/IE issues stem from this. The OS, etc. has really become too large to re-code, but if they can fix it in the library and recompile... Yeah, distribution would be a problem.

      1. CheesyTheClown

        Re: C is not an applications programming language

        C and assembler were the way to go for everything back then. Assembler was actually used as an application language by many people. When a CPU could realistically process 75,000 instructions per second, we counted cycles even when we were drawing text on the screen. When a language like C, coded properly, reached performance levels where we needed less assembler, we mixed the two. It wasn't that better languages for apps didn't exist; it was that they were too slow to be useful.

      2. Ken Hagan Gold badge

        Re: C is not an applications programming language

        "As a sidenote, I'm wondering of a lot of the Windows/IE issues stem from this. "

        Unlikely, since Windows and IE are almost certainly written in C++, and whilst you /can/ push old-school C code through a C++ compiler (*), you don't have to, because bounds-checked and non-leaky alternatives exist.

        (* Bootnote: MSVC is a C++ compiler and, much to the annoyance of C fans, MS don't actually *do* a C compiler, so it is slightly odd that MS Research are issuing tools aimed at C code.)

        1. Mark 85

          Re: C is not an applications programming language

          Unlikely, since Windows and IE are almost certainly written in C++

          You're probably right, although there was a story/rumour going around that one version of Windows was written in Visual Basic, so I guess only the coders really know.

          1. oldcoder

            Re: C is not an applications programming language

            Windows is written in C, not C++.

            1. Bronek Kozicki

              Re: C is not an applications programming language

              Windows is written in C, not C++.

              Nope, it is written in both. Kernel is in C, huge majority of userspace code in C++. I signed NDA but this much I can reveal, and I do not think much has changed since the time I saw these sources.

          2. patrickstar

            Re: C is not an applications programming language

            The NT kernel, which powers all of the NT line of Windows (NT, 2000, XP, Vista, 7, 8, 8.1, 10, corresponding server versions, etc) is pure C with some custom extensions like exception handling. And a little bit of assembler for platform specific stuff.

            The rest is a mix of C and C++. Most of the latter is more like 'C with classes', but there's some use of things like templates and smart pointers as well, mostly for COM related stuff.

            I've read much of it, it's not bad (for the most part).

            Some of the bundled applications are C# nowadays, but it's not used for the core OS or libs.

        2. Richard Plinston

          Re: C is not an applications programming language

          > Unlikely, since Windows and IE are almost certainly written in C++

          """Cutler set three main goals for Windows NT. The first goal was portability: in contrast to previous operating systems, which were strongly tied to one architecture, Windows NT should be able to operate on multiple architectures.[60] To meet this goal, most of the operating systems, including the operating system core, had to be written in the C programming language.[61] """

          The graphics system was in C++.

          > MSVC is a C++ compiler and, much to the annoyance of C fans, MS don't actually *do* a C compiler

          """Visual C++ 2015 [MSVC 14] further improves the C99 support, with full support of the C99 Standard Library"""

        3. oldcoder

          Re: C is not an applications programming language

          Last I read, Microsoft C++ wasn't standard, even though MS claimed it was.

          It may be better, but I doubt very much that it is really standard.

    3. bombastic bob Silver badge

      Re: C is not an applications programming language

      I disagree with that subject line... C is a perfectly good application programming language. The thing is, coders need to self-enforce a few simple rules, and use methods that aren't inherently problematic.

      you know, like 'strcpy(buffer, string)' --- should be 'strncpy(buffer, string, maxlen)'

      Point is: learn to FREAKING CODE. Don't code like a script kiddie. Don't allow script kiddies to commit code that doesn't check buffer lengths. That kind of thing.

      And DO! NOT! RELY! ON! THE! COMPILER! TO! PROTECT! YOU!! Protect YOURSELF.

      then again, Micro-shaft designed C-pound and ".Not" for the INEXPERIENCED coder, so that senior people wouldn't be "senior" any more...

      1. Will 30

        Re: C is not an applications programming language

        You appear to misunderstand strncpy. It is not a 'safe' version of strcpy, it's something completely different.

        I'm not sure whether that helps or hinders your argument that people should learn to code, but it's certainly further evidence that C is a very easy language to make mistakes in.

      2. Anonymous Coward
        Anonymous Coward

        Re: C is not an applications programming language

        "strncpy(buffer, string, maxlen)"

        Isn't much safer: if the source is longer than maxlen, the resulting destination isn't null-terminated.

      3. Phil O'Sophical Silver badge

        Re: C is not an applications programming language

        you know, like 'strcpy(buffer, string)' --- should be 'strncpy(buffer, string, maxlen)'

        No, it should be 'strlcpy(buffer, string, maxlen)', which prevents overflow and guarantees null-termination. The "l" form of strcat is even more useful, since you don't need to mess around with strlen calculating how much room is left in the buffer. Errors there lead to so many off-by-one mistakes.

        If you must use strncpy, then at least use 'strncpy(buffer, string, maxlen-1)' to make room for the null.
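        strlcpy is a BSD extension rather than part of ISO C, so where it's unavailable a portable equivalent takes only a few lines (safe_strcpy is a made-up name, a sketch of strlcpy's documented contract):

        ```c
        #include <string.h>

        /* Copy src into dst, never writing more than size bytes and always
           NUL-terminating (when size > 0). Returns strlen(src), so a result
           >= size tells the caller the copy was truncated. */
        size_t safe_strcpy(char *dst, const char *src, size_t size) {
            size_t srclen = strlen(src);
            if (size != 0) {
                size_t n = srclen < size - 1 ? srclen : size - 1;
                memcpy(dst, src, n);
                dst[n] = '\0';
            }
            return srclen;
        }
        ```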

        1. MacroRodent

          Re: C is not an applications programming language

          If you must use strncpy, then at least use 'strncpy(buffer, string, maxlen-1)' to make room for the null.

          Reasons for that include having to take old C libraries into account. The strl* functions are newfangled inventions. I recall reading somewhere that the dangerous behaviour of strncpy when the target size is exceeded comes from its usage in the original Unix file system, where file name components were limited to 14 characters. They were stored in fixed-size directory entries with 14 bytes reserved for the name, and only names shorter than 14 were nul-terminated. So strncpy with size 14 writing to the file name field did the right thing...
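          That fixed-field contract is easy to demonstrate (set_name_field is a made-up wrapper): strncpy NUL-pads a short name, and a name of exactly 14 bytes gets no terminator, just as those directory entries required.

          ```c
          #include <string.h>

          /* Fill a fixed 14-byte directory-name field the way the original
             Unix file system did: short names are padded with NULs, and a
             name that exactly fills the field gets no terminator at all. */
          void set_name_field(char field[14], const char *name) {
              strncpy(field, name, 14);
          }
          ```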

        2. dajames

          Re: C is not an applications programming language

          If you must use strncpy, then at least use 'strncpy(buffer, string, maxlen-1)' to make room for the null.

          That doesn't really help. The problem is not that the string is too long for the buffer, but that the buffer is too short for the string. Throwing away some of the data to make the rest fit in the program is NOT the right answer.

          1. Anonymous Coward
            Anonymous Coward

            Re: C is not an applications programming language

            It's a shame that null-terminated strings became the standard in C and most low-level APIs. Their slight space/speed advantage goes out the window when you do length checks.

            C was a decent language that could've used an overhaul in the 1990s to address a few issues like this. Instead, we got the Frankenstein monster C++.
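            A length-prefixed alternative can be sketched in a few lines of C (LString and lstring_new are hypothetical names, not any standard API): the length travels with the data, so length queries are O(1) and bounds information is always at hand.

            ```c
            #include <stdlib.h>
            #include <string.h>

            /* A length-prefixed string: the byte count is stored up front
               and the payload is NOT NUL-terminated. */
            typedef struct {
                size_t len;
                char data[];          /* C99 flexible array member */
            } LString;

            LString *lstring_new(const char *src) {
                size_t n = strlen(src);
                LString *s = malloc(sizeof *s + n);
                if (s != NULL) {
                    s->len = n;
                    memcpy(s->data, src, n);
                }
                return s;
            }
            ```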

      4. Uplink

        Re: C is not an applications programming language

        <quote> Point is: learn to FREAKING CODE. Don't code like a script kiddie. Don't allow script kiddies to commit code that don't check buffer lengths. that kind of thing.</quote>

        And what is one supposed to do before becoming master of the code universe? Most people aren't born "senior coder", and to most of those people coding is just a job -- a thing that gives them money; a thing they're looking to get away from every day, and wouldn't look forward to returning to the next day if it were not for the money.

        Luckily for me, I found out about strcpy vs strncpy while still in school, but that's not mentioned in any classes. You learn strcpy and then move on to the next lesson; strncpy is not mentioned.

        Applications also have this property: "we need it yesterday!". Even the most seasoned programmer can easily introduce an off-by-one error. I recently had a go on HackerRank at some C problems, and while my algorithm was sane, I made a typo: I sized an array using the wrong variable, so it ended up shorter than intended. That meant that, with all the memory smashing, my code passed 10 out of 12 tests. The two that failed segfaulted. It took me forever before I saw the error and facepalmed.

      5. energystar
        IT Angle

        Bombaa!

        "then again, Micro-shaft designed C-pound and ".Not" for the INEXPERIENCED coder, so that senior people wouldn't be "senior" any more..."

        By now EVEN MS knows that production code can't go out without thorough senior supervision.

      6. Anonymous Coward
        Anonymous Coward

        Re: C is not an applications programming language

        "Micro-shaft designed C-pound"

        They wrote another version of C?! Is C£ any good ?

        1. Richard Plinston

          Re: C is not an applications programming language

          > Is C£ any good ?

          The octothorpe is an approximation of several distinct graphics. In this case he was using it in one of the common approximations, that of 'lb' the symbol for pound weight.

          Personally I refer to C# as 'making a hash of C'.

          You may also note that the US keyboard has 'hash' above the 3 while you have 'UKP'.

    4. Paul Shirley

      Re: C is not an applications programming language

      I'll let you in on a secret: system components, OSes and whatever you think 'language environments' means also need bounds checks. They deal with unpredictable client requests like any app, even if written by super coders magically able to make their own code 100% deterministic and somehow not needing checks. Even god-like coders make mistakes anyway.

      Thinking like that keeps security researchers employed.

      1. energystar
        Boffin

        "Thinking like that keeps security researchers employed."

        ...and CPUs keeping vigil over other CPUs' steps. Was this in the original script?

      2. oldcoder

        Re: C is not an applications programming language

        Guess what? Even in system components it cannot be checked by the compiler... Since each may be compiled separately, there is NO sharing -- only the system-call parameter checking. And that cannot be done by a compiler.

    5. energystar
      Linux

      Re: C is not an applications programming language

      Agree. A tool like this is still needed. Tool-making tools still need 'tooling' at [critical] times.

      Barking at the licence, but a great contribution to the whole computing world. Expecting a little goodwill from FOSS toward standardisation.

    6. Aodhhan

      Re: C is not an applications programming language

      Spoken like a computer end user who is only aware of the "programming for grandparents" languages like Visual Basic.

      C, or rather C++, is still widely used as the basis for many applications, especially those requiring speed and heavy calculation. Many of the applications used to build the console and online games (which you apparently spend too much time on) are written in C++.

      Applications used to conduct bank transactions are written in C++ and others use FORTRAN... yeah I know you don't know what this is.

      Just because you see a front-end GUI in Windows doesn't mean it's mostly written in C#.

      So.. shut off your gaming console, burn your nasty collection of 4 year old t-shirts, and leave your mother's basement. You just might learn a bit more about programming languages. At least you might do a bit more research before posting.

      1. Richard Plinston

        Re: C is not an applications programming language

        > Applications used to conduct bank transactions are written in COBOL

        FTFY

  2. Anonymous Coward
    Anonymous Coward

    Bounds checking for C and C++

    Bounds checking for C and C++ Nov 2004

    1. Anonymous Coward
      Anonymous Coward

      Re: Bounds checking for C and C++

      I always wondered why, when there have been multiple GCC patchsets adding bounds checking to gcc, none of them ever went anywhere. Sure, it hurts performance, and sure, it can't be a perfect solution without creating new types with bounds information built in. But it would be a big improvement over unchecked C, and there's tons of code where either performance doesn't matter, because it mostly waits on other stuff like the end user, network or storage, or where you are willing to sacrifice performance in sections of code where security really matters (e.g. network-facing code or stuff that runs as "root").

      Maybe the GCC folks don't want the patches unless they are perfect, but if so they'll wait forever.

      1. Anonymous Coward
        Anonymous Coward

        Re: Bounds checking for C and C++

        I think the problem is that the C standard doesn't allow for bounds-checking, so anything of the sort can only be inserted unofficially.

        1. Anonymous Coward
          Anonymous Coward

          Re: Bounds checking for C and C++

          I think the problem is that the C standard doesn't allow for bounds-checking, so anything of the sort can only be inserted unofficially.

          Well, bounds-checking is not mentioned in the Fortran standard either, but it has been an (optional) part of every Fortran compiler I ever used, with a possible exception of some rather archaic Fortran-IV dialects. It is present in gfortran as well - so at least the gcc backend has no problem dealing with it.

          1. Paul Shirley

            Re: Bounds checking for C and C++

            I don't believe Fortran defines pointers at the low level C does, so there's more freedom to modify Fortran. C is little more than a high-level assembler, and deliberately so; that's why C++ exists.

            1. energystar
              Megaphone

              That should hurt, really...

              "C is little more than high level assembler and deliberately..."

          2. oldcoder

            Re: Bounds checking for C and C++

            Even Fortran can't bounds-check an array passed to a subroutine.

            Even passing the size (though helpful) isn't accurate if the number passed is wrong -- easily going out of bounds...

            1. Richard Plinston

              Re: Bounds checking for C and C++

              > Even fortran can't bounds check an array passed to a subroutine.

              While 'FORTRAN' language may not be able to check, particular compilers can generate code that does check, even on arrays passed to subroutines. It is only necessary for the compiler to generate code to pass an array descriptor that includes the bounds of each subscript, such as was done by compilers that I used in the past.

              1. Anonymous Coward
                Anonymous Coward

                Re: Bounds checking for C and C++

                But only if it KNOWS the bounds. Externally-passed or dynamically-generated data can have fungible or simply unknown bounds. Plus, as noted, what if something manages to get into the bounds information and messes with them? Bounds data is still data.

                1. Richard Plinston

                  Re: Bounds checking for C and C++

                  > Externally-passed or dynamically-generated data can have fungible or simply unknown bounds.

                  The bounds won't be 'unknown' in that the malloc() (or equivalent) must ask for a specific size. There may well be implementations of particular high-level languages that don't bother to pass size information, but there are implementations, such as ones that I used, that pass an array descriptor that has bounds for each subscript, _even_ if it was created dynamically or passed externally.

                  With multi-dimensional arrays (such as are often used in Fortran) the code must have the descriptor block in order to convert the several subscripts into a specific memory address.

                  > Plus, as noted, what if something manages to get into the bounds information and messes with them?

                  If the only way to access the array is via the descriptor then the bound checks, or other, can prevent access to the descriptor block.

                  Not all languages are like C.

                  1. Charles 9

                    Re: Bounds checking for C and C++

                    I'm saying what if the malware finds a different way into the bounds data to alter it out of band? That's the thing: for the most part, data is data, and you can perhaps perform something like a Confused Deputy (aka "Barney Fife") attack to mangle the bounds data with another routine. Or mangle the descriptor in transit between programs and/or libraries.

                    PS. Not all languages are like C, but in the end, CPUs run on machine code, and most CPUs, for reasons of speed, don't tag their memory very clearly.

                    1. Richard Plinston

                      Re: Bounds checking for C and C++

                      > I'm saying what if the malware finds a different way into the bounds data to alter it out of band? That's the thing: for the most part, data is data, and you can perhaps perform something like a Confused Deputy (aka "Barney Fife") attack to mangle the bounds data with another routine. Or mangle the descriptor in transit between programs and/or libraries.

                      Or what if the memory doesn't store the bits correctly, or the CPU executes the instruction badly !!!

                      You are just making up spurious situations which have nothing to do with whether the language implementation can do bound checking.

                      1. Charles 9

                        Re: Bounds checking for C and C++

                        "Or what if the memory doesn't store the bits correctly, or the CPU executes the instruction badly !!!"

                        Guess what? Those are real-life concerns. It's one reason why you can't make the processor pathways much smaller (because of quantum tunneling, electrons could "jump the tracks"). As I recall, high-uptime systems have redundancies for that reason.

                        In any event, if Pascal and Fortran really could build more efficient code than C, then they would be the languages of choice for highly-constrained applications like embedded systems, and last I checked, they either used C or (like for aircraft systems) specialized languages for the specific field. Fortran and Pascal may have been better in the past (because they were more restricted), but the real world intrudes.

                        1. Richard Plinston

                          Re: Bounds checking for C and C++

                          > they would be the languages of choice for highly-constrained applications

                          FORTRAN _is_ the language of choice for certain applications. Not for embedded though, mainly because it doesn't cope with low-level stuff well.

                          I already have indicated areas where other languages may be able to outperform C. For example multidimensional array numerical processing can be faster in FORTRAN because these are direct language features and the compiler can implement the code inline, while C would use a library (such as matrix) which would introduce call overheads, plus it may implement this as arrays of pointers to arrays of pointers to ... which could lead to running out of registers.

                          If you rewrite FORTRAN programs in C they may run slower. If you rewrite C programs in FORTRAN they probably will run slower. Use appropriate languages for the problem domain.

                        2. Richard Plinston

                          Re: Bounds checking for C and C++

                          > Guess what? Those are real-life concerns.

                          But are entirely spurious in the discussion about whether a language can or will do bound checking.

        2. Dan 55 Silver badge

          Re: Bounds checking for C and C++

          I think the problem is that the C standard doesn't allow for bounds-checking, so anything of the sort can only be inserted unofficially.

          That's why we have compiler switches, isn't it?

          GCC's C is hardly standard either.

      2. martinusher Silver badge

        Re: Bounds checking for C and C++

        One thing a lot of applications programmers get confused about is the difference between a compiler like gcc and its run-time libraries. It's easy to do this because many applications languages don't distinguish between the two, but with something relatively low level like 'C' the language compiler and the run-time support are two very different things. You can actually write 'C' code that doesn't use standard libraries, or indeed any libraries at all. If you don't like a particular language library feature then you just write something more to your taste.

        For something like array bounds checking you need to make the data 'active' -- self-aware. You can write 'C' to do this, but defining a C++ class is a much neater way to write the code.
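        A minimal sketch of that 'active' data idea in plain 'C' (names hypothetical) - the bounds live next to the data, so every access can be checked:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 'self-aware' buffer: the length lives next to the data,
 * so every access can be checked. A C++ class wraps the same idea more
 * neatly behind operator[]. */
typedef struct {
    size_t len;
    unsigned char data[];   /* C99 flexible array member */
} checked_buf;

checked_buf *cb_new(size_t len) {
    checked_buf *b = malloc(sizeof *b + len);
    if (b) { b->len = len; memset(b->data, 0, len); }
    return b;
}

/* All reads and writes go through accessors that enforce the bound. */
unsigned char cb_get(const checked_buf *b, size_t i) {
    assert(i < b->len);
    return b->data[i];
}

void cb_set(checked_buf *b, size_t i, unsigned char v) {
    assert(i < b->len);
    b->data[i] = v;
}
```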

        (While we're thinking bounds checking and the like, Microsoft's various string handling mechanisms could be regarded as Exhibit 'A' as to why I don't like C++... it's not the language, it's the way it gets used. Imagine what happens when you use this mindset to write embedded code... nasty...)

    2. Robert E A Harvey

      Re: Bounds checking for C and C++

      Yes I was going to say "surely been done before"

    3. energystar
      Coffee/keyboard

      Re: Bounds checking for C and C++ Oops!

      Didn't Know... Thanks, Walter.

  3. bazza Silver badge

    ASN.1 and PADS

    For network services written in C there's always been the risk of a buffer overrun and other problems when reading protocol data from a network connection. It's entirely avoidable, but avoiding it requires people to thoroughly review their work, etc. For various reasons that is never as exhaustive as it should be.

    However, there are things out there that help if used. I'm thinking of ASN.1, a serialisation standard that does bounds and value checking for you, in ordinary C (or whatever language you choose, actually).

    It's brilliant, with one caveat - the tools have to be implemented properly...

    Using that to define and implement a public interface is far better than how most network service protocols are specified. ASN.1 as a protocol specification and implementation tool makes the RFCs behind a lot of the Internet's protocols look like the work of school children.

    It's madness to define a binary protocol in English (which is what most RFCs do) and expect implementers to read it and get it right in their coding. It takes a long time and is inevitably error prone.

    But it's too late, the protocols exist and cannot be changed.

    PADS

    But there is one option. It's the PADS project, at http://www.padsproj.org. This allows you to specify any arbitrary binary data stream in a schema (just like ASN.1 does) and automatically generate C code to interpret and generate it (again, just like ASN.1).

    Automatic generation = less review effort required. That's the benefit.

    Thus you could specify, for example, the SSH protocol as a PADS schema and then automatically generate the protocol driver for SSH.

    PADS would probably need a bunch of work done on it to make this complete (like adding streaming support and bounds and value checking if it hasn't already got it). But if done we could then ditch a lot of RFCs and replace them with formal schemas instead.
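    As an illustration of what such generated code buys you, here is the kind of bounds check a schema-driven decoder emits for a hypothetical length-prefixed field (the wire format and names are made up for the sketch):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: a 2-byte big-endian length followed by that
 * many payload bytes. The checks below are the kind of thing a
 * schema-driven generator (ASN.1, PADS, ...) emits for you, and the kind
 * hand-written parsers routinely forget. */
int parse_field(const uint8_t *buf, size_t buflen,
                const uint8_t **payload, size_t *paylen) {
    if (buflen < 2)                       /* the header must fit */
        return -1;
    size_t n = ((size_t)buf[0] << 8) | buf[1];
    if (n > buflen - 2)                   /* the declared length must fit too */
        return -1;
    *payload = buf + 2;
    *paylen = n;
    return 0;
}
```

    A buffer claiming more payload than it actually carries is rejected up front, which is exactly the class of mistake behind many hand-coded protocol overruns.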

    I'm not connected to the project, it just looks neat and would allow for protocol formalities to be automatic, not hand implemented

    1. Charles 9

      Re: ASN.1 and PADS

      What about the necessary drawback of speed, especially when you get to higher network speeds with less time to get things done?

      1. bazza Silver badge

        Re: ASN.1 and PADS

        I don't really see a big difference between a bunch of automatically generated code shovelling bits and bytes around and a bunch of hand written code doing the same thing as prescribed by some RFCs.

        I would contend that a "hand written" protocol may be fast but only because of it not doing value and bounds checking.

        Having used a good commercial ASN.1 implementation with good support and looked at its generated parser code it can be pretty lean, especially considering that it also deals with all the problems of streaming connections too.

        The operative word there is 'commercial'. People generally don't want to pay for good tools, and when they use an incomplete or buggy toolset (perhaps an open source one) ASN.1 in general gets a bad name.

        I don't know about PADS - I keep meaning to try it - but it's probably worth a go.

        I don't sell ASN.1 tools either!

        Google Protocol Buffer

        To be honest, if Google added streaming support and bounds / value checking to protocol buffers, that'd be a good candidate for wiping out ASN.1 completely. Every time Google update GPB it looks more and more like ASN.1...

    2. Ken Hagan Gold badge

      Re: ASN.1 and PADS

      The experience with parser generators in the 60s/70s was that languages that were originally designed in the "hand-crafted era" were a real bitch to write a grammar for and the real power and convenience of these tools was only seen with languages where the convenience of the grammar was influential in the language design. I imagine you'd see something similar with PADS, so you'll find that most of your existing protocols are a nightmare to specify.

      But interesting, nonetheless. In the long run, these more declarative approaches to programming are usually far less buggy, far easier to write in the first place, and amenable to formal analysis in the long run. (I wonder how many of the security holes found in SSH over the years could actually have been found by an automated tool if you could have described the protocol to it.)

      1. bazza Silver badge

        Re: ASN.1 and PADS

        (I wonder how many of the security holes found in SSH over the years could actually have been found by an automated tool if you could have described the protocol to it.)

        Well I guess in the perfect world, all holes would be found by such a tool :)

        One barrier to improving the current situation is that there's loads of competing standards (gpb, thrift, asn1, xml, etc etc etc etc), all with their own qualities and purpose. None of them fit all needs.

        It would be kinda nice if someone did one properly! In principle I like ASN.1 because at least it does do bounds and value checking, which is more than most of the others.

    3. energystar
      Windows

      "thoroughly review their work, etc. "

      "...For various reasons that is never as exhaustive as it should be."

      Well, SURELY most of those are not IT reasons... Is it right to just shut up and 'let it be'? Should we be surprised at people looking at us when problems arise?

    4. oldcoder

      Re: ASN.1 and PADS

      Assuming the code generated is valid...

      That is part of the problem with ASN.1. Not all the code being automatically generated is quite valid, and without checking bounds against the system determined packet size, you STILL get out of bounds references.

      And checking against the system determined size is NOT common, as not all systems can provide that information...

    5. patrickstar

      Re: ASN.1 and PADS

      ASN.1 implementations have been a major source of vulnerabilities in just about anything that has had one written in a memory-unsafe language.

      So, no.

  4. Ken Moorhouse Silver badge

    The performance game...

    The performance game was arguably a big thing for MS in the early days - knocking perfectly sound applications out of the running by cutting corners. Now they want to preach to us how bad buffer overflows are?

  5. Anonymous Coward
    Anonymous Coward

    Never understood why data bounds checking has not been implemented in hardware. ICL's VME O/S architecture had Data Descriptors which apparently policed the access to the relevant memory for the S3 language.

    https://www.fujitsu.com/uk/Images/the-architecture-of-open-vme.pdf

    1. Charles 9

      The same reason C doesn't do it in software: there's a price to pay, and particularly in hardware, speed trumps security. What good is a secure job if it doesn't make the deadline?

      1. John Sanders
        Holmes

        You can have both

        Security and speed: C more than anything else makes both possible - if you're good, that is.

        I'm not claiming I'm good, but I know people who are and they produce incredible code, robust and fast. Believe it or not, there are several projects out there with incredibly good track records on security, and they all use C.

        The issue is that while you can learn C in 21 days, you need 10 years to be a mediocre developer.

        And this doesn't help an industry that loves to sell the next best fad again and again, as quickly as possible.

      2. energystar
        Linux

        if it doesn't make the deadline?

        A dead line is what doctors see on a screen if critical software fails...

        1. oldcoder

          Re: if it doesn't make the deadline?

          Or in the case of MS software --- when a virus scan occurs and breaks the application mid surgery. (http://news.softpedia.com/news/medical-equipment-crashes-during-heart-procedure-because-of-antivirus-scan-503642.shtml)

    2. oldcoder

      There are too many ways to overflow a buffer...

      Data descriptors are just a pointer to a complex pointer... and are slow.

  6. Anonymous Coward
    Trollface

    This being Micro$hit...

    How long till they find the first bug in the bug checker??

    1. Bronek Kozicki
      Pint

      Re: This being Micro$hit...

      This is open source under the MIT license - which makes that not very relevant. Yes, someone will find bugs; you have a guarantee of this. Someone will fix those bugs. Perhaps someone will fork the whole project in order to fix the bugs differently, or for any other reason. Not relevant, because you can take the existence of bugs for granted. What's relevant is that there is an effort towards a standardized (i.e. portable) way to beat bugs in other C programs. I would drink to that.

  7. dajames

    ... pointer errors provide lots of dangerous vectors.

    I see what you did, there!

  8. Marco van de Voort

    So...

    basically they took 30 years to redo pascal but then with curly braces?

  9. Anonymous Coward
    Anonymous Coward

    For everyone who sees an article about C and takes it as a chance to take a dig at C: please do yourself a favour and google valgrind.

  10. Joerg

    "In early 2002, Bill Gates' famous "battleship-turning" memo made cybersecurity a top goal for Microsoft. About a year later, Microsoft proposed a new "bounds-checking" library to WG14, which eventually became Technical Report 24731-1. It now is part of C11 as the (optional) Annex K."

  11. Anonymous Coward
    Anonymous Coward

    Really?

    We're supposed to trust the folks who gave us Outlook, IE, and IIS to write a code checking tool for C? Why does my mind boggle at the thought of that?

    1. energystar
      Angel

      "the folks who gave us Outlook, IE,.."

      and ISS.

    2. Sandtitz Silver badge
      Mushroom

      Re: Really?

      All you Anonymous Cowards are free to find bugs and data tracking code since this MS contribution is open source.

      1. Anonymous Coward
        Anonymous Coward

        Re: Really?

        Well, considering that for YEARS it seemed that Microsoft's method of finding buffer overflows was:

        1. Write code

        2. Release code

        3. Wait for buffer overflow reports to show up in Mitre, Bugtraq, etc

        That doesn't instill a lot of faith that they actually know how to do what this new tool is aimed at doing.

  12. Rich 2 Silver badge

    Oh no

    Oh great. MS fixing something that doesn't need fixing.

    Personally I have never seen the point of adding buffer overrun errors into C. They cause nothing but grief. I stopped doing it 30 years ago.

    The only excuse for it is incompetence. In which case, well, what do you expect?

    1. energystar
      Headmaster

      "what do you expect?"

      For the bad: a lot of junior programmers are still assigned C production code. Shield them a little, at least.

  13. energystar
    Boffin

    Let it flow, finally [everybody knows...]

    Not So Far Future is non x86 related. We're just trying to rescue programming Legacy, not the code.

    1. Anonymous Coward
      Anonymous Coward

      Re: Let it flow, finally [everybody knows...]

      Yeah. Right. Call me when an ARM system can do Crysis.

      1. energystar
        Angel

        can do Crysis.

        I'll call you, then. But it will be irrelevant, as better archs will be around.

  14. Mike 16

    C and Bounds checking.

    I was waiting for someone to challenge the assertion that "C doesn't allow bounds checking". It appears that I must be the one. The C language spec doesn't disallow bounds checking. The language around pointers (at least up to C89; nobody has paid me to care about later versions) is pretty precise about what pointers are, and what operations are permitted on them. The distinction between data pointers and function pointers even lured IBM into using a quite nice implementation of function pointers in the original PowerPC ABI.
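    To illustrate how precise the spec actually is, here is a sketch of the pointer arithmetic C89 does define - within an array object, plus the one-past-the-end position:

```c
#include <stddef.h>

/* C89 is precise here: pointer arithmetic is defined only within an
 * array object, plus the 'one past the end' position, which may be
 * formed and compared but never dereferenced. A checking implementation
 * can enforce exactly these rules. */
int sum(const int *a, size_t n) {
    const int *p;
    const int *end = a + n;      /* one past the end: legal to form */
    int total = 0;
    for (p = a; p != end; ++p)   /* legal comparison; 'end' itself is never dereferenced */
        total += *p;
    return total;
    /* forming a + n + 1, or evaluating *end, would be undefined behaviour */
}
```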

    Then reality reared its ugly head. There is a vast swamp of legacy code extant that practices "unwarranted chumminess with the compiler", written by people who "know" that a pointer is always and everywhere _exactly_ what their first C compiler used in its implementation. Most likely an index into a sea of octets, or a pair of 16-bit words that smell a lot like an x86 (x<2) (segment,offset) pair.

    When we still have code that does AND, OR, and shifts on an "integer" that is then cast to a "pointer", or compilers that "believe" that the result of a pointer cast is a modifiable lvalue (looking at _you_, gcc), or programmers that depend on function pointers that are not only the same size as a data pointer but have exactly the same bit layout, everywhere, the C standard is clearly not being treated as such, but "more like guidelines, really". Compiler vendors that tried to sell compilers that met the standard and allowed safer coding lost out big time in the market.

    I'm not saying that C is a "safe language" for Joe Sixpack to be writing nuclear power-plant control code in. But to say the standard forbids making it a bit safer is "reality challenged".

    1. energystar
      Paris Hilton

      "compilers that met the standard and allowed safer coding lost out big time in the market"

      "Compiler vendors that tried to sell compilers that met the standard and allowed safer coding lost out big time in the market."

      Madness to no avail. In retrospect, what was that race for the last drop of performance at the cost of everything else? Is some sort of 'gamer' psychology involved? Were consumer 'wet dreams' and insane dismissals the ultimate goals?

      "But to say the standard forbids making it a bit safer is 'reality challenged'". Does anyone remain to be asked about this?

      1. energystar
        Childcatcher

        Is some sort of 'GAMBLER' psychology involved?

        After all they're THE ONES, -the winners, the survivors- making the long due addendum.

        After that war, it is GPU coding where the battle line is now, at least for the 'last drop' of performance, gaming. So, can we proceed?

  15. Anonymous Coward
    Anonymous Coward

    Haven't we been here before?

    Flame me now, I'm not a C guru. But how is what Microsoft are announcing today different from all the other attempts to make C safer? StackGuard, ProPolice, W^X, etc etc. Not meaning to bash MSFT, just not clear what is really "new" here.

    1. Vic

      Re: Haven't we been here before?

      just not clear what is really "new" here

      It's got a new hat </smithers>

      Vic.

  16. david 12 Silver badge

    > handling pointers directly makes for efficient, "close to the hardware" programming

    Why do people write garbage like that? Why do people repeat garbage like that? Isn't anything useful taught in Comp Sci?

    C was an efficient, "close to the hardware" programming language when compared to Scheme, Lisp.

    It was an inefficient, slow, bloated language compared to languages designed for efficiency like FORTRAN and Pascal.

    This isn't "despite" the lack of language/compiler support for managing pointers: it's BECAUSE C lacks language/compiler support for handling pointers.

    There have, in fact, been significant changes in pointer handling between "ANSI C" and C11, specifically intended to make it possible to write code as fast and efficient as Pascal or Fortran code, by (incompletely) inferring the pointer target from the pointer type: something fast and efficient languages were able to do correctly and completely by design.
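    One of those changes (strictly a C99 one, carried into C11) is the 'restrict' qualifier, which hands the compiler the no-aliasing guarantee that Fortran dummy arguments carry by definition; a sketch:

```c
#include <stddef.h>

/* 'restrict' promises the compiler that x and y do not overlap, the
 * guarantee Fortran arguments carry by definition, so it is free to
 * vectorise and reorder loads/stores as a Fortran compiler would. */
void axpy(size_t n, double a,
          const double *restrict x, double *restrict y) {
    size_t i;
    for (i = 0; i < n; ++i)
        y[i] += a * x[i];   /* no aliasing: each y[i] depends only on x[i] */
}
```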

    PS: Dynamic bounds checking is something you do only where your language can't handle static bounds checking. Yes, dynamic bounds checking is inefficient: that is a reason why languages that support static bounds checking are faster and more efficient for the same level of safety.

    1. Anonymous Coward
      Anonymous Coward

      Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

      "PS: Dynamic bounds checking is something you do only where your language can't handle static bounds checking. Yes, dynamic bounds checking is inefficient: that is a reason why languages that support static bounds checking are faster and more efficient for the same level of safety."

      Then how does static bounds checking deal with runtime-created data whose bounds aren't known at compile time and may not even be known at runtime?

      1. david 12 Silver badge

        Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

        Yes, standard C is not actually Turing complete: the behaviour is undefined when it runs out of stack space.

        But dynamic allocation is not the same as runtime-created data. And most C programs have lots of dynamic allocation which is statically determined. C makes it difficult to check: other languages have it built into the basic language design, and are faster and more efficient for the same level of safety.

        1. Charles 9

          Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

          But what about dynamic allocation of data that is dynamically determined (example: an operation on a raw stream, like how gzip works). Now you have no idea how much data you're going to get, and the other end probably doesn't know either. Better-structured languages are more efficient, yes, when the data is more-structured, but then the real world intrudes and you have to handle data that may have no rhyme, reason, or even end.

    2. Charles 9

      Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

      "It was an inefficient, slow, bloated, language compared to languages designed for efficiency like FORTRAN and Pascal."

      HOW can a language be more efficient than one that's close to the metal like C? Close to the metal means more like assembler, which is more like machine code, and raw machine code is about as efficient as you can get as you're talking the CPU's language, NOT yours.

      1. Richard Plinston

        Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

        > HOW can a language be more efficient that one that's close to the metal like C. Close to the metal means more like Assembler which is more like machine code,

        A language implementation can have specific features which are faster than the equivalent provided by a C implementation. For example, C's string handling can be inefficient due to the need to search along the string until the null terminator is found while Pascal implementations often use a length count at the start of the char array.
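        A sketch of that difference (the counted-string struct here is hypothetical, Pascal-style):

```c
#include <stddef.h>
#include <string.h>

/* A minimal Pascal-style counted string: the length is stored alongside
 * the characters, so "how long is it?" is O(1), where C's strlen must
 * scan the whole string looking for the NUL terminator. */
typedef struct {
    size_t len;
    const char *chars;   /* need not be NUL-terminated */
} pstring;

size_t pstring_len(const pstring *s) {
    return s->len;          /* O(1): just read the stored count */
}

size_t cstring_len(const char *s) {
    return strlen(s);       /* O(n): scan until the terminator is found */
}
```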

        FORTRAN uses array descriptor blocks to cater for multidimensional arrays and the compilers and libraries handling these have been written in assembler. C multidimensional arrays could be done using arrays of pointers to arrays of pointers or by writing routines to calculate the array access but doing either in C will be less efficient than those in the best FORTRAN.

        I rewrote a C program in Python and had it running about 10 times faster. It was almost entirely string handling - it was processing a postscript template file substituting data into embedded tags to produce invoices and statements (and others) that were then converted to PDFs.

        1. Charles 9

          Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

          All well and good when your data is well-structured. But what happens when you have to deal with UNstructured data, like a live stream? This is an example of the kind of stuff where you can't know ahead of time how much data you're gonna get, because often the other side doesn't know, either (usually because it's being generated on the fly, a la stream compression/encryption).

          1. Ken Moorhouse Silver badge

            Re: what happens when you have to deal with UNstructured data

            A guess is taken (maybe configurable by the user) as to what starting resources to allocate to the process. If the process exceeds these resources, either an error is flagged and the process halted or an attempt is made to allocate more resources. If more resources are not successfully acquired then an error is flagged and the process halted.
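            That guess-then-grow scheme sketched in C (names hypothetical): start from an initial guess, and when the data outruns it, try to acquire more; if that fails, flag the error instead of writing out of bounds:

```c
#include <stdlib.h>
#include <string.h>

/* Growable buffer for data of unknown total size: allocate an initial
 * guess, double the capacity when the incoming stream outruns it, and
 * report failure (rather than overflow) if more memory can't be had. */
typedef struct { unsigned char *data; size_t len, cap; } growbuf;

int gb_append(growbuf *b, const void *src, size_t n) {
    if (b->len + n > b->cap) {
        size_t newcap = b->cap ? b->cap : 64;   /* the initial guess */
        while (newcap < b->len + n)
            newcap *= 2;                        /* attempt to allocate more */
        unsigned char *p = realloc(b->data, newcap);
        if (!p)
            return -1;      /* more resources not acquired: flag error, halt */
        b->data = p;
        b->cap = newcap;
    }
    memcpy(b->data + b->len, src, n);
    b->len += n;
    return 0;
}
```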

            1. Charles 9

              Re: what happens when you have to deal with UNstructured data

              "A guess is taken (maybe configurable by the user) as to what starting resources to allocate to the process."

              The trouble with guesses (especially wild ones, which can be the case here as something like a raw stream is pretty much a shot in the dark) is that they tend to miss more often than hit. And I wouldn't want to be the one fielding the calls for when the process keeps aborting half the time and it hogs the memory the other half.

          2. Richard Plinston

            Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

            > But what happens when you have to deal with UNstructured data, like a live stream?

            Then you choose a language and/or system that is appropriate for that particular problem domain.

            1. Charles 9

              Re: >handling pointers directly makes for efficient, “close to the hardware” programming>

              "Then you choose a language and/or system that is appropriate for that particular problem domain."

              Can you NAME a system or language that's specifically designed to handle arbitrary amounts of unstructured data?

  17. LionelB Silver badge

    It's an attempt to resolve a paradox in C: on the one hand, handling pointers directly makes for efficient, “close to the hardware” programming; on the other, pointer errors provide lots of dangerous vectors.

    That's not a "paradox", it's a conundrum (@Alanis - nor is it ironic).

    </pedantic>

    1. Charles 9

      Actually, I wonder if what you're REALLY looking for is "dilemma", as in a choice between two things, both of them bad. As in: not using pointers is too slow, but using them is too risky. You lose either way.
