DISQUS

Matasano Chargen: Introduced: A resolution resolving the semantic quarrel over malloc checking.

  • jf · 1 year ago
    Some of this is a bit of an over-simplistic view. The first and most valid reason that you should be checking your return value is that you don't know that NULL doesn't point to a valid address, period. The second aspect of that is that _a lot_ of bugs occur because of either missing or improper return value checks. You run into a lot of uninitialized variable issues directly as a result of this.

    I think you probably meant to type 'xmalloc()' or something in the third example, to imply a different function. I don't think it really matters how you check the return value so long as its checked, arguing about the methodology employed is somewhat an exercise in needless pedantry, or so it seems imho.
  • jf · 1 year ago
    (Also, for the record, I believe Ilja talked about exploiting null ptr dereferences in this manner (0+uint32_t) a year or two back.)
  • Thomas Ptacek · 1 year ago
    jf: *No*.

    First:

    xmalloc is *the exact opposite* of what I'm advocating. Saltzer & Schroeder, principal (b), "Fail Safe Defaults". The exception you make for your codebase should be for the unsafe behavior of "malloc", which says "go ahead and continue executing after failure, just try to be careful about it".

    Second:

    Inconsistent malloc recovery regimes are one of the most common problems in C codebases (even those that "check" malloc). The Flash bug happened because of an integer overflow. But you can induce malloc failures easily without controlling the size input.

    The choice of caller versus callee checking is not pedantic: one fails safe, the other doesn't.
  • Thomas Ptacek · 1 year ago
    And, Ilja is pretty awesome.
  • ivan · 1 year ago
    I vote for checking the return of malloc() as the safer bet. "safely and immediately end the program" may end up to be not immediately or nor safely when the conditions underneath the application change unbeknown to the programmer.
    Also, exploitation of NULL pointer dereferences is not constrained to direct remote code execution scenarios only, they may be used for several other interesting attacks including memory leaks, oracles, forced throwing of exceptions/signals, etc.
    @jf: As far as I know exploitation of null pointer deref was first discussed in an 8lgm advisory for SCO's pt_chmod back in 1994
  • Thomas Ptacek · 1 year ago
    Explain how "abort()" is less safe than "if((ptr = malloc())) abort()", or concede the argument. Then, show me a case where an application protected itself against memory corruption by *staying alive instead of aborting* when a runtime error occurred.

    The 8lgm thing is great, but it's not the same bug class; if NULL is in your address map, there's nothing special about it. You may as well not even call it "NULL"; just call it zero.
  • Bwooce · 1 year ago
    Nice. I agree with the concept. Running out of any of the following resources is somewhat fatal individual processes on todays machine (un*x esp):
    1. Memory
    2. Disk (system disks at least)
    3. File descriptors
    4. Filesystem related limits

    It would be interesting to explore the other areas, even if just from a "is this really worth doing anything about" point of view.

    Recently the maintainer/author of the yaws webserver came to the conclusion that
    Started to use the new ssl:transport_accept() function, when accept fails, We now fail yaws entirely and it needs to be restarted by its supervisor or heart. If we have filedescriptor leaks, even outside of yaws, there is no good thing to do when accept fails. (klacke)

    Erlang has a "let it fail" model, where the process will be restarted by it's supervisor, or the restarting process' supervisor will die and it's parent will attempt it (so on, and so forth, until the OS level heartbeat program).
  • Bwooce · 1 year ago
    Oh, and the Erlang VM does exactly what you recommend. I think it is an easier choice for a VM; there can be no special cases unless there is some way of differentiating types of memory allocation. There shouldn't be (and isn't). So VM's should just die.

    I can draw a distinct parallel to the OS level though. You've got me thinking of the Linux OOM Killer as a security vuln. now. Not that I've ever liked it - I'd prefer a rebooting machine to a crippled one.

    What does the JVM do? I guess it has that inbuilt limit already to avoid having to make a decision, but I've never tried overloading it.
  • tb · 1 year ago
    Example (1) and (3) are the same - except comments.

    (1)
    #define hostile /**/
    void * _setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = malloc(SLOTS_SIZE);
    slots[slot] = id; // XXX write32 corruption
    return slots;
    }

    (3)
    #define hostile /**/
    void * _setup(unsigned hostile slot, unsigned hostile id) {
    u_int32_t *slots = malloc(SLOTS_SIZE);
    // NOT REACHED
    slots[slot] = id;
    return slots;
    }

    So, I'm having some trouble following your reasons for why example 3 is the best choice.
  • Dan McDonald · 1 year ago
    If your OS offers an auto-restarting facility (like SMF on
    OpenSolaris - http://www.opensolaris.org/os/community/smf/)
    doing the fail-on-malloc(), even for system services, is probably acceptable and maybe even preferable.

    Your big point, and I think you need to shout it louder,
    is that you need a consistent and well-documented malloc-fail
    regimen for a given app/project. You may even have different
    behaviors for different parts, but they need to be documented.
  • Thomas Ptacek · 1 year ago
    @tb --- yes. They are the same. This is a case where a security problem can be fixed with no loss of functionality or performance simply by changing the default semantic of a low-level library function.

    Lots of projects address this problem with a "xalloc" function. This is retarded. Projects should be going out of their way to get the *unsafe* behavior, not the safe one.
  • Leman · 1 year ago
    Hi,
    Please I would like to know what does "//NOT REACHED" mean in the "callee-checks" model.
    Thanks.
  • jf · 1 year ago
    re: xmalloc()

    How is // NOT REACHED not reached? thats what made me assume xmalloc(), are you saying that somewhere in there should be code that checks and performs some action?
  • Thomas Ptacek · 1 year ago
    jf: because in the third example, malloc itself has been modified to abort instead of returning NULL.
  • brl · 1 year ago
    I stopped checking for malloc() failure in my C code since Tom's original post and I have never looked back.

    If you can hit mapped memory from a NULL pointer then you already have problems bigger than xmalloc() can save you from.
  • tb · 1 year ago
    So what you suggest is to use _set_new_mode and _set_new_handler to handle gracefully failed allocations (OOM in most cases) rather than writing a bunch of crap to code around the possibility of a NULL pointer.

    Is that correct?
  • lol · 1 year ago
    "If you can hit mapped memory from a NULL pointer then you already have problems bigger than xmalloc() can save you from."

    ...

    apparently addition/offsets confuse you

    learn2read
  • Thomas Hurst · 1 year ago
    On BSD's, any of:

    const char *_malloc_options = "X";

    ln -s X /etc/malloc.conf

    setenv MALLOC_OPTIONS X

    Will cause malloc to print a diagnostic and abort() instead of returning NULL.
  • Thomas Ptacek · 1 year ago
    @lol --- you first. Read Ivan's comment about pt_chmod.
  • Thomas Ptacek · 1 year ago
    @Thomas --- there's no way to opt out of that regime; when malloc debugging is on, there's no "unsafe_malloc" you can call that will fail with a checkable error.
  • pogrady · 1 year ago
    I think that there's another good reason to check your malloc return codes: because when a program crashes, what you need to know is the root cause. That root cause is /always/ associated with the first thing went wrong. If my code mallocs something, then uses that pointer, and then crashes due to a GPF, it still is up to me to figure out what of several variables could have been the culprit. If x[y]=0 crashes, what was it? The pointer? The array index? But since I know that returning 0 from malloc indicates a special condition--usually not a very desirable condition--I've found that it's always best to flag that problem immediately. Now, when my system crashes I'm as close to the actual cause as possible. Checking for those problematic values as early as possible has saved me many plane tickets overseas...

    Its best to discourage this practice of value overloading (something we frequently see in return codes). This is the idea behind the xmalloc() approach. How often have we seen 0 meaning OK in one context (e.g. fseek); and 0 meaning not OK in another (e.g. fopen)? A better architected system separates these into distinct values... and allows me to extend the ways I can tell my clients how there call has failed.
  • brl · 1 year ago
    @learn2read

    I'm not confused at all by untrusted pointer offsets. That was exactly my point. If an attacker can reach mapped process memory by indexing a NULL pointer then you already almost certainly have a security bug that exists whether you are checking for malloc() failure or not.

    Any program that makes large allocations based on untrusted data needs to enforce an allocation policy to prevent an attacker from being able to arbitrarily consume memory. You can't do that by calling malloc() directly.
  • brl · 1 year ago
    @Thomas Ptacek

    "Read Ivan’s comment about pt_chmod."

    I think you meant to direct this comment to me.

    The 8LGM bug has nothing to do with malloc() failure, and for that matter Dowd's flash exploit only *incidentally* involves dereferencing a NULL pointer from a failed allocation.

    The ActionScript VM would be just as vulnerable to arbitrary memory corruption if the situation had resulted in a successful allocation instead of a failed allocation.

    It would be harder to forge the exact absolute address to write to (ie: indexing from a heap address, instead of 0) but in some cases that matters, and in some cases it doesn't.
  • Thomas Ptacek · 1 year ago
    @pogrady --- your runtime is better at tracing the affected call site than your human programmers.
  • ivan · 1 year ago
    @tom: read again my post. I said, aborting on the first malloc failure can have security implications other than direct code execution. let me think ermm like, during crypto operations for example "epta pam openssh pkcs mp rsa asn1"

    Also to consider is what happens to all those functions that _indirectly_ end up using malloc() and whose semantics you just decided to change with one single stroke?
    @brl; correct, i did not imply that 8lgm's pt_chmod had anything to do with malloc() simply that it is the first public reference that I know of concerning exploitation of null pointer deref (it was a comment to jf's comment). Incidentally, the problem was due to that fact the a pointer was not checked to be != NULL before using it as an argument passed to a security sensitive function.
    You may have other serious problems if your code ends up running on a system or under a condition in with page 0 is mapped but you can't expect to know exactly all environments on which your code will end up running so i'd rather check for malloc() failures than not)

    now, I wont go into more details or discussion but consider this one:
    {
    p1=malloc(secret_size);
    p2=malloc(user_suplied_size);
    p2=malloc(secret_size);
    }
    do you still want malloc() to abort() on the first failure condition?
  • ivan · 1 year ago
    ops, the last variable name is a type, say its "p3"
  • Bwooce · 1 year ago
    @pogrady: So whatchagonnado when you run out of memory? You've just entered a funny world where lots of library functions don't work any more.

    log the error? Oops sprintf wants to malloc. Logging expansion probably wants malloc too. So if you can't log it, what use is it?

    I would suggest that system statistics are more useful in these cases; you can see that the system ran low on memory and go from there.
  • Rosyna · 1 year ago
    abort() on all malloc() failures? I don't think allocating 1gig of space and one page of space should have the same consequences. After some large allocation, malloc() needs to be checked.

    After all, you did call MoreMasterPointers() first (in main()) in your application to allocate some memory to free in case you run out of address space, correct?
  • as · 1 year ago
    @brl

    If the index is over 4096, it might reach the program text whether or not NULL is mapped. This may or may not be useful.

    Also I don't see a reason why logging OOM failure would fail, since most failing mallocs I see are from bugs where it tries to allocate 2+GB.
  • Thomas Ptacek · 1 year ago
    Ivan --- sorry, i responded to your comment on a previous post, instead of here, where you posted it. Long story short: you just laid out a textbook case for why there should be an "unsafe_malloc"; you did not address my argument that the default should, however, be fail-safe.
  • Thomas Ptacek · 1 year ago
    @Rosyna there is little dependable correlation between request sizes and malloc results in a program running on a system under attack.
  • ivan · 1 year ago
    @tom: ah! you're the master of blog controversy and to that I can concede. So in the spirit of helping you generate more traffic to your blogpost I will answer why defaulting malloc() behavior to abort() on failure is a bad idea:
    Because you unilaterally change the semantics not only of malloc() but every other function that uses it, but... hmm wait a minute, didn't I write that already?
    So now, after your proposal for the new malloc() behavior is implemented, we will all need to expect our programs to abort() on calls to who knows how many different functions that use malloc() internally, like strdup() for example. Off the top of my head I can envision every RPC-based services aborting unexpectedly, non-sanitized memory being left all around, garbage collectors in VMs not being able to do last-resort memory recovery and IPCs that using ref-counted shared objects or spinlocks go medieval on your box.
  • ivan · 1 year ago
    oh and btw, how should tqbf_super_duper_extra_safe_malloc() work when its called from realloc() or with size==0 , i.e. malloc(0)
    I suspect that in the later case it should just return NULL instead of abort()?
  • Thomas Ptacek · 1 year ago
    A failed allocation is a failed allocation, so, if realloc fails, the fail-safe default is to abort.

    The malloc(0) argument is a good one, if apocryphal.

    The argument that has made me think the most so far is that it will break existing code. I'm skeptical, given how terrible most software is at handling allocation failure. But still.
  • David LeBlanc · 1 year ago
    Wow - that's a blast from the past. Now that you remind me, I do recall that exchange. Different times. Wow, was I a hothead back then. I may have more comments further along, but something I found interesting was you said:

    --------------------------
    POSIX Enabled
    OS/2 Subsystem Enabled

    Note that there are no published security problems with either of them...
    but who needs to do research to invent new vulnerabilities? No! We'll just
    claim that they're insecure because "Windows NT was C2 evaluated with the
    XXX subsystem disabled".
    -----------------------

    In reality, there were a lot of problems with both of these subsystems. The OS/2 subsystem was removed quite a while ago, and the POSIX subsystem caused a number of problems for some time, until it was made case-insensitive by default. I knew of some of these issues at the time, and my recommendation was that if you did not need those subsystems, disable them. I was about 2-3 years ahead on that point.

    As to the rest of the argument, from my POV, we were just having a reasonable disagreement because we were approaching the problem differently. You had a different design goal than I did. I certainly never thought poorly of you (definitely not on a professional level) over this disagreement (though having my integrity attacked didn't make me a happy camper).

    I will tell you, now that both of our apps have ended up in the dustbin of things that used to be good, that SNI was eating our lunch on UNIX checks and Web checks. I personally didn't have much control over fixing that problem. I could pour on the juice with respect to the Windows NT checks even if the rest of the "X-Fa^Horce" was twiddling its collective thumbs, and with data management issues, which I did. It did win us a couple more competitive reviews until Ballista won one later that same year.

    Interestingly, your comments around pen testing vs. just running a scanner, showed up in a later iteration where I made a feature such that the scanner would get deeper into the network with every successive run. Pity that ISS just let it die after I left for Microsoft (the feature and the app).

    Something that's a bit comical about that thread looking back is that I came here and worked as an internal pen tester for about 4 years before moving back into product development.

    Comments re allocs later -
  • Thomas Ptacek · 1 year ago
    Ok, two things:

    (1) I sound like a 12 year old in that thread, so I hope you took it as self deprecation on my part.

    (2) No fair going back and evaluating the OS/2 and POSIX subsystems based on what we know now. That same logic would flag all of Win32 as a vulnerability in '97. =)

    All of Secure Networks is on the dustbin of history, but the X-Force is (obviously) still thriving. So there's that.
  • David LeBlanc · 1 year ago
    I will confess to grabbing part of the C2 checklist and glomming it into the scanner just to get more checks. Management and marketing were really concerned about numbers. Some of the checks were pretty bogus - like the dataflood (not MY idea). I finally got that one removed.

    However, I also had some pretty good connections here at the time, and we had some interesting customers who asked for some very specific checks. I did know at the time that the subsystems were a serious hazard, and a certain customer thought that was really valuable, but I couldn't tell you that at the time, especially not in public, and considering that you were working for an unexpectedly strong competitor.

    That was always something that was tricky to finesse - there were some things where I wasn't free to go on just how much of a menace something was, but just hey - you _might_ want to turn this off... Looking for bad DCOM permissions was another one of these, and I seem to recall this being a popular area of attack a few years later.

    Other checks that may have seemed bogus actually had some real value. For example, you could look for wacky privileges. As it turned out, systems with weird privileges enabled were nearly always owned and the privs were a side-effect of the tools popular at the time. Privs can also be an interesting way to back-door a system. And there were a lot of them, so it made the check-counting PHB's happy, too 8-)
  • Chris Pepper · 1 year ago
    Thomas,

    Pardon me if I'm missing your point, but the difference between checking checking as a caller vs. as a callee, seems to be better known as "Trust no one", which seems like a good model in many cases, especially in defensive programming.

    If I'm understanding correctly (not a given), I think the more common nomenclature would help make your point clearer.
  • Thomas Ptacek · 1 year ago
    If you're arguing that, whether malloc has been modified or not, it's still good defensive programming to check its return value, I have a hard time refuting that on anything but aesthetic grounds. It is the case that those malloc guards add a lot of clutter.

    I agree, the nomenclature I've chosen have definitely not advanced my argument. =)
  • incredible · 1 year ago
    comon... you dont need 20 example to justify
    ugly sementics.

    Be sharp, check and bail!


    my_type *my_var = NULL;

    if ( (my_var =(void *)funct(64))) == NULL)
    {
    /* XXX */
    return 1;

    }


    void * funct(ssize_t i_sz)
    {
    void *r_ptr = NULL;

    if(i_sz == 0)
    {
    /* XXX */
    return NULL;
    }

    if( (r_ptr = malloc(i_sz)) == NULL)
    {
    /* XXX */
    return NULL;
    }


    return r_ptr;

    }
  • David LeBlanc · 1 year ago
    One of the comments above had a flaw in the argument - what are you going to do if you're out of RAM? This presumes that all your allocs are small. You may well have the condition of:

    p1 = malloc(small_fixed);
    p2 = malloc(user_provided);

    The second could reasonably fail, _especially on 64-bit, and you're not out of RAM - you just don't happen to have 2GB available. Having your app abort under these conditions is horrible user experience, unless it is a fairly trivial app. If the app was doing anything else of value, aborting is wrong.

    BTW, if you don't like the clutter, there are ways to deal with this. For example, one could make a macro along the lines of:

    CheckAlloc(), which maps to:

    if(!(expression))
    {
    err = OUT_OF_MEMORY;
    goto CleanUp;
    }

    you then provide a CleanUp section. I don't like this construct when overly used, but it solves the clutter problem. YMMV.
  • Thomas Ptacek · 1 year ago
    Dave, the only problem I have with malloc wrappers is that they implicitly say the exception case is the one that's checked, and the default case leaves it up to the user. It should be the opposite: malloc should gracefully terminate the program, and CheckAlloc should allow the caller to check NULL.
  • Thomas Ptacek · 1 year ago
    @incredible --- also, nobody should ever use a signed integer comparison on input later process as unsigned. Bad programming? Use good programming. Simple!
  • incredible · 1 year ago
    TP:
    typo s/ssize_t/size_t , but again you get the idea, contextual wise behavior on NULL comparaison should
    be dictated on untrusted input/output mechanism and
    every call/wrap you do, should be evaluated,

    Everyone:
    ultimately if your program request an operation that need memory either you prune some old data to get back some memory and retry or abort or fail on an error but thats a design decision not something you should assume in your allocation mechanism.
  • Thomas Ptacek · 1 year ago
    Except in uncommon cases, there shouldn't be a NULL comparison. The program should panic before it returns NULL to a malloc caller.
  • incredible · 1 year ago
    I wouldn't aggree on panic/abort/immediate exit, since you might want to do some cleanup/logging/etc and thus u can't allways assume the call stack or the dtor of the caller, but again this depend on the complexity.
  • Thomas Ptacek · 1 year ago
    This is the part I'm having a hard time communicating.

    You're right. You *might* want to do something.

    BUT YOU ALMOST NEVER IN REALITY DO.

    So yes, there should be some malloc-a-like that you can call that will return NULL. But the default malloc? It should promise never to return NULL.
  • incredible · 1 year ago
    Well ..how to know you dont have any memory available?


    RETURN VALUES
    Upon successful completion, each of the allocation functions
    returns a pointer to space suitably aligned (after possible
    pointer coercion) for storage of any type of object.

    If there is no available memory, malloc(), realloc(),
    memalign(), valloc(), and calloc() return a 0xDEADBEEF pointer.
    When realloc() is called with size > 0 and returns 0xDEADBEEF, the
    block pointed to by ptr is left intact. If size, nelem, or
    elsize is 0, either a 0xDEADBEFF pointer or a unique pointer that
    can be passed to free() is returned.

    If malloc(), calloc(), or realloc() returns unsuccessfully,
    errno will be set to indicate the error. The free() function
    does not set errno.


    :>
  • David LeBlanc · 1 year ago
    At risk of arguing in circles, there is a difference between the system being under severe memory stress and having even small allocations fail, and the user requesting more memory than you have right now. It is clearly not the right thing to do to abort the app in all cases. If you are working on a service/daemon, it is very bad to have your service die if the machine gets stressed. Many web servers get RAM constrained.

    Also, let's say that I have 10 Word documents open, and am editing. If I chose to insert an object that was larger than can be allocated, it would be wrong to just bail and force you to re-open 10 documents. It is also true that memory conditions can be very dynamic. Maybe you didn't have RAM then, but do now. In that case, rolling back the requested action, and allowing the user to try again is the right approach.

    Crashing an app from inside a library is absolutely the wrong thing. The X11 library will call exit() if it gets annoyed enough, and this screwed me up badly once.

    It's wrong to just crash the app for a service, it's wrong for a complex client app, and it's wrong for a library. It might be OK for a small utility.

    I'm sympathetic to the approach of making errors that cannot be ignored, and this is one of the great things about using C++ exceptions, but that implies doing quite a lot of additional, very nuanced work. If you can't trust your devs to check malloc, you _really_ do not want those same people trying to write exception-safe C++. At risk of being elitist, I think if someone is not a good enough programmer to consistently check error returns, then maybe C#, Java, or Visual Basic would be better languages.

    Even there, you'll run into trouble. I don't know how many times I've caught bugs by carefully doing this:

    if(!ShouldNeverFail())
    {
    assert(false);
    return E_UNEXPECTED;
    }

    or the C++ equivalent of throwing an exception I know I don't have a handler for.
  • Thomas Ptacek · 1 year ago
    The problem of libraries, using the (modified per Thomas) default malloc, crashing the program is solved by allowing the program to register a callback ala atexit() to handle exceptional conditions.

    The difference between small allocations and large allocations is not relevant to the discussion, Dave: the attacker's control over available resources is an "input" that is too often overlooked. You should assume attackers can pick and choose exactly which malloc calls they want to fail.
  • ivan · 1 year ago
    i cant believe this is still going on... see, the traffic generation scheme actually worked! excellent!
    @tom: uh wait a minute! so we know have a secret callback ala atexit()? does that mean that malloc() may not "safely and immediately" end the program in all cases? will you be asking programmers to register their callback (or exception handler) for unsafe malloc() behavior and then de-register it for normal "safe" malloc behavior.
    Actually, I think this is not a bad a idea if it wasnt for the fact that there is already _a lot_ of code that depends on the current malloc() semantic that would break.
    In an ironic turn of events I will have to argue with David LeBlanc here.
    Hi David! you probably don't remember me, I wrote a few of the checks that ate your lunch :) and we also had our heated exchange in the past (hint: "netscape engineers are weenies")
    jiji
  • ivan · 1 year ago
    s/argue/agree
    i guess old habits die hard...
  • Thomas Ptacek · 1 year ago
    At least it isn't up to the rookie coder who's calling malloc, like it is now. That's what we need to resolve.

    The fact is, most fielded code today is "broken"; most programs fail under low memory conditions, and most security reviewers don't consider the availability of memory as an input under attacker control.
  • chris · 1 year ago
    Much of the disagreement in this thread centers on:

    1. unawareness of the atexit handler (Thomas made it somewhat implicit);

    2. whether or not to handle all alloc failures the same way; and

    3. whether or not we should change the semantics of C.

    Some logically possible approaches to the "oh crap a basic operation failed"
    problem are:

    1. to use a global exception handler or rough equivalent (atexit, bad_alloc
    in C++, signal handlers);

    2. to use a special return value (NULL, 0xDEADBEEF, INVALID_HANDLE_VALUE);
    or

    3. to use a local exception handler or rough equivalent (try/catch, callback
    passed as an argument to the possibly-failing function).

    Note, global exception handlers are just a special, inflexible case of local
    exception handlers, and therefore inferior. If you have local exception
    handlers, why would you want to be limited to global ones? Signal handlers
    and special callbacks like atexit date from the era before exceptions, and
    we should not be limiting ourselves to 1970s error handling technology. It
    wasn't good enough then, and it's not good enough now.

    The problem with using a special return value to signal errors, especially
    rare errors, is of course that nobody checks it (ignorance, misplaced
    aesthetic concerns, laziness, stupidity). As Thomas notes, the suggestion
    that people fix their bad code by only writing good code is unhelpful. We
    have to work with programmers as they are, not as they should be. More than
    that, though, failures that can be designed away should be. Why is C so
    error-prone? Bad design, plain and simple. The fact that this:

    *++v->foo++ = malloc(count * size);

    is possible, let alone common, is clear evidence that C is a design failure.
    (Fun game: count the bugs!) So now you know my opinion on whether or not we
    should be redesigning or changing the semantics of C. Why should C 2009 be
    compatible with C99? GCC (especially on Linux) can barely handle C99, let
    alone what we need. So we might as well be incompatible; Microsoft felt no
    compunctions and they were right not to. They just didn't go far enough.

    But I digress. Designing a language/runtime with C's strengths but not its
    weaknesses is a laudable goal, and redesigning the memory allocation
    operation to be safe is a crucial part of that. How does D do it?

    Exceptions have their own dangers of course, including possibly using malloc
    to allocate the exception object itself, but they have the good quality that
    they cannot be ignored. I think That is the crux of Thomas's proposal: get
    the good, hard-to-ignore behavior of exceptions, but in plain C.

    Another problem with global handlers is that the stuff they repair has to be
    global, too (e.g. file handles to flush and close). Global variables are
    often indicative of bugs since they are great places for hidden side-effects
    to happen. We should not be designing our systems to use more globals, but
    fewer. Factoring out all globals (except read-only constants) can be a great
    pre-emptive bug squashing technique.

    Alternatives: add exceptions to C (as did MS C), or use callbacks. Callbacks
    are unnecessarily hard/ugly because C has no anonymous functions. But
    wouldn't it be nice? Maybe. Say you have a file that you write super
    important data to, and you want to lose as little data as possible in the
    event of a failure.

    FILE *important;

    void handle_MyType_failure(void) {
    fflush(important);
    fclose(important);
    }

    typedef void(*allocate_failure_callback_t)(void);

    void *allocate(size_t count, size_t size,
    allocate_failure_callback_t callback)
    {
    size_t sz = count * size;
    if (sz < count || sz < size ||
    (1 != count && sz == count) || (1 != count && sz == size))
    goto allocate_failure;

    void *p = malloc(sz);

    if (NULL == p)
    goto allocate_failure;

    return p;

    allocate_failure:
    if (NULL == callback)
    abort();
    else
    callback();
    }

    int main() {
    important = fopen(...);
    // Checking that this is not NULL is the same problem as malloc, of
    // course! Maybe it should be handled the same way.

    // Write super important crap into important here.

    MyType mt = allocate(1, sizeof MyType, handle_MyType_failure);

    // Alternately, with anonymous callback:
    mt = allocate(1, sizeof MyType, {
    fflush(important);
    fclose(important); } );
    }

    Using NULL for callback gives you Thomas's behavior.

    This is of course not hugely different from just checking the return value:

    if (NULL == (mt = malloc(sizeof MyType))) {
    fflush(important);
    fclose(important);
    }

    but it allows arguably cleaner handling (e.g. re-use of callbacks, yet with
    the ability to make as many as you need), with the default of calling exit
    (rather than SIGSEGV or worse).

    Of course, already we're dreaming of paramaterizing the callback, aren't we?
    Yes, we are. Sigh.

    typedef void(*allocate_failure_callback_t)(int);

    void *allocate(size_t count, size_t size,
    allocate_failure_callback_t callback)
    {
    int err = ERROR_NONE;
    size_t sz = count * size;
    if (sz < count || sz < size ||
    (1 != count && sz == count) || (1 != count && sz == size))
    {
    err = ERROR_INTEGER_OVERFLOW;
    goto allocate_failure;
    }

    void *p = malloc(sz);

    if (NULL == p) {
    err = ERROR_ALLOCATION_FAILURE;
    goto allocate_failure;
    }

    return p;

    allocate_failure:
    if (NULL == callback)
    abort();
    else
    callback(err);
    }

    But, as Thomas argues, we can barely ever think of anything to do besides
    abort anyway; nevermind a paramaterized error callback that behaves somehow
    differently, yet intelligently. But the general technique could be useful
    for situations other than malloc failure. I think a lot of it depends on the
    availability of syntactic sugar like anonymous functions (which in the end
    is kind of like what your try/catch and if/else blocks are anyway). (Which
    suggests that maybe we should just check our return values and fire the
    people who don't. Maybe this really is an HR problem, not an engineering
    problem.)

    Finally, people howl about the badness of just aborting in various
    circumstances. Some designs make that not a problem, such as qmail. (Search
    the Chargen history for Thomas' writeup on DJB's "10 Years of qmail" paper.)