-
Website
http://www.matasano.com/log -
Original page
http://www.matasano.com/log/1041/introduced-a-resolution-resolving-the-semantic-quarrel-over-malloc-checking/ -
Subscribe
All Comments -
Community
-
Top Commenters
-
Press Controls
3 comments · 2 points
-
ChrisMtso
12 comments · 1 points
-
Eric Monti
11 comments · 1 points
-
StatlerAndWaldorf
12 comments · 3 points
-
Dave G.
7 comments · 1 points
-
-
Popular Threads
I think you probably meant to type 'xmalloc()' or something in the third example, to imply a different function. I don't think it really matters how you check the return value so long as its checked, arguing about the methodology employed is somewhat an exercise in needless pedantry, or so it seems imho.
First:
xmalloc is *the exact opposite* of what I'm advocating. Saltzer & Schroeder, principal (b), "Fail Safe Defaults". The exception you make for your codebase should be for the unsafe behavior of "malloc", which says "go ahead and continue executing after failure, just try to be careful about it".
Second:
Inconsistent malloc recovery regimes are one of the most common problems in C codebases (even those that "check" malloc). The Flash bug happened because of an integer overflow. But you can induce malloc failures easily without controlling the size input.
The choice of caller versus callee checking is not pedantic: one fails safe, the other doesn't.
Also, exploitation of NULL pointer dereferences is not constrained to direct remote code execution scenarios only, they may be used for several other interesting attacks including memory leaks, oracles, forced throwing of exceptions/signals, etc.
@jf: As far as I know exploitation of null pointer deref was first discussed in an 8lgm advisory for SCO's pt_chmod back in 1994
The 8lgm thing is great, but it's not the same bug class; if NULL is in your address map, there's nothing special about it. You may as well not even call it "NULL"; just call it zero.
1. Memory
2. Disk (system disks at least)
3. File descriptors
4. Filesystem related limits
It would be interesting to explore the other areas, even if just from a "is this really worth doing anything about" point of view.
Recently the maintainer/author of the yaws webserver came to the conclusion that
Started to use the new ssl:transport_accept() function, when accept fails, We now fail yaws entirely and it needs to be restarted by its supervisor or heart. If we have filedescriptor leaks, even outside of yaws, there is no good thing to do when accept fails. (klacke)
Erlang has a "let it fail" model, where the process will be restarted by it's supervisor, or the restarting process' supervisor will die and it's parent will attempt it (so on, and so forth, until the OS level heartbeat program).
I can draw a distinct parallel to the OS level though. You've got me thinking of the Linux OOM Killer as a security vuln. now. Not that I've ever liked it - I'd prefer a rebooting machine to a crippled one.
What does the JVM do? I guess it has that inbuilt limit already to avoid having to make a decision, but I've never tried overloading it.
(1)
#define hostile /**/
void * _setup(unsigned hostile slot, unsigned hostile id) {
u_int32_t *slots = malloc(SLOTS_SIZE);
slots[slot] = id; // XXX write32 corruption
return slots;
}
(3)
#define hostile /**/
void * _setup(unsigned hostile slot, unsigned hostile id) {
u_int32_t *slots = malloc(SLOTS_SIZE);
// NOT REACHED
slots[slot] = id;
return slots;
}
So, I'm having some trouble following your reasons for why example 3 is the best choice.
OpenSolaris - http://www.opensolaris.org/os/community/smf/)
doing the fail-on-malloc(), even for system services, is probably acceptable and maybe even preferable.
Your big point, and I think you need to shout it louder,
is that you need a consistent and well-documented malloc-fail
regimen for a given app/project. You may even have different
behaviors for different parts, but they need to be documented.
Lots of projects address this problem with a "xalloc" function. This is retarded. Projects should be going out of their way to get the *unsafe* behavior, not the safe one.
Please I would like to know what does "//NOT REACHED" mean in the "callee-checks" model.
Thanks.
How is // NOT REACHED not reached? thats what made me assume xmalloc(), are you saying that somewhere in there should be code that checks and performs some action?
If you can hit mapped memory from a NULL pointer then you already have problems bigger than xmalloc() can save you from.
Is that correct?
...
apparently addition/offsets confuse you
learn2read
const char *_malloc_options = "X";
ln -s X /etc/malloc.conf
setenv MALLOC_OPTIONS X
Will cause malloc to print a diagnostic and abort() instead of returning NULL.
Its best to discourage this practice of value overloading (something we frequently see in return codes). This is the idea behind the xmalloc() approach. How often have we seen 0 meaning OK in one context (e.g. fseek); and 0 meaning not OK in another (e.g. fopen)? A better architected system separates these into distinct values... and allows me to extend the ways I can tell my clients how there call has failed.
I'm not confused at all by untrusted pointer offsets. That was exactly my point. If an attacker can reach mapped process memory by indexing a NULL pointer then you already almost certainly have a security bug that exists whether you are checking for malloc() failure or not.
Any program that makes large allocations based on untrusted data needs to enforce an allocation policy to prevent an attacker from being able to arbitrarily consume memory. You can't do that by calling malloc() directly.
"Read Ivan’s comment about pt_chmod."
I think you meant to direct this comment to me.
The 8LGM bug has nothing to do with malloc() failure, and for that matter Dowd's flash exploit only *incidentally* involves dereferencing a NULL pointer from a failed allocation.
The ActionScript VM would be just as vulnerable to arbitrary memory corruption if the situation had resulted in a successful allocation instead of a failed allocation.
It would be harder to forge the exact absolute address to write to (ie: indexing from a heap address, instead of 0) but in some cases that matters, and in some cases it doesn't.
Also to consider is what happens to all those functions that _indirectly_ end up using malloc() and whose semantics you just decided to change with one single stroke?
@brl; correct, i did not imply that 8lgm's pt_chmod had anything to do with malloc() simply that it is the first public reference that I know of concerning exploitation of null pointer deref (it was a comment to jf's comment). Incidentally, the problem was due to that fact the a pointer was not checked to be != NULL before using it as an argument passed to a security sensitive function.
You may have other serious problems if your code ends up running on a system or under a condition in with page 0 is mapped but you can't expect to know exactly all environments on which your code will end up running so i'd rather check for malloc() failures than not)
now, I wont go into more details or discussion but consider this one:
{
p1=malloc(secret_size);
p2=malloc(user_suplied_size);
p2=malloc(secret_size);
}
do you still want malloc() to abort() on the first failure condition?
log the error? Oops sprintf wants to malloc. Logging expansion probably wants malloc too. So if you can't log it, what use is it?
I would suggest that system statistics are more useful in these cases; you can see that the system ran low on memory and go from there.
After all, you did call MoreMasterPointers() first (in main()) in your application to allocate some memory to free in case you run out of address space, correct?
If the index is over 4096, it might reach the program text whether or not NULL is mapped. This may or may not be useful.
Also I don't see a reason why logging OOM failure would fail, since most failing mallocs I see are from bugs where it tries to allocate 2+GB.
Because you unilaterally change the semantics not only of malloc() but every other function that uses it, but... hmm wait a minute, didn't I write that already?
So now, after your proposal for the new malloc() behavior is implemented, we will all need to expect our programs to abort() on calls to who knows how many different functions that use malloc() internally, like strdup() for example. Off the top of my head I can envision every RPC-based services aborting unexpectedly, non-sanitized memory being left all around, garbage collectors in VMs not being able to do last-resort memory recovery and IPCs that using ref-counted shared objects or spinlocks go medieval on your box.
I suspect that in the later case it should just return NULL instead of abort()?
The malloc(0) argument is a good one, if apocryphal.
The argument that has made me think the most so far is that it will break existing code. I'm skeptical, given how terrible most software is at handling allocation failure. But still.
--------------------------
POSIX Enabled
OS/2 Subsystem Enabled
Note that there are no published security problems with either of them...
but who needs to do research to invent new vulnerabilities? No! We'll just
claim that they're insecure because "Windows NT was C2 evaluated with the
XXX subsystem disabled".
-----------------------
In reality, there were a lot of problems with both of these subsystems. The OS/2 subsystem was removed quite a while ago, and the POSIX subsystem caused a number of problems for some time, until it was made case-insensitive by default. I knew of some of these issues at the time, and my recommendation was that if you did not need those subsystems, disable them. I was about 2-3 years ahead on that point.
As to the rest of the argument, from my POV, we were just having a reasonable disagreement because we were approaching the problem differently. You had a different design goal than I did. I certainly never thought poorly of you (definitely not on a professional level) over this disagreement (though having my integrity attacked didn't make me a happy camper).
I will tell you, now that both of our apps have ended up in the dustbin of things that used to be good, that SNI was eating our lunch on UNIX checks and Web checks. I personally didn't have much control over fixing that problem. I could pour on the juice with respect to the Windows NT checks even if the rest of the "X-Fa^Horce" was twiddling its collective thumbs, and with data management issues, which I did. It did win us a couple more competitive reviews until Ballista won one later that same year.
Interestingly, your comments around pen testing vs. just running a scanner, showed up in a later iteration where I made a feature such that the scanner would get deeper into the network with every successive run. Pity that ISS just let it die after I left for Microsoft (the feature and the app).
Something that's a bit comical about that thread looking back is that I came here and worked as an internal pen tester for about 4 years before moving back into product development.
Comments re allocs later -
(1) I sound like a 12 year old in that thread, so I hope you took it as self deprecation on my part.
(2) No fair going back and evaluating the OS/2 and POSIX subsystems based on what we know now. That same logic would flag all of Win32 as a vulnerability in '97. =)
All of Secure Networks is on the dustbin of history, but the X-Force is (obviously) still thriving. So there's that.
However, I also had some pretty good connections here at the time, and we had some interesting customers who asked for some very specific checks. I did know at the time that the subsystems were a serious hazard, and a certain customer thought that was really valuable, but I couldn't tell you that at the time, especially not in public, and considering that you were working for an unexpectedly strong competitor.
That was always something that was tricky to finesse - there were some things where I wasn't free to go on just how much of a menace something was, but just hey - you _might_ want to turn this off... Looking for bad DCOM permissions was another one of these, and I seem to recall this being a popular area of attack a few years later.
Other checks that may have seemed bogus actually had some real value. For example, you could look for wacky privileges. As it turned out, systems with weird privileges enabled were nearly always owned and the privs were a side-effect of the tools popular at the time. Privs can also be an interesting way to back-door a system. And there were a lot of them, so it made the check-counting PHB's happy, too 8-)
Pardon me if I'm missing your point, but the difference between checking checking as a caller vs. as a callee, seems to be better known as "Trust no one", which seems like a good model in many cases, especially in defensive programming.
If I'm understanding correctly (not a given), I think the more common nomenclature would help make your point clearer.
I agree, the nomenclature I've chosen have definitely not advanced my argument. =)
ugly sementics.
Be sharp, check and bail!
my_type *my_var = NULL;
if ( (my_var =(void *)funct(64))) == NULL)
{
/* XXX */
return 1;
}
void * funct(ssize_t i_sz)
{
void *r_ptr = NULL;
if(i_sz == 0)
{
/* XXX */
return NULL;
}
if( (r_ptr = malloc(i_sz)) == NULL)
{
/* XXX */
return NULL;
}
return r_ptr;
}
p1 = malloc(small_fixed);
p2 = malloc(user_provided);
The second could reasonably fail, _especially on 64-bit, and you're not out of RAM - you just don't happen to have 2GB available. Having your app abort under these conditions is horrible user experience, unless it is a fairly trivial app. If the app was doing anything else of value, aborting is wrong.
BTW, if you don't like the clutter, there are ways to deal with this. For example, one could make a macro along the lines of:
CheckAlloc(), which maps to:
if(!(expression))
{
err = OUT_OF_MEMORY;
goto CleanUp;
}
you then provide a CleanUp section. I don't like this construct when overly used, but it solves the clutter problem. YMMV.
typo s/ssize_t/size_t , but again you get the idea, contextual wise behavior on NULL comparaison should
be dictated on untrusted input/output mechanism and
every call/wrap you do, should be evaluated,
Everyone:
ultimately if your program request an operation that need memory either you prune some old data to get back some memory and retry or abort or fail on an error but thats a design decision not something you should assume in your allocation mechanism.
You're right. You *might* want to do something.
BUT YOU ALMOST NEVER IN REALITY DO.
So yes, there should be some malloc-a-like that you can call that will return NULL. But the default malloc? It should promise never to return NULL.
RETURN VALUES
Upon successful completion, each of the allocation functions
returns a pointer to space suitably aligned (after possible
pointer coercion) for storage of any type of object.
If there is no available memory, malloc(), realloc(),
memalign(), valloc(), and calloc() return a 0xDEADBEEF pointer.
When realloc() is called with size > 0 and returns 0xDEADBEEF, the
block pointed to by ptr is left intact. If size, nelem, or
elsize is 0, either a 0xDEADBEFF pointer or a unique pointer that
can be passed to free() is returned.
If malloc(), calloc(), or realloc() returns unsuccessfully,
errno will be set to indicate the error. The free() function
does not set errno.
:>
Also, let's say that I have 10 Word documents open, and am editing. If I chose to insert an object that was larger than can be allocated, it would be wrong to just bail and force you to re-open 10 documents. It is also true that memory conditions can be very dynamic. Maybe you didn't have RAM then, but do now. In that case, rolling back the requested action, and allowing the user to try again is the right approach.
Crashing an app from inside a library is absolutely the wrong thing. The X11 library will call exit() if it gets annoyed enough, and this screwed me up badly once.
It's wrong to just crash the app for a service, it's wrong for a complex client app, and it's wrong for a library. It might be OK for a small utility.
I'm sympathetic to the approach of making errors that cannot be ignored, and this is one of the great things about using C++ exceptions, but that implies doing quite a lot of additional, very nuanced work. If you can't trust your devs to check malloc, you _really_ do not want those same people trying to write exception-safe C++. At risk of being elitist, I think if someone is not a good enough programmer to consistently check error returns, then maybe C#, Java, or Visual Basic would be better languages.
Even there, you'll run into trouble. I don't know how many times I've caught bugs by carefully doing this:
if(!ShouldNeverFail())
{
assert(false);
return E_UNEXPECTED;
}
or the C++ equivalent of throwing an exception I know I don't have a handler for.
The difference between small allocations and large allocations is not relevant to the discussion, Dave: the attacker's control over available resources is an "input" that is too often overlooked. You should assume attackers can pick and choose exactly which malloc calls they want to fail.
@tom: uh wait a minute! so we know have a secret callback ala atexit()? does that mean that malloc() may not "safely and immediately" end the program in all cases? will you be asking programmers to register their callback (or exception handler) for unsafe malloc() behavior and then de-register it for normal "safe" malloc behavior.
Actually, I think this is not a bad a idea if it wasnt for the fact that there is already _a lot_ of code that depends on the current malloc() semantic that would break.
In an ironic turn of events I will have to argue with David LeBlanc here.
Hi David! you probably don't remember me, I wrote a few of the checks that ate your lunch :) and we also had our heated exchange in the past (hint: "netscape engineers are weenies")
jiji
i guess old habits die hard...
The fact is, most fielded code today is "broken"; most programs fail under low memory conditions, and most security reviewers don't consider the availability of memory as an input under attacker control.
1. unawareness of the atexit handler (Thomas made it somewhat implicit);
2. whether or not to handle all alloc failures the same way; and
3. whether or not we should change the semantics of C.
Some logically possible approaches to the "oh crap a basic operation failed"
problem are:
1. to use a global exception handler or rough equivalent (atexit, bad_alloc
in C++, signal handlers);
2. to use a special return value (NULL, 0xDEADBEEF, INVALID_HANDLE_VALUE);
or
3. to use a local exception handler or rough equivalent (try/catch, callback
passed as an argument to the possibly-failing function).
Note, global exception handlers are just a special, inflexible case of local
exception handlers, and therefore inferior. If you have local exception
handlers, why would you want to be limited to global ones? Signal handlers
and special callbacks like atexit date from the era before exceptions, and
we should not be limiting ourselves to 1970s error handling technology. It
wasn't good enough then, and it's not good enough now.
The problem with using a special return value to signal errors, especially
rare errors, is of course that nobody checks it (ignorance, misplaced
aesthetic concerns, laziness, stupidity). As Thomas notes, the suggestion
that people fix their bad code by only writing good code is unhelpful. We
have to work with programmers as they are, not as they should be. More than
that, though, failures that can be designed away should be. Why is C so
error-prone? Bad design, plain and simple. The fact that this:
*++v->foo++ = malloc(count * size);
is possible, let alone common, is clear evidence that C is a design failure.
(Fun game: count the bugs!) So now you know my opinion on whether or not we
should be redesigning or changing the semantics of C. Why should C 2009 be
compatible with C99? GCC (especially on Linux) can barely handle C99, let
alone what we need. So we might as well be incompatible; Microsoft felt no
compunctions and they were right not to. They just didn't go far enough.
But I digress. Designing a language/runtime with C's strengths but not its
weaknesses is a laudable goal, and redesigning the memory allocation
operation to be safe is a crucial part of that. How does D do it?
Exceptions have their own dangers of course, including possibly using malloc
to allocate the exception object itself, but they have the good quality that
they cannot be ignored. I think That is the crux of Thomas's proposal: get
the good, hard-to-ignore behavior of exceptions, but in plain C.
Another problem with global handlers is that the stuff they repair has to be
global, too (e.g. file handles to flush and close). Global variables are
often indicative of bugs since they are great places for hidden side-effects
to happen. We should not be designing our systems to use more globals, but
fewer. Factoring out all globals (except read-only constants) can be a great
pre-emptive bug squashing technique.
Alternatives: add exceptions to C (as did MS C), or use callbacks. Callbacks
are unnecessarily hard/ugly because C has no anonymous functions. But
wouldn't it be nice? Maybe. Say you have a file that you write super
important data to, and you want to lose as little data as possible in the
event of a failure.
FILE *important;
void handle_MyType_failure(void) {
fflush(important);
fclose(important);
}
typedef void(*allocate_failure_callback_t)(void);
void *allocate(size_t count, size_t size,
allocate_failure_callback_t callback)
{
size_t sz = count * size;
if (sz < count || sz < size ||
(1 != count && sz == count) || (1 != count && sz == size))
goto allocate_failure;
void *p = malloc(sz);
if (NULL == p)
goto allocate_failure;
return p;
allocate_failure:
if (NULL == callback)
abort();
else
callback();
}
int main() {
important = fopen(...);
// Checking that this is not NULL is the same problem as malloc, of
// course! Maybe it should be handled the same way.
// Write super important crap into important here.
MyType mt = allocate(1, sizeof MyType, handle_MyType_failure);
// Alternately, with anonymous callback:
mt = allocate(1, sizeof MyType, {
fflush(important);
fclose(important); } );
}
Using NULL for callback gives you Thomas's behavior.
This is of course not hugely different from just checking the return value:
if (NULL == (mt = malloc(sizeof MyType))) {
fflush(important);
fclose(important);
}
but it allows arguably cleaner handling (e.g. re-use of callbacks, yet with
the ability to make as many as you need), with the default of calling exit
(rather than SIGSEGV or worse).
Of course, already we're dreaming of paramaterizing the callback, aren't we?
Yes, we are. Sigh.
typedef void(*allocate_failure_callback_t)(int);
void *allocate(size_t count, size_t size,
allocate_failure_callback_t callback)
{
int err = ERROR_NONE;
size_t sz = count * size;
if (sz < count || sz < size ||
(1 != count && sz == count) || (1 != count && sz == size))
{
err = ERROR_INTEGER_OVERFLOW;
goto allocate_failure;
}
void *p = malloc(sz);
if (NULL == p) {
err = ERROR_ALLOCATION_FAILURE;
goto allocate_failure;
}
return p;
allocate_failure:
if (NULL == callback)
abort();
else
callback(err);
}
But, as Thomas argues, we can barely ever think of anything to do besides
abort anyway; nevermind a paramaterized error callback that behaves somehow
differently, yet intelligently. But the general technique could be useful
for situations other than malloc failure. I think a lot of it depends on the
availability of syntactic sugar like anonymous functions (which in the end
is kind of like what your try/catch and if/else blocks are anyway). (Which
suggests that maybe we should just check our return values and fire the
people who don't. Maybe this really is an HR problem, not an engineering
problem.)
Finally, people howl about the badness of just aborting in various
circumstances. Some designs make that not a problem, such as qmail. (Search
the Chargen history for Thomas' writeup on DJB's "10 Years of qmail" paper.)