
Matasano Chargen: Coverage: Don’t Believe The Hype

  • 2guesswhat · 1 year ago
    It takes a village of security researchers to measure a system.
  • Andre Gironda · 1 year ago
    I don't like to mix the "code coverage" terminology with the "surface/vuln/threat coverage" terminology.

    Code coverage is one particular way of interpreting the output of white-box unit and/or functional tests. You can have 10% code coverage and find all of the bugs (yes, that's not a typo: 10%, not 100%). You can also have 30% or 85%. It all depends on where the conditions and decisions in the code are (sometimes referred to as the "meatballs" and the "gravy"). Condition/decision coverage is usually the best metric to apply to languages such as C/C++, C#, and Java from a developer-tester POV.
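    To make the condition/decision point concrete, here is a minimal Python sketch. The function `grant_access` and the hand-rolled outcome log are hypothetical illustrations, not output from a real coverage tool. Both tests in `suite_a` execute every line of `grant_access`, so line coverage is 100%, yet `has_token` is never true, so condition/decision coverage stays at 5 of 6 outcomes.

```python
from fractions import Fraction

# Every outcome condition/decision coverage asks for: each condition and the
# overall decision must evaluate to both True and False at least once.
OUTCOMES = {(name, value)
            for name in ("is_admin", "has_token", "decision")
            for value in (True, False)}


def grant_access(is_admin: bool, has_token: bool, log: set) -> bool:
    """Hypothetical function under test: one decision built from two conditions."""
    # Both condition outcomes are logged up front, ignoring short-circuit
    # evaluation, to keep the sketch simple.
    log.add(("is_admin", is_admin))
    log.add(("has_token", has_token))
    decision = is_admin or has_token
    log.add(("decision", decision))
    if decision:
        return True
    return False


def cd_coverage(test_inputs) -> Fraction:
    """Fraction of condition/decision outcomes exercised by a list of test inputs."""
    log = set()
    for is_admin, has_token in test_inputs:
        grant_access(is_admin, has_token, log)
    return Fraction(len(log & OUTCOMES), len(OUTCOMES))


# Executes every line of grant_access, but has_token is never True.
suite_a = [(True, False), (False, False)]
# One extra case closes the gap.
suite_b = [(True, False), (False, True), (False, False)]

print(cd_coverage(suite_a))  # 5/6
print(cd_coverage(suite_b))  # 1
```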

    Problems with measuring code coverage via NCSS (non-commenting source statements) often come from things like getters/setters, and from not counting "statements" correctly in the first place (e.g. counting brackets/braces that sit on their own lines). I'm sure there are other issues, but I figured I'd bring the more obvious ones up front and center.
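    A toy Python illustration of the brace problem: the same Java method written in K&R and Allman brace styles produces different counts if every non-blank line is treated as a statement. Both counters below are hypothetical simplifications, not how JavaNCSS or any other real NCSS tool actually counts.

```python
# The same hypothetical Java method in two brace styles.
K_AND_R = """\
int clamp(int x) {
    if (x < 0) {
        return 0;
    }
    return x;
}
"""

ALLMAN = """\
int clamp(int x)
{
    if (x < 0)
    {
        return 0;
    }
    return x;
}
"""


def _code_lines(src: str):
    """Non-blank lines that are not // comments, stripped of indentation."""
    for line in src.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("//"):
            yield stripped


def naive_count(src: str) -> int:
    # Treats every remaining line as a "statement", braces included.
    return sum(1 for _ in _code_lines(src))


def brace_aware_count(src: str) -> int:
    # Skips lines made up only of braces, so layout style no longer matters.
    return sum(1 for line in _code_lines(src) if line.strip("{}").strip())


print(naive_count(K_AND_R), naive_count(ALLMAN))              # 6 8
print(brace_aware_count(K_AND_R), brace_aware_count(ALLMAN))  # 4 4
```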

    In the case of automated fuzz testing with code knowledge (either from EFS/PaiMei or from the source code itself), line/statement coverage is often enough to go digging for untested inputs. In other words, it helps when you're trying to get inside the heads of the product release team. Here you're not trying to "get" a certain percentage of code coverage; you're looking "at" the coverage statistics to find out where to start and stop testing.
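    A rough Python sketch of looking "at" coverage rather than chasing a percentage: record which lines of a target function a fuzz corpus actually reaches (via `sys.settrace`), then list the lines it never touched. `parse_header` and the corpus are hypothetical stand-ins; in practice the coverage data would come from a tool such as PaiMei or from compiler instrumentation rather than a Python tracer.

```python
import dis
import sys


def parse_header(data: bytes) -> str:
    # Hypothetical fuzz target: a toy parser with a one-byte magic number.
    if len(data) < 2:
        return "short"
    if data[0] != 0x7F:
        return "bad magic"
    length = data[1]
    if length > len(data) - 2:
        return "truncated"
    return "ok"


def executable_lines(func) -> set:
    """Line numbers in func's body that have bytecode attached."""
    code = func.__code__
    lines = {line for _, line in dis.findlinestarts(code) if line is not None}
    lines.discard(code.co_firstlineno)  # the `def` line itself is not interesting
    return lines


def lines_hit(func, corpus) -> set:
    """Run every input through func and record which of its lines executed."""
    hit = set()
    target = func.__code__

    def tracer(frame, event, arg):
        if frame.f_code is not target:
            return None  # don't trace any other function
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        for data in corpus:
            func(data)
    finally:
        sys.settrace(None)
    return hit


def unreached_lines(func, corpus) -> list:
    return sorted(executable_lines(func) - lines_hit(func, corpus))


# A "dumb" corpus that never produces the 0x7F magic byte.
corpus = [b"", b"\x00", b"\x00\x01\x02", b"\xff\xff"]
for lineno in unreached_lines(parse_header, corpus):
    print("never reached: line", lineno)
```

    With this corpus the script reports the four lines after the magic-byte check, which is where the next round of inputs should be aimed.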

    As far as "Vuln. Class Coverage" and "Threat Based Coverage" go, these don't belong in the same discussion as code coverage. They're totally different things. What you refer to as vuln/threat coverage is a matter of risk assessment / risk management. The question that drives us, "What is risk management?", doesn't always have a simple answer. Everyone has their own perspective, as well as their own risk tolerance. The only honest answer is "we don't yet have an answer, or a clear way of measuring this universally".

    I could suggest a few books on the subject of measuring security and evaluating risk, although it sounds like the people you're griping about don't read the literature in the first place.
  • Dave G. · 1 year ago
    @Dre

    For security testing, what good is code coverage without saying what you tested the code for? If you can test 10% and shake out all of the bugs, then you can test 90% and shake out none of them :)
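    The point can be shown in a few lines of Python (the `clamp` function and its bug are hypothetical): both tests below execute every line of `clamp`, but only the one that compares results against the specification catches the missing upper bound.

```python
def clamp(value: int, low: int, high: int) -> int:
    """Hypothetical buggy function: the upper bound is never enforced."""
    if value < low:
        return low
    return value  # bug: should be min(value, high)


def smoke_test() -> bool:
    # Executes every line of clamp (100% line coverage) but verifies nothing.
    clamp(-5, 0, 10)
    clamp(50, 0, 10)
    return True


def oracle_test() -> bool:
    # Same inputs, same coverage, but results are checked against the spec.
    return clamp(-5, 0, 10) == 0 and clamp(50, 0, 10) == 10


print(smoke_test())   # True: "passes" despite the bug
print(oracle_test())  # False: the bug is caught
```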
  • Andre Gironda · 1 year ago
    Dave G: unit tests that verify input validation on inputs?
  • Dave G. · 1 year ago
    and what did you verify them against? :)
  • Andre Gironda · 1 year ago
    Methods should respond to invalid input by throwing an exception.

    Example one: When testing methods that accept an integer within a specific range, submit an integer outside of that range and then verify that the application throws an ArgumentOutOfRangeException.
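    A minimal Python sketch of example one, using `unittest`. Python has no `ArgumentOutOfRangeException`, so this assumes `ValueError` as the closest built-in equivalent; `set_port` and its 1..65535 range are hypothetical.

```python
import unittest


def set_port(port: int) -> int:
    """Hypothetical method under test: only ports 1..65535 are valid."""
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port


class SetPortTests(unittest.TestCase):
    def test_rejects_values_outside_range(self):
        # Just below, just above, and far outside the valid range.
        for bad in (0, 65536, -1):
            with self.subTest(port=bad):
                with self.assertRaises(ValueError):
                    set_port(bad)

    def test_accepts_boundaries(self):
        self.assertEqual(set_port(1), 1)
        self.assertEqual(set_port(65535), 65535)
```

    Run it with `python -m unittest` pointed at the file. Probing just outside each boundary (0 and 65536) and exactly on it (1 and 65535) is what catches off-by-one range checks.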

    Example two: When testing methods that accept input as a string of limited length, submit a string that is too long, or one that otherwise fails the method's validity requirements, and verify that it is rejected.
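    Example two, sketched the same way in Python. `set_username`, the 32-character limit and the letters/digits/underscore rule are all assumptions standing in for whatever the real requirements are.

```python
import re
import unittest

MAX_USERNAME = 32                          # hypothetical length limit
ALLOWED = re.compile(r"[A-Za-z0-9_]+")     # hypothetical character rule


def set_username(name: str) -> str:
    """Hypothetical method under test: rejects over-long or malformed names."""
    if len(name) > MAX_USERNAME:
        raise ValueError("username too long")
    if not ALLOWED.fullmatch(name):
        raise ValueError("username is empty or contains invalid characters")
    return name


class SetUsernameTests(unittest.TestCase):
    def test_rejects_too_long(self):
        with self.assertRaises(ValueError):
            set_username("a" * (MAX_USERNAME + 1))

    def test_rejects_otherwise_invalid(self):
        for bad in ("", "bob smith", "bob';--"):
            with self.subTest(name=bad):
                with self.assertRaises(ValueError):
                    set_username(bad)

    def test_accepts_maximum_length(self):
        self.assertEqual(set_username("a" * MAX_USERNAME), "a" * MAX_USERNAME)
```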

    Example three: If you determine that your application should reject any input containing HTML, test the method by submitting HTML, and fail the test unless the correct exception is detected.
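    Example three as a Python sketch. The `HtmlInputError` type and the regex-based tag detector are hypothetical and deliberately crude (a real filter would need a proper parser); the point is that `assertRaises(HtmlInputError)` does not let the test pass unless that specific exception type, or a subclass, is raised.

```python
import re
import unittest


class HtmlInputError(ValueError):
    """Hypothetical exception raised when input appears to contain HTML."""


# Crude detector: '<' optionally followed by spaces, then a letter, '/' or '!'.
TAG_START = re.compile(r"<\s*[A-Za-z/!]")


def save_comment(text: str) -> str:
    """Hypothetical method under test that must reject HTML outright."""
    if TAG_START.search(text):
        raise HtmlInputError("HTML is not allowed here")
    return text


class SaveCommentTests(unittest.TestCase):
    def test_rejects_html(self):
        samples = ("<script>alert(1)</script>",
                   "hi <b>there</b>",
                   "< img src=x onerror=alert(1)>",
                   "<!-- comment -->")
        for bad in samples:
            with self.subTest(text=bad):
                # Passes only if HtmlInputError (or a subclass) is raised.
                with self.assertRaises(HtmlInputError):
                    save_comment(bad)

    def test_allows_plain_text(self):
        self.assertEqual(save_comment("3 < 4"), "3 < 4")
```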

    Some methods sanitize input rather than reject it outright. For example, methods that accept Web input may need to encode HTML characters, and methods that pass string input to a database need to handle SQL delimiters. Methods that sanitize malicious input are much more difficult to test, because they don't simply throw an exception or otherwise return an easily testable condition. In these circumstances, write a test that checks both the input and the output to verify that the input was successfully sanitized.
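    A Python sketch of checking both input and output for sanitizing methods. It assumes `html.escape` as the HTML encoder and, for the database case, swaps in parameter binding with the standard-library `sqlite3` module instead of parsing SQL delimiters by hand; `render_comment` and `store_and_fetch` are hypothetical names. Each test asserts on the output (no raw markup characters) and ties it back to the input (the original text survives the round trip).

```python
import html
import sqlite3
import unittest


def render_comment(text: str) -> str:
    """Hypothetical sanitizer: HTML-encode instead of rejecting."""
    return html.escape(text, quote=True)


def store_and_fetch(name: str) -> str:
    """Hypothetical DB round trip using parameter binding, not hand-escaping."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE users (name TEXT)")
        conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
        return conn.execute("SELECT name FROM users").fetchone()[0]
    finally:
        conn.close()


class SanitizerTests(unittest.TestCase):
    HTML_CASES = ('<script>alert("x")</script>', "Tom & Jerry", "it's <b>bold</b>")

    def test_html_output_has_no_markup_characters(self):
        for raw in self.HTML_CASES:
            out = render_comment(raw)
            for ch in "<>\"'":
                self.assertNotIn(ch, out)

    def test_html_output_still_represents_input(self):
        for raw in self.HTML_CASES:
            self.assertEqual(html.unescape(render_comment(raw)), raw)

    def test_sql_delimiters_are_stored_as_data(self):
        hostile = "O'Brien'); DROP TABLE users;--"
        self.assertEqual(store_and_fetch(hostile), hostile)
```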

    Note that this usually works best when done with TDD by developer-testers before SQEs get their dirty hands on the code. I suggest doing this in a separate test environment if possible, both in the IDE and on a continuous integration server. This seems like more work, but once you integrate regression testing in the same manner and take refactorings into account (along with other practices found in modern development shops), it becomes more obvious that operational excellence is the result.