False Positives and False Negatives

Source code analysis produces a large number of “false positive” results, which is one of the biggest complaints we hear against source code analyzers. Development teams spend significant amounts of time sorting out what’s real and what’s not. This is a colossal waste of time for the development team, and it can eat away at team morale.

Regardless of how skilled a developer may be, it is likely that their code will have some kind of unintentional error or vulnerability. To ensure that these coding errors and vulnerabilities are identified early, developers often use a Code Inspection tool, which checks the code against rules that developers have set up.

However, static code analyzers are not perfect, and sometimes a tool will report false positives and false negatives. If real coding errors are not caught, they can have a significant impact on the code.

For that reason, it is essential that you be able to effectively identify false positives and false negatives within your static code analysis results.

Static Reviewer, in addition to drastically reducing the number of False Positives, produces zero False Negatives.

See the OWASP Benchmark results and our patented algorithm, the Dynamic Syntax Tree.

In any case, you can mark the few remaining False Positives per Category, per Issue, and per File. They will be listed inside the reports.


What Is a False Positive?

A false positive is an issue that doesn’t actually exist in the code. It doesn’t need to be fixed. This happens when no rule violation exists, but a diagnostic is generated.

Meanwhile, a true positive is an issue that needs to be fixed. It violates a rule and is, in fact, a real problem.

But sifting the true positives from the false ones can be tricky. And false negatives can be even trickier.

What Is a False Negative?

A false negative is an issue that goes undetected. This happens when a rule violation exists, but no diagnostic is created.

Meanwhile, a true negative means you don’t have an issue. There is no rule violation.

So, finding false negatives is really tricky. How will you know if there’s a bug you’ve missed?
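The four outcomes described above can be summarized in a small sketch. This is a hypothetical helper for illustration only, not part of any analyzer; the two flags stand for ground truth and the tool’s verdict:

```python
def classify(violation_exists: bool, diagnostic_raised: bool) -> str:
    """Map an analyzer's diagnostic against ground truth.

    violation_exists  -- the code really does break the rule
    diagnostic_raised -- the tool reported a violation
    """
    if violation_exists and diagnostic_raised:
        return "true positive"    # real issue, correctly flagged
    if not violation_exists and diagnostic_raised:
        return "false positive"   # flagged, but nothing to fix
    if violation_exists and not diagnostic_raised:
        return "false negative"   # real issue, missed by the tool
    return "true negative"        # no issue, no report

print(classify(violation_exists=True, diagnostic_raised=False))  # false negative
```

The false negative case is the dangerous one: the tool stays silent while the defect is real.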

What Causes False Positives and False Negatives?

There are two primary causes of false diagnostics.

Tools Make Mistakes

Tools aren’t perfect. They make mistakes. And false positives and negatives are inevitable.

That’s why it’s critical to have a human looking over your code — and any violations detected by the tool.

For instance, you may have a rule that there can be no Divide By Zero (DBZ) issues. The tool may then flag a section of code with a DBZ issue. So, you take a closer look at it and realize that there isn’t actually an issue here. You just had a false positive.
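A minimal sketch of that DBZ scenario, with hypothetical code (assuming a naive checker that flags every division by a variable):

```python
def average(total: float, count: int) -> float:
    """Return the mean, guarding against an empty count."""
    # A naive checker may flag the division below as a potential
    # divide-by-zero, even though this guard rules it out.
    if count == 0:
        return 0.0
    return total / count  # count is provably nonzero here: a false positive

print(average(10.0, 4))  # 2.5
```

A tool that tracks the guard condition (via dataflow analysis) would suppress this diagnostic; one that does not would report a false positive.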

Undecidable Rules

You might have coding rules that can’t be decided; they’re undecidable. And that means they can’t be enforced with 100 percent accuracy.

How Does Undecidability Happen?

Undecidability can happen when you lack visibility.

If you had perfect visibility into everything in your program, you’d be able to decide whether a rule was violated or not. You could review diagnostics from a static analyzer and know “That’s a false positive!”

But, you don’t know everything that’s gone into your program. Other programmers wrote code for other parts of the program that you don’t have access to (e.g., firmware). Input came in from elsewhere. So, without clear visibility into everything, you can’t tell if there’s a real problem.
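For example (a hypothetical sketch): whether the division below is safe depends entirely on a value the analyzer cannot see into, so the no-DBZ rule is undecidable at this line. Here `read_divisor` stands in for any external source of data.

```python
def scale(value: float, read_divisor) -> float:
    # read_divisor represents code or input outside the analyzer's view:
    # user input, a config file, firmware, a closed-source library.
    divisor = read_divisor()
    # Statically, the tool cannot decide whether divisor will be zero.
    # Flagging this line risks a false positive; staying silent risks
    # a false negative. That is what undecidability looks like.
    return value / divisor

# With one external source the call is fine; with another it would crash.
print(scale(10.0, lambda: 2.0))  # 5.0
```

Which way a tool errs on lines like this is exactly where checkers differ.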

How to Diagnose False Positives and False Negatives

There are some false positives and false negatives that are no-brainers. They’re clearly black or white.

But there’s always a grey area.

Deciding False Positives and Negatives

Deciding diagnostics is subjective. It depends on the industry you’re working in. And it depends on the coding rules you’re working with.

False Positives Vary

A false positive for one company might not be a false positive for another.

Here's a false positive example. You might be developing software that will go into a medical device. Lives could be at risk if there are issues in the software. So, if you have a rule that there can be no DBZ issues, and you get a diagnostic that there are, you’ll need to carefully evaluate each violation.

But, you might be developing software to go in an entertainment system. So, you’d want to dismiss false positives quickly. You only want to look at true positives.

False Negatives Vary, Too

Likewise, a false negative for one company might not be false for another.

Here's a false negative example. You might use CERT or MISRA coding rules if you need to be really defensive about your program. A missed diagnostic would be a false negative if the tool didn’t catch the possibility of something happening.

But, for another company, it would only be a false negative if it didn’t catch something that will absolutely happen.

As you expand your visibility, what you would consider a false positive or false negative gets refined.

Proving False Positives and Negatives

How much work you need to do to prove false positives and negatives varies. If you’re in a high-risk, safety-critical industry, you’ll need to prove it false. If you’re in a lower-risk industry, you might be able to review the diagnostic, dismiss it as false, and move on.

How to Reduce False Positives and False Negatives

False positives and false negatives are inevitable.

False positives cost additional review time. And they may cause real issues to be hastily dismissed.

False negatives are a key concern for mission-critical software developers. For these developers, false positives are better than false negatives.

Not All Code Checkers Are the Same…

Not all code checkers — e.g., MISRA checkers — are the same. Some are more accurate than others. And some will give you more false positives and false negatives in your diagnostics.

What MISRA Checkers Do

A MISRA® checker evaluates your C/C++ code against the MISRA coding standard. It identifies noncompliant code. And you'll then know which code you need to fix. However...

Every MISRA Checker Is Different

Some MISRA checkers produce false diagnostics. You get false positives — or worse, you get false negatives.

Other MISRA checkers do not help with undecidable conditions. They can't tell whether or not your code is compliant.

In this section, you'll learn:

  • Why some MISRA checkers are better than others.

  • How to enforce undecidable MISRA rules.

  • What to look for in your MISRA checker for C/C++.

Choose the Best Code Checker

Choosing the right code analyzer gives you better diagnostics. 

There are several characteristics that should be considered when selecting a tool to check for coding standards:

  • Complete dataflow and execution analysis. The tool should find violations of syntactic rules as well as problems that require deep semantic knowledge of the entire program.

  • Ability to enable/disable enforcement of individual rules. You also want the ability to define your own rules, in case you add some that are specific to your project, and to disable rules that don’t apply to your deployment environment.

  • A command-line version that integrates the analysis seamlessly into the build process. Tools should support existing build environments and not require radical changes to your existing tools and processes. A command-line version also allows nightly batch runs of the code checker to collect quality data.

  • Comparison and report features. Knowing how the code quality progresses from build to build is essential in tracking the quality of your code.

  • Detailed description and visible call path for warnings.

  • Cost of ownership.

When you get the right diagnostics, you can reduce false positives and false negatives. So, you’ll have safe and secure code. Consistent style. And an easier-to-maintain codebase.

Security Reviewer Qualification Kit

Security Reviewer offers a Qualification Kit that provides documentation, test cases, and procedures for qualifying Security Reviewer Code Inspection in projects based on the MISRA and CERT coding standards.
The kit contains tool qualification plans, tool operational requirements, and other materials required for qualifying Security Reviewer for use in safety-critical projects. For every Security Reviewer feature in use, users can execute test cases in their own environment that demonstrate the absence of errors.
The kit facilitates certification of embedded systems that use Security Reviewer for analyzing developed code. Users can adapt the artifacts in the tool qualification kit to their specific project.

The Qualification Kit for Security Reviewer consists of:

  • A qualification support tool that guides the user through the qualification and generates the following documents:

    • Tool Classification Report

    • Tool Qualification Plan/Report

    • Tool Safety Manual

    • Test Plan

  • The test automation unit

  • The test suite with test cases

  • The user manual of the Qualification Kit

The kit includes generic compliance reports that demonstrate compliance of Security Reviewer with essential security standards.

Why should Software Testing Tools be Qualified and not Certified?

Software tool manufacturers offer certificates and/or qualification kits for their tools, which are used in safety-critical projects governed by standards such as ISO 26262.
While a Tool Qualification Kit can be used in all projects, "tool certifications" are always tied to a particular environment and are therefore generally not effective.

The Qualification Kit for Security Reviewer is used successfully by leading companies in the automotive, medical, and aerospace sectors. Most of them rate the kit as "best in class".

For more information, contact info@securityreviewer.com

OWASP Benchmark

The OWASP Benchmark Project is a Java test suite designed to evaluate the accuracy, coverage, and speed of automated software vulnerability detection tools. Without the ability to measure these tools, it is difficult to understand their strengths and weaknesses, and compare them to each other on False Positives and False Negatives. In order to properly measure a tool, both OWASP Benchmark v1.1 and v1.2 must be executed.

Security Reviewer is rated on top of OWASP Benchmark with the following results:

  • v1.2 → 100%, with 0% of False Positives and 0% of False Negatives

  • v1.1 → 92%, with 5.1% of False Positives and 2.9% of False Negatives

  • v1.1+v1.2 → 96%, with an average False Positive rate of 2.5%; for the most important vulnerabilities, the rate is 0%.

Security Reviewer demonstrated the lowest False Positive rate and the best accuracy score. Benchmark results can be independently verified. For more information, please contact Security Reviewer at info@securityreviewer.com
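The OWASP Benchmark reports a tool's score as the true positive rate minus the false positive rate, expressed as a percentage. A sketch of that arithmetic, using illustrative counts rather than actual Benchmark data:

```python
def benchmark_score(tp: int, fn: int, fp: int, tn: int) -> float:
    """OWASP-Benchmark-style score: true positive rate minus
    false positive rate, as a percentage."""
    tpr = tp / (tp + fn)  # share of real vulnerabilities detected
    fpr = fp / (fp + tn)  # share of safe test cases wrongly flagged
    return 100.0 * (tpr - fpr)

# A hypothetical perfect tool: finds every vulnerability, flags nothing safe.
print(benchmark_score(tp=100, fn=0, fp=0, tn=100))  # 100.0
```

A tool that flags everything scores 0 under this formula (TPR and FPR are both 100%), which is why a high score requires low false positives as well as low false negatives.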

Most important vulnerabilities