[csmith-dev] Has anyone ever graphed "-O3 bugginess" by GCC version?

To: csmith-dev@flux.utah.edu, John Regehr <regehr@cs.utah.edu>

From: "Arthur O'Dwyer" <arthur.j.odwyer@gmail.com>

Date: Wed, 2 Sep 2015 18:04:44 -0700

List-id: Csmith Development Mailing List <csmith-dev.flux.utah.edu>

John, mailing-list-ers,

I was just thinking about whether it makes sense to use -O2 or -O3 for compiling production software these days. Say, 5 years ago, it was easy for me to diss "gcc -O3" because it was so "obviously" full of bugs. But these days, with the rise of fuzz-testing and so on, has the reliability of "gcc -O3" improved to the point where -O3 is just as reliable as -O2?

What I'm really looking for is basically a graph where the X-axis is "GCC version" and/or "year", and the Y-axis is "bugginess", perhaps measured as "Csmith test cases per thousand whose -O3 output differs from their -O0 output".
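To make that Y-axis concrete, here is a minimal Python sketch of the metric, assuming each test case has already been compiled and run at both levels and its output captured as a string. The function name and the sample data are hypothetical, purely for illustration; nothing here is an existing Csmith tool.

```python
from typing import Iterable, Tuple


def mismatches_per_thousand(outputs: Iterable[Tuple[str, str]]) -> float:
    """Return the number of test cases per thousand whose -O3 output
    differs from their -O0 output.

    `outputs` holds one (o0_output, o3_output) pair per test case.
    How those strings were produced is outside this sketch.
    """
    total = 0
    differing = 0
    for o0_out, o3_out in outputs:
        total += 1
        if o0_out != o3_out:
            differing += 1
    if total == 0:
        raise ValueError("need at least one test case")
    return 1000.0 * differing / total


# Hypothetical captured outputs for four test cases (made-up values).
sample = [
    ("checksum = 1A2B", "checksum = 1A2B"),
    ("checksum = 3C4D", "checksum = FFFF"),  # -O3 disagrees with -O0
    ("checksum = 5E6F", "checksum = 5E6F"),
    ("checksum = 7788", "checksum = 7788"),
]
rate = mismatches_per_thousand(sample)
```

One such number per compiler version would give one point on the graph; repeating it per compiler and flag would give the family of curves.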

Even more interesting would be a family of these graphs, for "gcc -O2", "clang -O2", "clang -O3", etc. That's really the part that would allow me to find out whether my anti-O3 bias is (or was ever) justified.

John's September 2013 blog post "Are Compilers Getting More or Less Reliable?"

http://blog.regehr.org/archives/1036

addresses a similar question, but not in exactly the terms I'm interested in. Namely, there are only two data points on his X-axis ("2.7" and "trunk-as-of-2013"), and his Y-axis conflates -O3 bugs with all other kinds of bugs.

Does anyone have any answers (even partial answers) related to the above question?

Or even any spare grad students who can tackle it? ;)

Thanks much,

Arthur