John, mailing-list-ers,
I've been wondering whether it makes sense to use -O2 or -O3 for
compiling production software these days. Five or so years ago, it was
easy for me to diss "gcc -O3" because it was so "obviously" full of
bugs. But these days, with the rise of fuzz testing and so on, has the
reliability of "gcc -O3" improved to the point where -O3 is just as
reliable as -O2?
What I'm really looking for is basically a graph where the X-axis is
"GCC version" and/or "year", and the Y-axis is "bugginess", perhaps
measured as "Csmith test cases per thousand whose -O3 output differs
from their -O0 output".
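
To make that Y-axis concrete, here is a rough sketch of how one data
point might be computed. It is only an illustration, not anything John
or the Csmith folks have published: the helper names (compile_and_run,
differs, per_thousand) are made up, it assumes the Csmith-generated .c
files already exist on disk, and it assumes "include_dir" points at the
Csmith runtime headers that those files need. Cases that fail to
compile or that time out are treated as inconclusive rather than as
mismatches, which is a judgment call.

```python
import subprocess
import tempfile
from pathlib import Path


def compile_and_run(compiler, opt_flag, source, include_dir, workdir, timeout=10):
    """Compile `source` at `opt_flag`, run it, and return its stdout.

    Returns None if compilation fails or anything times out, so that such
    cases can be excluded instead of being counted as miscompilations.
    """
    exe = Path(workdir) / f"prog{opt_flag}"
    # -w silences warnings; generated programs tend to trigger a lot of them.
    cmd = [compiler, opt_flag, "-w", f"-I{include_dir}", str(source), "-o", str(exe)]
    try:
        if subprocess.run(cmd, capture_output=True, timeout=120).returncode != 0:
            return None
        result = subprocess.run([str(exe)], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        return None
    return result.stdout


def differs(compiler, source, include_dir, test_flag="-O3", timeout=10):
    """True/False if both builds ran to completion; None if inconclusive."""
    with tempfile.TemporaryDirectory() as workdir:
        base = compile_and_run(compiler, "-O0", source, include_dir, workdir, timeout)
        test = compile_and_run(compiler, test_flag, source, include_dir, workdir, timeout)
    if base is None or test is None:
        return None
    return base != test


def per_thousand(outcomes):
    """Mismatches per thousand conclusive cases; None outcomes are skipped."""
    conclusive = [o for o in outcomes if o is not None]
    if not conclusive:
        return 0.0
    return 1000.0 * sum(conclusive) / len(conclusive)
```

Running differs() over the same batch of test cases with each compiler
version, and feeding the results to per_thousand(), would give one
point per version on the graph I have in mind.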
Even more interesting would be a family of these graphs, for "gcc -O2",
"clang -O2", "clang -O3", etc. That's really the part that would allow
me to find out whether my anti-O3 bias is (or was ever) justified.
John's September 2013 blog post "Are Compilers Getting More or Less
Reliable?"
http://blog.regehr.org/archives/1036
addresses a similar question, but not quite in the terms I'm
interested in. Specifically, there are only two data points on his
X-axis ("2.7" and "trunk-as-of-2013"), and his Y-axis conflates -O3
bugs with all other kinds of bugs.
Does anyone have any answers (even partial answers) related to the above
question?
Or even any spare grad students who can tackle it? ;)
Thanks much,
Arthur