
Re: [csmith-dev] Has anyone ever graphed "-O3 bugginess" by GCC version?



Hi Arthur, I don't know that this specific issue has been looked at. I agree that it's interesting!

The issue I have with using Csmith as the basis for this evaluation is that it's not really clear that there's a relationship between how Csmith triggers bugs and how programs we really care about trigger bugs.

John


On 9/3/15 3:04 AM, Arthur O'Dwyer wrote:
John, mailing-list-ers,

I was just thinking about whether it makes sense to use -O2 or -O3 for
compiling production software these days. Back, say, 5 years ago, it was
easy for me to diss "gcc -O3" because it was so "obviously" full of
bugs. But these days, with the rise of fuzz-testing and so on, are we
seeing an increase in the reliability of "gcc -O3" to the point where
-O3 is just as reliable as -O2?

What I'm really looking for is basically a graph where the X-axis is
"GCC version" and/or "year", and the Y-axis is "bugginess", perhaps
measured as "Csmith test cases per thousand whose -O3 output differs
from their -O0 output".
Even more interesting would be a family of these graphs, for "gcc -O2",
"clang -O2", "clang -O3", etc. That's really the part that would allow
me to find out whether my anti-O3 bias is (or was ever) justified.
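
For concreteness, the Y-axis metric described above could be computed
roughly as follows. This is only a minimal Python sketch: the names
mismatches_per_thousand and checksum_at are hypothetical, and checksum_at
stands in for whatever harness generates a test case, compiles it at the
given optimization level, runs it, and returns the checksum it prints.
The fake_checksum function is made-up stand-in data, not a measurement.

```python
from typing import Callable, Iterable


def mismatches_per_thousand(
    seeds: Iterable[int],
    checksum_at: Callable[[int, str], str],
    baseline: str = "-O0",
    candidate: str = "-O3",
) -> float:
    """Test cases per thousand whose candidate output differs from the baseline.

    checksum_at(seed, flag) is a hypothetical callback supplied by the caller.
    """
    total = 0
    differing = 0
    for seed in seeds:
        total += 1
        if checksum_at(seed, baseline) != checksum_at(seed, candidate):
            differing += 1
    if total == 0:
        raise ValueError("no test cases were supplied")
    return 1000.0 * differing / total


def fake_checksum(seed: int, flag: str) -> str:
    # Stand-in data only: pretend every 4th test case miscompiles at -O3.
    if flag == "-O3" and seed % 4 == 0:
        return "bad"
    return "good"


if __name__ == "__main__":
    print(mismatches_per_thousand(range(1000), fake_checksum))
```

Running the same function once per compiler version would give one point
per version on the X-axis. A differing output only suggests a
miscompilation; it could also come from a bug at the baseline level or
from a test case that is not fully deterministic, so flagged cases would
still need a look by hand.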

John's September 2013 blog post "Are Compilers Getting More or Less
Reliable?"
http://blog.regehr.org/archives/1036
addresses a similar question, but not in exactly the terms I'm
interested in. Namely, there are only two data values on his X-axis
("2.7" and "trunk-as-of-2013"), and his Y-axis conflates -O3 bugs with
all other kinds of bugs.

Does anyone have any answers (even partial answers) related to the above
question?
Or even any spare grad students who can tackle it? ;)

Thanks much,
Arthur