Re: [csmith-dev] Has anyone ever graphed "-O3 bugginess" by GCC version?
Hi Arthur, I don't know that this specific issue has been looked at. I
agree that it's interesting!
The issue I have with using Csmith as the basis for this evaluation is
that it's not really clear that there's a relationship between how
Csmith triggers bugs and how programs we really care about trigger bugs.
On 9/3/15 3:04 AM, Arthur O'Dwyer wrote:
I was just thinking about whether it makes sense to use -O2 or -O3 for
compiling production software these days. Back, say, 5 years ago, it was
easy for me to diss "gcc -O3" because it was so "obviously" full of
bugs. But these days, with the rise of fuzz-testing and so on, are we
seeing an increase in the reliability of "gcc -O3" to the point where
-O3 is just as reliable as -O2?
What I'm really looking for is basically a graph where the X-axis is
"GCC version" and/or "year", and the Y-axis is "bugginess", perhaps
measured as "Csmith test cases per thousand whose -O3 output differs
from their -O0 output".
Even more interesting would be a family of these graphs, for "gcc -O2",
"clang -O2", "clang -O3", etc. That's really the part that would allow
me to find out whether my anti-O3 bias is (or was ever) justified.
John's September 2013 blog post "Are Compilers Getting More or Less
addresses a similar question, but not in exactly the terms I'm
interested in. Namely, there are only two data values on his X-axis
("2.7" and "trunk-as-of-2013"), and his Y-axis conflates -O3 bugs with
all other kinds of bugs.
Does anyone have any answers (even partial answers) related to the above?
Or even any spare grad students who can tackle it? ;)
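
As a rough sketch of the Y-axis metric proposed above ("Csmith test cases per thousand whose -O3 output differs from their -O0 output"), something like the following could compute one data point per compiler version. It assumes the outputs (for example, the checksum each generated program prints) have already been collected by some harness that is not shown here. The function name and the convention of using None for a run that produced no output are illustrative assumptions, not part of Csmith.

```python
from typing import Iterable, Optional, Tuple

# One entry per generated test case: (output at -O0, output at -O3).
# None marks a run that produced no usable output (e.g. a timeout);
# this convention is an assumption made for the sketch.
OutputPair = Tuple[Optional[str], Optional[str]]


def mismatches_per_thousand(pairs: Iterable[OutputPair]) -> float:
    """Return the number of differing test cases per thousand compared."""
    compared = 0
    differing = 0
    for reference, optimized in pairs:
        # Skip cases where either run gave nothing to compare.
        if reference is None or optimized is None:
            continue
        compared += 1
        if reference != optimized:
            differing += 1
    if compared == 0:
        return 0.0
    return 1000.0 * differing / compared


if __name__ == "__main__":
    # Placeholder values for illustration only, not measured results.
    sample = [("c1", "c1"), ("c2", "XX"), ("c3", "c3"), ("c4", "c4")]
    print(mismatches_per_thousand(sample))
```

Running this once per (compiler, version, flag) combination would give the points for the family of graphs described above. Note that a mismatch only suggests a miscompilation; each one would still need to be checked by hand, and this sketch says nothing about whether such counts predict bugs in real programs.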