Re: [csmith-dev] Has anyone ever graphed "-O3 bugginess" by GCC version?

To: "Arthur O'Dwyer" <arthur.j.odwyer@gmail.com>, csmith-dev@flux.utah.edu
Subject: Re: [csmith-dev] Has anyone ever graphed "-O3 bugginess" by GCC version?
From: John Regehr <regehr@cs.utah.edu>
Date: Thu, 3 Sep 2015 09:02:25 +0200

Hi Arthur, I don't know that this specific issue has been looked at. I agree that it's interesting!

The issue I have with using Csmith as the basis for this evaluation is that it's not really clear that there's a relationship between how Csmith triggers bugs and how programs we really care about trigger bugs.


John


On 9/3/15 3:04 AM, Arthur O'Dwyer wrote:

John, mailing-list-ers,

I was just thinking about whether it makes sense to use -O2 or -O3 for
compiling production software these days. Back, say, 5 years ago, it was
easy for me to diss "gcc -O3" because it was so "obviously" full of
bugs. But these days, with the rise of fuzz-testing and so on, are we
seeing an increase in the reliability of "gcc -O3" to the point where
-O3 is just as reliable as -O2?

What I'm really looking for is basically a graph where the X-axis is
"GCC version" and/or "year", and the Y-axis is "bugginess", perhaps
measured as "Csmith test cases per thousand whose -O3 output differs
from their -O0 output".
Even more interesting would be a family of these graphs, for "gcc -O2",
"clang -O2", "clang -O3", etc. That's really the part that would allow
me to find out whether my anti-O3 bias is (or was ever) justified.
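
As a rough illustration of the proposed Y-axis, the "test cases per thousand whose -O3 output differs from their -O0 output" figure could be computed along these lines. This is only a sketch under stated assumptions: the function names, the use of None for a crashed or timed-out run, and the sample data are all hypothetical, and it assumes each test case's output (for example, the checksum line a Csmith-generated program prints) has already been collected at both optimization levels.

```python
from typing import Dict, Iterable, List, Optional, Tuple

# One entry per test case: (output at -O0, output at -O3).
# None marks a run that produced no output (crash or timeout); this
# convention is an assumption of the sketch, not something Csmith defines.
Outcome = Tuple[Optional[str], Optional[str]]


def mismatches_per_thousand(outcomes: Iterable[Outcome]) -> float:
    """Return how many test cases per 1000 differ between -O0 and -O3.

    Cases where either run produced no output are skipped, because a
    missing output cannot be compared.
    """
    compared = 0
    differing = 0
    for o0_output, o3_output in outcomes:
        if o0_output is None or o3_output is None:
            continue
        compared += 1
        if o0_output != o3_output:
            differing += 1
    if compared == 0:
        return 0.0
    return 1000.0 * differing / compared


def bugginess_by_version(runs: Dict[str, List[Outcome]]) -> Dict[str, float]:
    """Map each compiler version label to its mismatch rate (the graph's Y values)."""
    return {version: mismatches_per_thousand(o) for version, o in runs.items()}


if __name__ == "__main__":
    # Made-up placeholder data, not measurements of any real compiler.
    sample = [("a1", "a1"), ("b2", "ff"), (None, "c3"), ("d4", "d4"), ("e5", "e5")]
    print(mismatches_per_thousand(sample))  # 250.0 (1 of 4 comparable cases differs)
```

Running the same computation once per compiler version, and once per optimization level or compiler, would give the family of curves described above.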

John's September 2013 blog post "Are Compilers Getting More or Less
Reliable?"
http://blog.regehr.org/archives/1036
addresses a similar question, but not in exactly the terms I'm
interested in. Namely, there are only two data values on his X-axis
("2.7" and "trunk-as-of-2013"), and his Y-axis conflates -O3 bugs with
all other kinds of bugs.

Does anyone have any answers (even partial answers) related to the above
question?
Or even any spare grad students who can tackle it? ;)

Thanks much,
Arthur
