Re: [creduce-dev] unifdef

With a Boost::log example I tried preprocessing everything and throwing creduce at it, with little success after five of reducing. It did go from 300K lines to 10K lines, but the process slows down and that is still nowhere near a reduced example a human can understand.

There are three reasons for this.

First, every interestingness test was slow: each compile took about 10s.

Second, Boost-using code and Boost headers usually pull in many more includes than a specific example actually requires. We are talking about very large preprocessed files: preprocessing the Boost::log example sinks_async.cpp results in an 8.7MB file on my system. Once preprocessed, such files are not quickly reduced, and it's much more efficient to remove one #include line than to reduce the resulting huge preprocessed code, which may not be needed at all.

Finally, the Boost usage of templates, inheritance and macros (with non-preprocessed code) is very (too?) complex to reduce automatically and requires manual help anyhow. Helping the reduction process is feasible only while the source is still human-readable and small.

With the Boost::log example I used creduce mostly for deleting includes (most of them were *not* required, at every include depth), then manually copied the remaining include files' text into the main include, in part or in whole (a judgement call). It took about a day's work to reduce this example.

With regard to included files, when the nesting of includes is very deep (as in Boost) it's very efficient to replace an #include with only the #include lines of the included file and no other text, see if this passes the interestingness test, reduce the new includes and repeat. This process cuts through the first six layers of includes or so with Boost, getting quickly to the "real" code. When the interestingness test fails, creduce can try to copy the whole file, or it's time for manual intervention.
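
To make the include-hoisting step concrete, here is a minimal Python sketch of one round of it. It is only an illustration, not an existing creduce pass: the function name `hoist_nested_includes`, the `files` dictionary (include name to file text) and the `is_interesting` callback are all assumptions, and a real pass would have to resolve include paths through the compiler's search directories.

```python
import re
from typing import Callable, Dict

# Matches `#include "x.h"` and `#include <x.h>`, capturing the file name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def hoist_nested_includes(
    source: str,
    files: Dict[str, str],                 # hypothetical: include name -> file text
    is_interesting: Callable[[str], bool]  # hypothetical interestingness test
) -> str:
    """One round: try to replace each #include by only the #include
    lines of the file it names, keeping the change if the test passes."""
    lines = source.splitlines(keepends=True)
    i = 0
    while i < len(lines):
        m = INCLUDE_RE.match(lines[i])
        body = files.get(m.group(1)) if m else None
        if body is not None:
            # Keep only the include directives of the included file.
            nested = [
                l if l.endswith("\n") else l + "\n"
                for l in body.splitlines(keepends=True)
                if INCLUDE_RE.match(l)
            ]
            candidate = lines[:i] + nested + lines[i + 1:]
            if is_interesting("".join(candidate)):
                lines = candidate
                i += len(nested)  # hoisted lines are left for the next round
                continue
        i += 1
    return "".join(lines)
```

Calling this repeatedly, with an include-removal step in between, corresponds to the "reduce the new includes and repeat" loop described above. Each round only looks one level down, so it terminates even if headers include each other.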

2015-10-31 12:20 GMT+02:00 John Regehr <regehr@cs.utah.edu>:

Yaron, thanks!

I wonder if you could tell me a bit more about why you reduce non-preprocessed Boost code? Is it common for compiler bugs to go away after preprocessing?

I'm just curious about this.

Another thing about #includes: I've noticed that reduced non-preprocessed code often ends up containing chains of trivial includes. We can do three things about this:

- ignore it, since it's no big deal

- add a pass that replaces an #include directive with the included file

- add a pass that only replaces an #include when the included file is below some size threshold
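
As a rough illustration of the third option, the sketch below inlines an #include only when the included text is at most a given number of bytes. It is written in Python for brevity and is not an existing creduce pass; `inline_small_includes`, the `files` mapping and `max_bytes` are hypothetical names, and include-path resolution and include guards are ignored.

```python
import re
from typing import Dict

# Matches `#include "x.h"` and `#include <x.h>`, capturing the file name.
INCLUDE_RE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def inline_small_includes(source: str, files: Dict[str, str], max_bytes: int) -> str:
    """Replace each #include whose target is known and small enough
    with the target's text; leave every other line untouched."""
    out = []
    for line in source.splitlines(keepends=True):
        m = INCLUDE_RE.match(line)
        body = files.get(m.group(1)) if m else None
        if body is not None and len(body.encode()) <= max_bytes:
            # Make sure the inlined text ends with a newline.
            if body and not body.endswith("\n"):
                body += "\n"
            out.append(body)
        else:
            out.append(line)
    return "".join(out)
```

Setting `max_bytes` very large gives the second option (always inline). In a reducer, the result would still need to pass the interestingness test before being kept.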

John

On 10/31/15 11:04 AM, Yaron Keren wrote:

Hi John,

This sounds great!
For non-preprocessed code it's sometimes also useful to try to remove
includes, see attached pass.
While pass_lines would eventually get rid of them, experience reducing
Boost examples suggests it's much more efficient to go for the includes
as a first pass. Usually most of them are not required, and this typically
repeats for five to ten levels of nested includes. Maybe this should be
merged into unifdef or added as its own pass.
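
The attached pass is not reproduced here. As a rough idea of what an include-removal step does, here is a minimal Python sketch; the name `remove_includes` and the `is_interesting` callback are assumptions for illustration only.

```python
import re
from typing import Callable

# Matches any #include directive, regardless of quoting style.
INCLUDE_RE = re.compile(r'^\s*#\s*include\b')


def remove_includes(source: str, is_interesting: Callable[[str], bool]) -> str:
    """Greedily delete #include lines one at a time, keeping each
    deletion only if the interestingness test still passes."""
    lines = source.splitlines(keepends=True)
    i = 0
    while i < len(lines):
        if INCLUDE_RE.match(lines[i]):
            candidate = lines[:i] + lines[i + 1:]
            if is_interesting("".join(candidate)):
                lines = candidate
                continue  # index i now holds the following line
        i += 1
    return "".join(lines)
```

With n include lines this runs the interestingness test at most n times per file, which is why it can pay off before a general line-based pass gets to them.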

Yaron

2015-10-31 11:05 GMT+02:00 John Regehr <regehr@cs.utah.edu>:

I just merged the new pass that does partial resolution of ifdefs.

For the 196 transitive include files that come from a C++ hello
world on OS X (not making this up) the unifdef pass by itself gives
a 57% reduction (in 352 seconds) if we choose -D first and -U second
for each CPP symbol. Reduction is only 45% for -U first (in 518
seconds). I have no idea if this generalizes but I'm going with -D
first for now.
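
The "-D first, -U second" ordering can be sketched as follows. This is a Python illustration under assumptions, not the merged pass: how the pass collects CPP symbols is not described here, so `symbols` is simply passed in, and `resolve_ifdefs`, `run_unifdef` and `is_interesting` are hypothetical names. `run_unifdef` assumes the unifdef command-line tool is installed and relies on it reading standard input when no file is given.

```python
import subprocess
from typing import Callable, Iterable


def run_unifdef(source: str, flag: str) -> str:
    """Run unifdef on `source` with a single -DSYM or -USYM flag."""
    proc = subprocess.run(["unifdef", flag], input=source,
                          capture_output=True, text=True)
    # unifdef exits 0 if the output is unchanged, 1 if modified, 2 on trouble.
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr)
    return proc.stdout


def resolve_ifdefs(
    source: str,
    symbols: Iterable[str],
    is_interesting: Callable[[str], bool],
    apply: Callable[[str, str], str] = run_unifdef,
    define_first: bool = True,
) -> str:
    """For each symbol, try one resolution and then the other, keeping
    the first result that changes the text and stays interesting."""
    order = ("-D", "-U") if define_first else ("-U", "-D")
    for sym in symbols:
        for prefix in order:
            candidate = apply(source, prefix + sym)
            if candidate != source and is_interesting(candidate):
                source = candidate
                break  # symbol resolved; move on to the next one
    return source
```

Passing `define_first=False` gives the -U-first variant, which makes it easy to compare the two orderings on the same input.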

Also I'm putting the unifdef and comment removal passes right up at
the front of the phase ordering on the idea that they work well for
non-preprocessed code and terminate very quickly if there aren't any
ifdefs / comments to remove. We can revisit this if my idea isn't
right.

John