[creduce-dev] Making creduce more parallel

Hi everyone,

I'm hacking on bindgen[0], a program that takes in C and C++ header
files and outputs Rust FFI bindings. I'm running creduce on C++ files
that have already been preprocessed and are megabytes in size.

[0] https://github.com/servo/rust-bindgen

I suspect that my interestingness test might be heavier than typical:
I do `clang++ -c testcase.hpp` to ensure that the test case is valid
C++ (I'm uninterested in malformed snippets), then I pass the file to
bindgen to generate Rust FFI bindings to the test case's contents.
Sometimes I stop there; other times I follow that up by compiling the
generated bindings themselves with `rustc`.
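
Roughly, the test looks something like the sketch below. The exact
bindgen flags and the criterion for "interesting" are placeholders
here; they have to match whatever behavior is actually being reduced:

    #!/usr/bin/env python
    # Rough sketch of the interestingness test described above; the bindgen
    # command line and the "interesting" criterion are placeholders.
    import subprocess, sys

    def run(cmd):
        return subprocess.call(cmd)

    # 1. The test case must still be valid C++; malformed snippets are
    #    uninteresting.
    if run(["clang++", "-c", "testcase.hpp"]) != 0:
        sys.exit(1)

    # 2. Generate Rust FFI bindings for the test case with bindgen.
    bindgen_status = run(["bindgen", "testcase.hpp", "-o", "bindings.rs"])

    # 3. Sometimes also compile the generated bindings:
    # rustc_status = run(["rustc", "--crate-type=lib", "bindings.rs"])

    # Exit 0 iff the behavior being reduced is still present. Treating a
    # bindgen failure as "interesting" is just a placeholder criterion.
    sys.exit(0 if bindgen_status != 0 else 1)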

A full creduce run often takes 6+ hours for me. I'd like to
dramatically cut this number down.

I've gotten my hands on a beefy machine with 48 cores, but according
to this blog post[1], the current paradigm's optimal number of jobs is
around 3. This matches my experience: I haven't noticed any speedups
in reduction from using more than 3 jobs (although I haven't been
timing carefully). Nor can I peg all 48 of my cores, regardless of how
many jobs I tell creduce to use (I tried all the way up to 500).
Instead, two cores sit at 100%, a couple more hover between 25% and
50%, and all the rest are at 0%.

[1] http://blog.regehr.org/archives/749

Reading that blog post, I suspect that my experience is expected. It
seems that a new parallelization paradigm is required if I want to
really take advantage of all of these cores. So I have been thinking
about what such a paradigm might look like.

# Initial Idea: Map-Reduce

Initially, I thought about a map-reduce-y kind of thing: map potential
reductions and their interestingness tests out across all cores, then
merge the surviving reductions back together in pairs, also in
parallel. If a merge conflicts, take the smaller of the two. Retest
interestingness after each merge, and if it fails, take the smaller of
the two test cases (both of which we already know passed the
interestingness test). Repeat this process to a fixed point.
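
One round of this would look roughly like the sketch below
(Python-flavored pseudocode; generate_reductions, is_interesting, and
try_merge are placeholders, not existing creduce code):

    # One round of the map-reduce idea. The helpers generate_reductions,
    # is_interesting, and try_merge are placeholders, not creduce APIs.
    from multiprocessing import Pool

    def merge_pair(pair):
        a, b = pair
        merged = try_merge(a, b)
        if merged is None or not is_interesting(merged):
            # Merge conflicted or failed the test: keep the smaller of the
            # two test cases, both of which are known to be interesting.
            return min(a, b, key=len)
        return merged

    def one_round(test_case, pool):
        # Map phase: test every potential reduction in parallel.
        candidates = generate_reductions(test_case)
        survivors = [c for c, ok in
                     zip(candidates, pool.map(is_interesting, candidates))
                     if ok]
        # Merge phase: fold the survivors together in pairs. Every pass
        # halves the number of merge jobs, so fewer and fewer cores have
        # anything to do, and no new reductions start until the fold ends.
        while len(survivors) > 1:
            pairs = list(zip(survivors[0::2], survivors[1::2]))
            leftover = survivors[2 * len(pairs):]
            survivors = pool.map(merge_pair, pairs) + leftover
        return survivors[0] if survivors else test_case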

The problem with this approach is that it will spend the majority of
its time in the merging phase, which is also the phase that won't be
taking advantage of all those cores; in fact, it uses fewer and fewer
of them as merging progresses. Maybe we could do some tricks to
optimistically test interestingness of merges of merges, but the
underlying issue is that we are blocking the next wave of potential
reductions until all merging completes.

# Better Idea: Shared Work Queue

We put all potential reductions into a shared work queue and have one
worker per core pulling from that queue. We globally maintain the
current most-reduced test case.

Each worker checks the interestingness of the potential reduction that
it pulled from the shared work queue. If the potential reduction fails
the interestingness test, it is discarded. If the potential reduction
passes the interestingness test, then its size is compared against the
current most-reduced test case, and if this new reduction is smaller,
then it becomes the new global most-reduced test case. We also push
onto the queue new potential reductions based on this new most-reduced
test case.

However, if this reduction passes the interestingness test but is not
smaller than the current global most-reduced test case, then we add a
new potential reduction to the shared work queue: the merge of this
reduction and the current global most-reduced test case. The idea is
that the union of two reductions (a merge) is itself a potential
reduction. If the merge conflicts, we discard it immediately and don't
even run the interestingness test.

The process completes when the shared work queue is empty, that is,
when we have reached a fixed point.

Because merges are just another kind of potential reduction, and there
is no waiting for merges to complete before trying any given potential
reduction, this scheme should better saturate all those cores.
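
Here is a rough sketch of what a worker could look like under this
scheme, again as Python-flavored pseudocode. generate_reductions stands
in for the existing reduction passes, is_interesting for the user's
test, and try_merge for the git-based merge discussed below; none of
these are existing creduce APIs.

    # Sketch of the shared-work-queue scheme. generate_reductions,
    # is_interesting, and try_merge are placeholders, not creduce APIs.
    import os, queue, threading

    original = open("testcase.hpp").read()   # the unreduced input
    work = queue.Queue()                     # shared work queue
    lock = threading.Lock()
    best = [original]                        # global most-reduced test case

    def worker():
        while True:
            candidate = work.get()
            try:
                if not is_interesting(candidate):
                    continue                       # discard it
                with lock:
                    current = best[0]
                    if len(candidate) < len(current):
                        # New global most-reduced test case: seed the queue
                        # with fresh potential reductions based on it.
                        best[0] = candidate
                        for r in generate_reductions(candidate):
                            work.put(r)
                        continue
                # Interesting but not smaller: the merge of this reduction
                # and the current best is itself a potential reduction.
                merged = try_merge(candidate, current)
                if merged is not None:             # conflicts are discarded
                    work.put(merged)
            finally:
                work.task_done()

    for r in generate_reductions(original):
        work.put(r)
    for _ in range(os.cpu_count() or 1):
        threading.Thread(target=worker, daemon=True).start()
    work.join()       # queue has drained: we've reached the fixed point
    print(best[0])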

It is worth pointing out that this is non-deterministic and racy:
merges depend on whatever happens to be the current most-reduced test
case and on the order of the potential reductions in the queue.
Running creduce on the same input file twice won't guarantee the same
order of reductions or even the same results. For what it is worth, my
understanding of the current paradigm is that it has similar
properties.

# How Do We Merge?

I don't want to implement merging or rebasing of patch files myself.
What if we leveraged git (or hg) for this? Each potential reduction
would locally clone a repository containing the test case into its temp
dir and commit the reduction's changes; merging different reductions
would then be outsourced to `git merge` on those commits.
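
Something along these lines, perhaps (the repo layout, branch names,
and file name are assumptions, and this ignores cleanup of the temp
clones):

    # Sketch of outsourcing "merge two reductions" to git. Assumes a base
    # repo whose default branch ("master") holds the unreduced testcase.hpp.
    import os, subprocess, tempfile

    BASE_REPO = "testcase-repo"

    def git(repo, *args):
        return subprocess.call(["git", "-C", repo] + list(args))

    def try_merge(reduction_a, reduction_b):
        """Return the union of two reductions, or None if they conflict."""
        work = tempfile.mkdtemp()
        subprocess.call(["git", "clone", "--quiet", BASE_REPO, work])

        # Commit each reduction on its own branch off the unreduced base.
        for name, contents in (("a", reduction_a), ("b", reduction_b)):
            git(work, "checkout", "--quiet", "-b", name, "master")
            with open(os.path.join(work, "testcase.hpp"), "w") as f:
                f.write(contents)
            git(work, "commit", "--quiet", "-am", "reduction " + name)

        # The merge of the two commits is the union of the two reductions.
        git(work, "checkout", "--quiet", "a")
        if git(work, "merge", "--quiet", "b") != 0:
            return None   # conflict: discard without testing interestingness

        with open(os.path.join(work, "testcase.hpp")) as f:
            return f.read()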

A neat benefit of this approach is that we would be able to visualize
the reduction history with `git log --graph` :P

# Feedback?

What do you think?

I am interested in polishing this idea, prototyping it, and, if all
goes well, contributing these changes to creduce. My hope is that the
implementation mostly involves changes to orchestration, and that the
reduction passes themselves can remain unchanged.

Thanks for your time!

Nick