[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [creduce-dev] Making creduce more parallel
How are the passes / reducers structured right now? If we can generate
all of a pass's potential reductions up front, then they can be
inserted into the queue in a random order to reduce the likelihood of
conflicts. If the passes don't separate generating a potential
reduction from testing it, then we may need to refactor more.
Let's talk about the line reduction pass at granularity 1 (each variant
is created by removing one line). We're running it on a 1000-line file.
The search tree here has 2^1000 leaves, so we certainly don't want to
try to generate all variants up-front.
What we can do is speculate: assume the variants are unsuccessful
(statistically this is the right guess) and now we only have 1000
variants, so that is feasible, but not particularly fast since we're
manipulating ~50 KB of text. Worse, this line of speculation becomes
more and more out of sync with the current best reduction, assuming that
some line removals succeed -- so merge conflicts are going to start
The upshot is that while the coordinator can run ahead, it should run
only far enough ahead that the queue doesn't empty out.
The API for a C-Reduce pass is purely functional. The pass takes a
current test case and a pass state object, and either produces a new
variant + pass state or else says that it has no more variants to
generate. This API is not designed to facilitate picking random variants.
However, a quick hack to get a random variant is to just repeatedly
invoke the pass a random number of times. This is not fast but it'll
get things off the ground with very little effort. Some experimentation
will be required to determine how this parameter interacts with the
likelihood of merge conflicts.
I was imagining that the "orchestrator" process would spawn worker
threads that spawn-and-wait on interestingness processes and use
CSP-style channels to comminucate with the main thread that owns the
queue. This would leverage the modularity of passes that
generate potential reductions and of the interesting test.
Something like this diagram:
I was planning on prototyping in Rust, which has channels in its
standard library. Python's `queue.Queue` should also be able to handle
If you have other suggestions, I am all ears.
This sounds fine, the only suggestion I have is that you might consider
using a network-friendly mechanism for the orchestration in case we
wanted to see how reduction across multiple machines works. Or at least
design things so that this isn't difficult if anyone wants to try it out
I would suggest that you start out dealing only with one or two passes,
perhaps the line remover and the token remover. These do some heavy
lifting, hopefully never crash, and are always easy to understand.