[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [creduce-dev] Making creduce more parallel



How are the passes / reducers structured right now? If we can generate
all of a pass's potential reductions up front, then they can be
inserted into the queue in a random order to reduce the likelihood of
conflicts. If the passes don't separate generating a potential
reduction from testing it, then we may need to refactor more.

Let's talk about the line reduction pass at granularity 1 (each variant is created by removing one line). We're running it on a 1000-line file.

The search tree here has 2^1000 leaves, so we certainly don't want to try to generate all variants up-front.

What we can do is speculate: assume the variants are unsuccessful (statistically this is the right guess) and now we only have 1000 variants, so that is feasible, but not particularly fast since we're manipulating ~50 KB of text. Worse, this line of speculation becomes more and more out of sync with the current best reduction, assuming that some line removals succeed -- so merge conflicts are going to start happening.

The upshot is that while the coordinator can run ahead, it should run only far enough ahead that the queue doesn't empty out.

The API for a C-Reduce pass is purely functional. The pass takes a current test case and a pass state object, and either produces a new variant + pass state or else says that it has no more variants to generate. This API is not designed to facilitate picking random variants.

However, a quick hack to get a random variant is to just repeatedly invoke the pass a random number of times. This is not fast but it'll get things off the ground with very little effort. Some experimentation will be required to determine how this parameter interacts with the likelihood of merge conflicts.

I was imagining that the "orchestrator" process would spawn worker
threads that spawn-and-wait on interestingness processes and use
CSP-style channels to comminucate with the main thread that owns the
queue. This would leverage the modularity of passes that
generate potential reductions and of the interesting test.

Something like this diagram:
https://gist.githubusercontent.com/fitzgen/bf1acdc6dad217f2ed5accbabce9cf73/raw/981bff47be0f818e69041908eb63035fabb4e25a/orchestrator-diagram.txt

I was planning on prototyping in Rust, which has channels in its
standard library. Python's `queue.Queue` should also be able to handle
the job.

If you have other suggestions, I am all ears.

This sounds fine, the only suggestion I have is that you might consider using a network-friendly mechanism for the orchestration in case we wanted to see how reduction across multiple machines works. Or at least design things so that this isn't difficult if anyone wants to try it out later.

I would suggest that you start out dealing only with one or two passes, perhaps the line remover and the token remover. These do some heavy lifting, hopefully never crash, and are always easy to understand.

John