[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [creduce-dev] Making creduce more parallel
I still had no time to read everything in thia conversation but one thing about the line removal came to my mind.
If I remember correctly the removals for one fixed granularity settings should not overlap. So merging should be easy be just picking one of the two files at random, searching for the part that has been remove in the other file (that need's some modification of the pass so that it reports what has been removed), and removing it from the first file as well.
For the first runs where granularity is set to a high value we don't get a many parallel jobs but I guess that's ok because we potentially remove a large part at once. And later there should be no problem to get a lot of parallelism as long as the file is still larger enough.
I guess after each merge we also would need to run an interestingness test, won't we?
On 27 Jan 2017 18:27, John Regehr <firstname.lastname@example.org> wrote:
> > How are the passes / reducers structured right now? If we can generate
> > all of a pass's potential reductions up front, then they can be
> > inserted into the queue in a random order to reduce the likelihood of
> > conflicts. If the passes don't separate generating a potential
> > reduction from testing it, then we may need to refactor more.
> Let's talk about the line reduction pass at granularity 1 (each variant
> is created by removing one line). We're running it on a 1000-line file.
> The search tree here has 2^1000 leaves, so we certainly don't want to
> try to generate all variants up-front.
> What we can do is speculate: assume the variants are unsuccessful
> (statistically this is the right guess) and now we only have 1000
> variants, so that is feasible, but not particularly fast since we're
> manipulating ~50 KB of text. Worse, this line of speculation becomes
> more and more out of sync with the current best reduction, assuming that
> some line removals succeed -- so merge conflicts are going to start
> The upshot is that while the coordinator can run ahead, it should run
> only far enough ahead that the queue doesn't empty out.
> The API for a C-Reduce pass is purely functional. The pass takes a
> current test case and a pass state object, and either produces a new
> variant + pass state or else says that it has no more variants to
> generate. This API is not designed to facilitate picking random variants.
> However, a quick hack to get a random variant is to just repeatedly
> invoke the pass a random number of times. This is not fast but it'll
> get things off the ground with very little effort. Some experimentation
> will be required to determine how this parameter interacts with the
> likelihood of merge conflicts.
> > I was imagining that the "orchestrator" process would spawn worker
> > threads that spawn-and-wait on interestingness processes and use
> > CSP-style channels to comminucate with the main thread that owns the
> > queue. This would leverage the modularity of passes that
> > generate potential reductions and of the interesting test.
> > Something like this diagram:
> > https://gist.githubusercontent.com/fitzgen/bf1acdc6dad217f2ed5accbabce9cf73/raw/981bff47be0f818e69041908eb63035fabb4e25a/orchestrator-diagram.txt
> > I was planning on prototyping in Rust, which has channels in its
> > standard library. Python's `queue.Queue` should also be able to handle
> > the job.
> > If you have other suggestions, I am all ears.
> This sounds fine, the only suggestion I have is that you might consider
> using a network-friendly mechanism for the orchestration in case we
> wanted to see how reduction across multiple machines works. Or at least
> design things so that this isn't difficult if anyone wants to try it out
> I would suggest that you start out dealing only with one or two passes,
> perhaps the line remover and the token remover. These do some heavy
> lifting, hopefully never crash, and are always easy to understand.