Re: [creduce-dev] Making creduce more parallel

To: creduce-dev@flux.utah.edu
Subject: Re: [creduce-dev] Making creduce more parallel
From: John Regehr <regehr@cs.utah.edu>
Date: Wed, 25 Jan 2017 09:09:12 -0700
In-reply-to: <CAN1aaTUxQH8n9G-JWEbETS_y8LB6ap4F5cRUBVcQrLZTSbs8tQ@mail.gmail.com>
References: <CAN1aaTUxQH8n9G-JWEbETS_y8LB6ap4F5cRUBVcQrLZTSbs8tQ@mail.gmail.com>

We put all potential reductions into a shared work queue, have a
worker per core which is pulling from this shared queue. We globally
maintain a current most-reduced test case.
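
A minimal sketch of the shared-queue scheme described above, assuming threads stand in for per-core workers and candidates are plain strings. The names `Best`, `worker`, and `reduce_parallel` are hypothetical and are not taken from C-Reduce; a real reducer would run the interestingness test as a separate process.

```python
import os
import queue
import threading


class Best:
    """Globally shared most-reduced test case, guarded by a lock."""

    def __init__(self, text):
        self._lock = threading.Lock()
        self.text = text

    def offer(self, candidate):
        """Install candidate if it is strictly smaller; return True if installed."""
        with self._lock:
            if len(candidate) < len(self.text):
                self.text = candidate
                return True
            return False


def worker(work, best, is_interesting):
    """Pull candidates off the shared queue until a None sentinel arrives."""
    while True:
        candidate = work.get()
        try:
            if candidate is None:
                return
            if is_interesting(candidate):
                best.offer(candidate)
        finally:
            work.task_done()


def reduce_parallel(original, candidates, is_interesting, n_workers=None):
    """Test every candidate in parallel and return the smallest interesting one."""
    n_workers = n_workers or os.cpu_count() or 1  # one worker per core
    work = queue.Queue()
    best = Best(original)
    threads = [
        threading.Thread(target=worker, args=(work, best, is_interesting))
        for _ in range(n_workers)
    ]
    for t in threads:
        t.start()
    for c in candidates:
        work.put(c)
    for _ in threads:
        work.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return best.text
```

Here every candidate is known up front; in the proposal, workers would also push new candidates (such as merges) back onto the same queue.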


I think this is reasonable.

At this point I should mention that Moritz Pflanzer has a re-implementation of C-Reduce in Python that we have tentatively planned to replace the Perl implementation with, at some suitable time. It's possible that C-Reduce hacking of the type you are proposing should be done on that version.

However, if this reduction passes the interestingness test, but it is
not smaller than the current most-reduced global, then we add a new
potential reduction to the shared work queue: the merge of this
reduction and the current most-reduced global. The idea is that the
union of two reductions (a merge) is itself a potential reduction. If
there is a conflict in the merge, we discard it immediately and don't
even run the interestingness test.
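
One way to picture the merge step is a line-based three-way merge against the test case both reductions started from. The sketch below is an assumption-laden illustration, not C-Reduce code: `edits`, `merge`, and `maybe_enqueue_merge` are hypothetical names, the common base is assumed to be available, and any two edits that overlap or touch are conservatively treated as a conflict.

```python
import difflib


def edits(base, new):
    """Return the edits turning base into new as (start, end, replacement) tuples."""
    sm = difflib.SequenceMatcher(a=base, b=new, autojunk=False)
    return {
        (i1, i2, tuple(new[j1:j2]))
        for tag, i1, i2, j1, j2 in sm.get_opcodes()
        if tag != "equal"
    }


def merge(base, ours, theirs):
    """Three-way merge of lists of lines; return None if the edits conflict."""
    a, b = edits(base, ours), edits(base, theirs)
    # Edits made identically on both sides are fine; any other pair that
    # overlaps or touches in the base is treated as a conflict.
    for a1, a2, _ in a - b:
        for b1, b2, _ in b - a:
            if a1 <= b2 and b1 <= a2:
                return None
    merged = list(base)
    # Apply from the end of the file so earlier indices stay valid.
    for i1, i2, replacement in sorted(a | b, reverse=True):
        merged[i1:i2] = replacement
    return merged


def maybe_enqueue_merge(base, candidate, best, enqueue):
    """For a candidate that is interesting but not smaller than best."""
    merged = merge(base, candidate, best)
    if merged is not None:
        enqueue(merged)  # must still pass the interestingness test later
```

A conflicting pair costs only a diff, since the merge is dropped before any interestingness test runs.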


Sounds right.

As I was saying, some randomness will need to be integrated into the passes to reduce the likelihood of conflicts. The passes aren't really geared for that so that'll take a bit of thought.

Because merges are just another kind of potential reduction, and there
is no waiting for merges to complete before trying any given potential
reduction, this scheme should better saturate all those cores.

Yes, definitely, though at least sometimes we'll be running up against limits other than cores, such as the fact that compilers use a lot of memory bandwidth.

It is worth pointing out that this is non-deterministic and racy:
merges depend on what happens to be the current most-reduced test case
and what the order of potential reductions in the queue happens to be.
Running creduce on the same input file twice won't guarantee the same
order of reductions or even the same results. For what it is worth, my
understanding of the current paradigm is that it has similar
properties.

Often the interestingness test itself is non-deterministic (due to timeouts) so it's very hard to truly avoid non-determinism.

There's one algorithmic choice in the parallel reducer where we have to decide whether to take the first process that terminates with an interesting test case or whether we pull these off the queue in order. I can't remember for sure but I believe I went with the more conservative choice even though it sacrifices a bit of performance.
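
The two options can be contrasted in a small sketch, assuming Python's `concurrent.futures` and a hypothetical `pick_interesting` helper; this is not how either C-Reduce implementation is written. With `in_order=True` the result depends only on submission order; with `in_order=False` whichever test finishes first wins, so the result can vary from run to run.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def pick_interesting(candidates, is_interesting, in_order=True, workers=4):
    """Run the tests in parallel and return one interesting candidate, or None."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # dicts preserve insertion order, so this also records submission order
        futures = {pool.submit(is_interesting, c): c for c in candidates}
        stream = futures if in_order else as_completed(futures)
        for fut in stream:
            if fut.result():
                for other in futures:
                    other.cancel()  # drop tests that have not started yet
                return futures[fut]
    return None
```

The in-order variant may sit waiting on a slow early test while a later one has already succeeded, which is the performance cost mentioned above.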

For a paper that we're working on about C-Reduce I have some experiments planned to evaluate the effect of determinism on reduction. In other words, if we run the same reduction 100 times but with phase ordering randomized, what does the resulting distribution of final file sizes look like? My guess is that in many cases the distribution will be fairly tight but that every now and then the randomness will find a much better solution. Anyhow I'm looking forward to seeing the results of this!

I don't want to implement merging or rebasing patch files myself. What
if we leveraged git (or hg) for this? Each potential reduction would
locally clone a repository containing the test case into the temp dir,
commit the reduction's changes, and merging different reductions would
be outsourced to merging these commits with `git merge`.
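
As a rough sketch of what outsourcing the merge to git could look like, assuming `git` is on the PATH and driving it through `subprocess`: the helper names (`git`, `commit`, `merge_with_git`), the file name `test.c`, and the throwaway identity are all hypothetical. A fresh repository per merge is the simplest arrangement, not necessarily the cheapest.

```python
import pathlib
import subprocess
import tempfile


def git(repo, *args, check=True):
    """Run git in repo with a throwaway identity so commits always succeed."""
    cmd = ["git", "-c", "user.name=sketch", "-c", "user.email=sketch@example.invalid"]
    return subprocess.run(cmd + list(args), cwd=repo, capture_output=True,
                          text=True, check=check)


def commit(repo, name, text, message):
    """Write text to the tracked file and commit it."""
    (repo / name).write_text(text)
    git(repo, "add", name)
    git(repo, "commit", "-q", "--allow-empty", "-m", message)


def merge_with_git(base, ours, theirs, name="test.c"):
    """Merge two reductions of base with `git merge`; return None on conflict."""
    with tempfile.TemporaryDirectory() as tmp:
        repo = pathlib.Path(tmp)
        git(repo, "init", "-q")
        commit(repo, name, base, "original test case")
        git(repo, "tag", "base")
        git(repo, "checkout", "-q", "-b", "theirs")
        commit(repo, name, theirs, "reduction B")
        git(repo, "checkout", "-q", "-b", "ours", "base")
        commit(repo, name, ours, "reduction A")
        result = git(repo, "merge", "-q", "--no-edit", "theirs", check=False)
        if result.returncode != 0:
            return None  # conflict: discard without running the test
        return (repo / name).read_text()
```

Each call spawns around ten git processes, so measuring this against the cost of one interestingness test would be a sensible first step.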

My intuition is that git/hg would eat a lot of performance but I could be wrong. It would certainly be amusing if C-Reduce ended up being a small pile of git hooks :).

I am interested in polishing this idea, prototyping it, and if all
goes well contributing these changes to creduce. My hope is that
implementation mostly involves changes to orchestration and that
reductions can remain unchanged.

The thing that I'm proudest of in the C-Reduce implementation is the modularity. The core is just not that complicated, most of the good stuff lives in passes that can be thought of as purely functional. So the structure does lend itself to the kind of experimentation you're talking about.


Anyhow take a look at Moritz's implementation too:

https://github.com/mpflanzer/creduce/tree/python

What IPC mechanism do you have in mind for the work queue?

John

Follow-Ups:
- Re: [creduce-dev] Making creduce more parallel
  - From: Nick Fitzgerald <fitzgen@gmail.com>

References:
- [creduce-dev] Making creduce more parallel
  - From: Nick Fitzgerald <fitzgen@gmail.com>
