
Re: [creduce-dev] parallel tuning

Thanks Markus! If you have time to run with -n 2, 3, and 5, I'd be curious to see those too.

Earlier we had observed speedup peaking at around 8 cores, but that was on a burly multi-socket Xeon. I expect most people will do better with a smaller degree of parallelism.


On 11/17/15 3:01 PM, Markus Trippelsdorf wrote:
On 2015.11.17 at 11:00 +0100, John Regehr wrote:
C-Reduce's strategy of querying the number of CPUs and running that many
parallel reduction attempts is bad in some cases. On my MacBook, for
example, it runs with concurrency 8 when 3 would be a better choice.

We did a bunch of benchmarking of this a few years ago but I'm afraid that
the results are very specific to not only the platforms but also the
interestingness tests.  Some of those have very light cache footprints
whereas others (for example those that invoke static analyzers) tend to blow
out the shared cache.

My current idea is that first we need to detect real cores instead of
hyperthreaded cores, which is sort of a pain but we can special-case Mac OS
and Linux I guess.  Then maybe something like:

- parallelism 2 on a dual core
- 3 on a 4-core
- 4 on a >4 core

How does this match with your experience?
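
As a rough illustration of the proposal above, here is a minimal Python sketch (not C-Reduce's actual code, which is not shown in this thread). The function names are hypothetical. It assumes `sysctl -n hw.physicalcpu` on Mac OS and the `physical id` / `core id` fields of /proc/cpuinfo on Linux, and it falls back to the logical CPU count when neither works. How to treat 1-core and 3-core machines is not specified above, so those cases are assumptions.

```python
import os
import platform
import subprocess


def count_cores_from_cpuinfo(text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text."""
    cores = set()
    for block in text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "physical id" in fields and "core id" in fields:
            cores.add((fields["physical id"], fields["core id"]))
    if not cores:
        # Some kernels/architectures do not expose these fields.
        raise ValueError("no physical id / core id fields found")
    return len(cores)


def physical_cores():
    """Best-effort physical core count; falls back to logical CPUs."""
    system = platform.system()
    try:
        if system == "Darwin":
            out = subprocess.check_output(["sysctl", "-n", "hw.physicalcpu"])
            return int(out.strip())
        if system == "Linux":
            with open("/proc/cpuinfo") as f:
                return count_cores_from_cpuinfo(f.read())
    except (OSError, ValueError, subprocess.CalledProcessError):
        pass
    return os.cpu_count() or 1


def pick_parallelism(cores):
    """Map physical cores to a parallelism level per the list above."""
    if cores <= 2:
        return max(cores, 1)  # assumption: 1 on a single core
    if cores <= 4:
        return 3              # assumption: 3-core treated like 4-core
    return 4


if __name__ == "__main__":
    print(pick_parallelism(physical_cores()))
```

The caps are hard-coded because, as noted above, the best value depends heavily on the interestingness test, so an explicit -n would still need to override whatever this picks.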

I've tested creduce on a real 6-core machine without hyperthreading, using
a 2MB C++ test case:

creduce -n 1 --backup ./check.sh bug244.cc  2576.49s user 300.02s system 100% cpu 47:47.16 total

creduce -n 4 --backup ./check.sh bug244.cc  3714.57s user 480.69s system 243% cpu 28:46.14 total

creduce -n 6 --backup ./check.sh bug244.cc  4759.06s user 578.17s system 270% cpu 32:54.00 total

So your idea looks good to me.
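
To make the comparison easier to read, the three timings above can be converted into wall-clock speedup and CPU-time overhead relative to the -n 1 run. The short Python sketch below does only that arithmetic; the helper names are made up for illustration, and the numbers are copied from the `time` output above.

```python
def to_seconds(clock):
    """Convert a 'MM:SS.ss' or 'H:MM:SS.ss' wall-clock string to seconds."""
    seconds = 0.0
    for part in clock.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


# -n value -> (user seconds, system seconds, wall-clock time)
RUNS = {
    1: (2576.49, 300.02, "47:47.16"),
    4: (3714.57, 480.69, "28:46.14"),
    6: (4759.06, 578.17, "32:54.00"),
}


def summarize(runs, baseline=1):
    """Return {n: (wall-clock speedup, CPU-time ratio)} vs. the baseline."""
    base_user, base_sys, base_wall = runs[baseline]
    base_wall_s = to_seconds(base_wall)
    base_cpu = base_user + base_sys
    result = {}
    for n, (user, system, wall) in sorted(runs.items()):
        speedup = base_wall_s / to_seconds(wall)
        cpu_ratio = (user + system) / base_cpu
        result[n] = (speedup, cpu_ratio)
    return result


if __name__ == "__main__":
    for n, (speedup, cpu_ratio) in summarize(RUNS).items():
        print(f"-n {n}: speedup {speedup:.2f}x, CPU time {cpu_ratio:.2f}x")
```

With these numbers, -n 4 is about 1.66x faster than -n 1 in wall-clock time and -n 6 about 1.45x faster, while total CPU time grows by roughly 46% and 86% respectively. This is a single test case on a single machine, so it says little about other interestingness tests.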