
Re: [creduce-dev] parallel tuning

To: Markus Trippelsdorf <markus@trippelsdorf.de>
Subject: Re: [creduce-dev] parallel tuning
From: John Regehr <regehr@cs.utah.edu>
Date: Tue, 17 Nov 2015 15:08:57 +0100
Cc: creduce-dev@flux.utah.edu
In-reply-to: <20151117140109.GA320@x4>
List-archive: </listarchives/creduce-dev>
List-help: <mailto:creduce-dev-request@flux.utah.edu?subject=help>
List-id: C-Reduce Development Mailing List <creduce-dev.flux.utah.edu>
List-post: <mailto:creduce-dev@flux.utah.edu>
List-subscribe: <http://www.flux.utah.edu/mailman/listinfo/creduce-dev>, <mailto:creduce-dev-request@flux.utah.edu?subject=subscribe>
List-unsubscribe: <http://www.flux.utah.edu/mailman/options/creduce-dev>, <mailto:creduce-dev-request@flux.utah.edu?subject=unsubscribe>
References: <564AFAA1.20505@cs.utah.edu> <20151117140109.GA320@x4>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:38.0) Gecko/20100101 Thunderbird/38.3.0

Thanks Markus! If you have time to run 2, 3, and 5, I'd be curious to see those too.

Earlier we had observed speedup peaking at around 8 cores, but that was on a burly multi-socket Xeon. I expect most people will do better with a smaller degree of parallelism.


John


On 11/17/15 3:01 PM, Markus Trippelsdorf wrote:

On 2015.11.17 at 11:00 +0100, John Regehr wrote:

C-Reduce's strategy of querying the number of CPUs and running that many
parallel reduction attempts is bad in some cases, such as on my Macbook
where it runs with concurrency 8, where 3 would be a better choice.

We did a bunch of benchmarking of this a few years ago but I'm afraid that
the results are very specific to not only the platforms but also the
interestingness tests.  Some of those have very light cache footprints
whereas others (for example those that invoke static analyzers) tend to blow
out the shared cache.

My current idea is that first we need to detect real cores instead of
hyperthreaded cores, which is sort of a pain but we can special-case Mac OS
and Linux I guess.  Then maybe something like:

- parallelism 2 on a dual core
- 3 on a 4-core
- 4 on a >4 core

How does this match with your experience?
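A minimal sketch of that heuristic, assuming Python 3.7+ and only the standard library. The names physical_cores() and choose_parallelism() are hypothetical and not part of C-Reduce. On macOS it reads the hw.physicalcpu sysctl. On Linux it counts distinct (physical id, core id) pairs in /proc/cpuinfo, which collapses hyperthread siblings on x86 but may find nothing on other architectures. Anything else falls back to os.cpu_count(), i.e. the plain CPU count used today. The mapping follows the three rules above; the proposal does not cover a 3-core machine, so returning 2 there is an assumption.

```python
"""Hypothetical sketch: choose a C-Reduce -n value from the physical core count."""
import os
import platform
import subprocess


def physical_cores() -> int:
    """Best-effort count of physical cores (hyperthread siblings not counted)."""
    system = platform.system()
    if system == "Darwin":
        # macOS exposes the physical core count directly via sysctl.
        try:
            result = subprocess.run(
                ["sysctl", "-n", "hw.physicalcpu"],
                capture_output=True, text=True, check=True,
            )
            return max(1, int(result.stdout.strip()))
        except (OSError, subprocess.CalledProcessError, ValueError):
            pass
    elif system == "Linux":
        # On x86 Linux, each logical CPU block lists its package ("physical id")
        # before its core ("core id"); hyperthread siblings share both values.
        try:
            cores = set()
            package = None
            with open("/proc/cpuinfo") as cpuinfo:
                for line in cpuinfo:
                    key, _, value = line.partition(":")
                    key = key.strip()
                    if key == "physical id":
                        package = value.strip()
                    elif key == "core id":
                        cores.add((package, value.strip()))
            if cores:
                return len(cores)
        except OSError:
            pass
    # Fallback: the plain (logical) CPU count, as C-Reduce queries today.
    return os.cpu_count() or 1


def choose_parallelism(cores: int) -> int:
    """Map a physical core count to a degree of parallelism (proposed rules)."""
    if cores <= 2:
        return max(1, cores)  # 2 on a dual core (1 on a single core)
    if cores <= 4:
        return cores - 1      # 3 on a 4-core; 2 on a 3-core is an assumption
    return 4                  # 4 on anything with more than 4 cores


if __name__ == "__main__":
    n = physical_cores()
    print(f"{n} physical cores -> creduce -n {choose_parallelism(n)}")
```

The chosen value would then be passed explicitly, e.g. creduce -n N, so a user can still override it.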


I've tested creduce on a real 6 core machine without hyperthreading with
a 2MB C++ testcase:

creduce -n 1 --backup ./check.sh bug244.cc  2576.49s user 300.02s system 100% cpu 47:47.16 total

creduce -n 4 --backup ./check.sh bug244.cc  3714.57s user 480.69s system 243% cpu 28:46.14 total

creduce -n 6 --backup ./check.sh bug244.cc  4759.06s user 578.17s system 270% cpu 32:54.00 total
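Turning the wall-clock totals above into speedup and parallel efficiency (speedup divided by -n) makes the comparison explicit. This is a small sketch that only parses the quoted "total" column; the names to_seconds(), WALL, and speedups() are illustrative.

```python
"""Speedup and parallel efficiency from the `time` totals quoted above."""


def to_seconds(clock: str) -> float:
    """Convert a [[h:]m:]s wall-clock string such as '47:47.16' to seconds."""
    total = 0.0
    for part in clock.split(":"):
        total = total * 60 + float(part)
    return total


# Wall-clock totals from the three runs, keyed by the -n value used.
WALL = {1: "47:47.16", 4: "28:46.14", 6: "32:54.00"}


def speedups(wall=WALL):
    """Speedup of each run relative to the -n 1 baseline."""
    base = to_seconds(wall[1])
    return {n: base / to_seconds(t) for n, t in wall.items()}


if __name__ == "__main__":
    for n, s in sorted(speedups().items()):
        print(f"-n {n}: speedup {s:.2f}x, efficiency {s / n:.2f}")
```

From these numbers, -n 4 gives roughly a 1.66x speedup and -n 6 only about 1.45x, so the extra two workers make the run slower rather than faster on this machine.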

So your idea looks good to me.
