
[creduce-dev] parallel tuning



C-Reduce's strategy of querying the number of CPUs and running that many parallel reduction attempts is a poor choice in some cases. On my MacBook, for example, it runs with concurrency 8, where 3 would be a better choice.

We did a bunch of benchmarking of this a few years ago, but I'm afraid 
that the results are very specific not only to the platforms but also 
to the interestingness tests.  Some of those have very light cache 
footprints whereas others (for example those that invoke static 
analyzers) tend to blow out the shared cache.

My current idea is that we first need to detect real cores rather than 
hyperthreaded ones, which is sort of a pain, but I guess we can 
special-case Mac OS and Linux.  Then maybe something like:
- parallelism 2 on a dual core
- 3 on a 4-core
- 4 on a >4 core
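
To make the proposal concrete, here is a rough sketch of the idea in Python (not C-Reduce's actual code). It assumes `sysctl -n hw.physicalcpu` on Mac OS and unique (physical id, core id) pairs in /proc/cpuinfo on Linux as the way to count real cores. The function names are made up for illustration, and the handling of 1-core and 3-core machines is my own guess, since the list above doesn't cover them.

```python
import os
import platform
import subprocess


def parse_cpuinfo_physical_cores(text):
    """Count unique (physical id, core id) pairs in /proc/cpuinfo text.

    Returns 0 if the fields are missing (as on some non-x86 kernels).
    """
    cores = set()
    phys = core = None
    # The trailing "" flushes the last processor stanza.
    for line in text.splitlines() + [""]:
        if not line.strip():
            if phys is not None and core is not None:
                cores.add((phys, core))
            phys = core = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
    return len(cores)


def physical_core_count():
    """Best-effort count of real cores; falls back to logical CPUs."""
    try:
        system = platform.system()
        if system == "Darwin":
            out = subprocess.check_output(["sysctl", "-n", "hw.physicalcpu"])
            return int(out.decode().strip())
        if system == "Linux":
            with open("/proc/cpuinfo") as f:
                n = parse_cpuinfo_physical_cores(f.read())
            if n > 0:
                return n
    except (OSError, ValueError, subprocess.CalledProcessError):
        pass
    # Fallback: logical CPU count, which may include hyperthreads.
    return os.cpu_count() or 1


def default_parallelism(cores):
    """Map a real-core count to the proposed number of parallel attempts."""
    if cores <= 2:
        return max(1, cores)  # dual core -> 2; single core -> 1 (assumption)
    if cores <= 4:
        return 3              # 4-core -> 3; 3-core treated the same (assumption)
    return 4                  # more than 4 cores -> 4


if __name__ == "__main__":
    n = physical_core_count()
    print("real cores:", n, "-> parallelism:", default_parallelism(n))
```

The fallback to the logical CPU count means that on other platforms this would still overcount on hyperthreaded machines, but the cap of 4 limits the damage.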

How does this match with your experience?

John