
[creduce-dev] parallel tuning



C-Reduce's strategy of querying the number of CPUs and running that many parallel reduction attempts is a bad choice in some cases. On my MacBook, for example, it runs with concurrency 8 where 3 would be a better choice.

We did a bunch of benchmarking of this a few years ago, but I'm afraid the results are very specific not only to the platforms but also to the interestingness tests. Some tests have very light cache footprints, whereas others (for example, those that invoke static analyzers) tend to blow out the shared cache.

My current idea is that we first need to detect real cores rather than hyperthreaded ones, which is sort of a pain, but I guess we can special-case Mac OS and Linux. Then maybe something like:

- parallelism 2 on a dual-core machine
- 3 on a 4-core machine
- 4 on a machine with more than 4 cores
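
To make the proposal concrete, here is a rough Python sketch of the two steps: counting physical cores, then mapping that count to a parallelism level. It is illustrative only and is not C-Reduce's actual code. The function names are made up for this example. The values for 1-core and 3-core machines are my own guesses, since the list above does not cover them. The Linux path assumes /proc/cpuinfo exposes "physical id" and "core id" fields, which is not true on every architecture or VM, so it falls back to the logical CPU count.

```python
import os
import platform
import subprocess


def physical_cores_linux(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text.

    Returns None when the fields are missing (assumption: in that case
    the caller falls back to the logical CPU count).
    """
    cores = set()
    phys = core = None
    # Entries are separated by blank lines; the extra "" flushes the last one.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            if core is not None:
                cores.add((phys, core))
            phys = core = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            phys = value.strip()
        elif key == "core id":
            core = value.strip()
    return len(cores) or None


def physical_cores():
    """Best-effort physical core count, special-casing Mac OS and Linux."""
    system = platform.system()
    try:
        if system == "Darwin":
            # hw.physicalcpu reports physical (non-hyperthreaded) cores.
            out = subprocess.check_output(["sysctl", "-n", "hw.physicalcpu"])
            return int(out.decode().strip())
        if system == "Linux":
            with open("/proc/cpuinfo") as f:
                n = physical_cores_linux(f.read())
            if n:
                return n
    except (OSError, ValueError, subprocess.CalledProcessError):
        pass
    # Fallback: logical CPUs, which may include hyperthreads.
    return os.cpu_count() or 1


def default_parallelism(cores):
    """Map a physical core count to the proposed parallelism level."""
    if cores <= 1:
        return 1  # assumption: not covered by the proposal
    if cores == 2:
        return 2
    if cores <= 4:
        return 3  # proposal says 3 on a 4-core; 3-core is an assumption
    return 4


if __name__ == "__main__":
    n = physical_cores()
    print("physical cores:", n, "-> parallelism:", default_parallelism(n))
```

On a hyperthreaded quad-core MacBook that reports 8 logical CPUs, this would pick 3 instead of 8.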

How does this match with your experience?

John