[creduce-dev] parallel tuning
C-Reduce's strategy of querying the number of CPUs and running that many
parallel reduction attempts is bad in some cases: on my MacBook it runs
with concurrency 8, where 3 would be a better choice.
We did a bunch of benchmarking of this a few years ago, but I'm afraid
the results are very specific not only to the platforms but also to the
interestingness tests. Some of those have very light cache footprints,
whereas others (for example those that invoke static analyzers) tend to
blow out the shared cache.
My current idea is that first we need to detect physical cores rather
than hyperthreaded (logical) cores, which is sort of a pain, but we can
special-case macOS and Linux I guess. Then maybe something like:
- parallelism 2 on a dual core
- 3 on a 4-core
- 4 on a >4 core
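For concreteness, here is a rough sketch of the kind of thing I mean. It
detects physical cores via sysctl on macOS and by counting unique
(core, socket) pairs from lscpu on Linux, falling back to the logical CPU
count elsewhere, then applies the mapping above. The function names are
made up for illustration; this is not actual C-Reduce code.

```shell
#!/bin/sh

# Map a physical-core count to a parallelism level, per the
# heuristic proposed above (2 on dual core, 3 on 4-core, 4 above that).
pick_parallelism() {
    cores=$1
    if   [ "$cores" -le 2 ]; then echo 2
    elif [ "$cores" -le 4 ]; then echo 3
    else                          echo 4
    fi
}

# Detect physical (non-hyperthreaded) cores; special-cased for
# macOS and Linux, falling back to the logical CPU count elsewhere.
detect_physical_cores() {
    case "$(uname)" in
        Darwin) sysctl -n hw.physicalcpu ;;
        Linux)  lscpu -p=CORE,SOCKET 2>/dev/null \
                    | grep -v '^#' | sort -u | wc -l ;;
        *)      getconf _NPROCESSORS_ONLN ;;
    esac
}

pick_parallelism "$(detect_physical_cores)"
```

Doing this in Perl inside C-Reduce would presumably mean shelling out to
the same commands, since there's no portable API for physical core count.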
How does this match with your experience?