
Re: [creduce-dev] [RFC] Switching from Perl to Python



Hi John,

> Hi Moritz, this is cool.  I've thought about the Perl vs Python issue a number of times and basically I just do not love Python no matter how many times I start writing it.  On the other hand I can probably get over this.

Oh, it seems I will have a hard time convincing you that Python is not too bad in the end. When I first used Python I was sceptical as well, but after using it for some time it has become a convenient tool for tasks that require a little more than just a shell script. But I guess that is what you would use Perl for.


> My guess is that the speedup you're seeing is mostly due to running fewer passes, since in general CPython is pretty suckily slow compared to Perl.  Probably not a big issue for C-Reduce, however, which is almost always bottlenecked by interestingness tests.

Just to be clear, I ran the same passes for both versions (Perl and Python). Because the Python version is not complete yet, I had to disable some of the passes in the Perl version. I also checked that both versions produce the same reduced file in the end.
It might be that using Python-based test scripts instead of the original ones caused the difference. But as long as performance is not a main criterion for you, I would not spend more time analysing this.


> I do feel strongly that the abstraction boundary between the core and the passes and the interestingness tests should be a strong one, probably a process by default.

I agree. And unless you want to give up the parallel approach, there is currently no other way to run multiple tests in parallel with Python than to launch processes. I first thought about threads, but the "Global Interpreter Lock" prevents concurrent execution of Python code in multiple threads.
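
As a rough illustration of the process-based approach, the following minimal sketch starts one process per test and then waits for all of them. The function name run_tests_in_parallel is hypothetical and not taken from the C-Reduce sources; the sketch assumes that each test is an external command that exits with status 0 when the variant is interesting.

```python
import subprocess
import sys


def run_tests_in_parallel(commands):
    """Start one process per command, then wait for all of them.

    Returns one boolean per command. True means the process exited
    with status 0 (assumed here to mean "interesting").
    """
    # Popen returns immediately, so all processes run concurrently
    # and are not serialised by the Global Interpreter Lock.
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # wait() blocks until the process ends and returns its exit status.
    return [proc.wait() == 0 for proc in procs]


# Hypothetical stand-ins for real test scripts: tiny Python
# one-liners that simply exit with a fixed status.
example_commands = [
    [sys.executable, "-c", "import sys; sys.exit(0)"],
    [sys.executable, "-c", "import sys; sys.exit(1)"],
]
```

Calling run_tests_in_parallel(example_commands) launches both processes before waiting on either, so the total time is roughly that of the slowest test rather than the sum of all tests.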


> Anyway I need to think about it more and no doubt the other C-Reduce people will have opinions.  I'm open to moving to a different implementation of the C-Reduce core, but not until the replacement is feature complete (and I'm probably not going to have a lot of time to work on it myself, but I'm happy to do code reviews).

I would be willing to do the rewriting. Though instead of just "translating" the Perl code into Python, I would suggest thinking about potential changes to improve maintainability and readability. Maybe there is something you always wanted to change anyway but never had the time for?
I think that could easily be done in the style of small code reviews by splitting the work into smaller chunks.


> Keep in mind that the C-Reduce passes are not all equally useful and some merging and removing of functionality can probably be done without hurting the end results.

If you want to remove something, it would reduce the effort of rewriting, but currently I think it shouldn't be necessary. And there is not much left to implement. Let's see if others are interested too, but if I have some spare time I might already continue to port the last passes to see whether there are more difficulties than I expect. And even if the Python version is not used in the end, I might find some bugs which can be fixed. (Due to a mistake in my Python implementation I noticed that the remove-pointer-pairs pass should have been called reduce-pointer-pairs -- now fixed ;-)

Regards,

Moritz


> On 5/26/16 9:36 PM, Moritz Pflanzer wrote:
>> Hi all,
>> 
>> I am wondering if there might be interest in rewriting the C-Reduce core algorithm and the reduction passes in Python. Potential benefits could be:
>> 
>> - I suspect more people are familiar with Python than with Perl
>> - Python offers a larger set of features without the need to install additional modules (see below)
>> - The implementation seems to be a bit simpler and cross-platform compatibility seems to be easier (see below)
>> - Python is more actively maintained? (Here I am just guessing based on recent popularity)
>> - A Python-based implementation could lead to shorter run times (see below)
>> 
>> Feel free to add other points or to discuss potential cons of switching. So far I could think of:
>> 
>> - Some effort is required to do the rewriting
>> - You guys might be more familiar with Perl?
>> 
>> 
>> To push a little bit more in the direction of switching over, I created a first proof-of-concept Python version and compared (most of) the included tests between the existing Perl version and my Python version. Because the Python version is not complete yet (see below), I had to disable a few passes to allow a fair comparison. And I ran only tests 0-3 and 6, 7 because 4 and 5 make use of KCC and Frama-C and I did not want to go through too much trouble setting everything up. ;-) (Running them wouldn't have been a problem, though.)
>> 
>> My detailed results can be found here: https://docs.google.com/spreadsheets/d/1FIvuHr29X2T2H2wOrnGCU0BUM3NeQrvJY_GpKMVJRCA/edit?usp=sharing
>> 
>> In short: On Linux my Python version takes only 62% of the time on average, on Windows there is not much of a difference. (This might be because the bottleneck on Windows is the process creation -- as opposed to forking on Linux -- and not the passes themselves.)
>> On Linux the Perl variant used the original shell test scripts; for the Python variant I converted the tests to equivalent Python functions. In both cases each test was run as a separate process, so I guess the comparison is fair.
>> On Windows, since I could not run the shell scripts, both variants used the same Python scripts.
>> 
>> 
>> Some words about the Python version. First, it can be found here: https://github.com/mpflanzer/creduce/blob/python/creduce/creduce.py
>> - It took me about 10-20 hours to write this version -- hard to say how long exactly since I could always only work for short periods. I would estimate that it is about 70% complete with respect to the Perl version.
>> - I have written it in Python3 as it offers some convenient features over Python2 and the recommendation is to start new work with Python3 anyway.
>> - It does not use anything but the modules which come with the default Python installation (both Linux and Windows)
>> - I think the largest missing piece is the set of passes that remove matched parentheses, braces etc. Python has no built-in functionality for this, so a small custom parser would have to be written -- should not be too difficult
>> - I have not yet figured out the best way to represent, load and execute the interestingness tests. Ideally I would like to have a base class from which each custom test could inherit. Each test would then be written in a separate Python script but dynamically imported into the C-Reduce script. Then it could be used like any other class. If that's not really feasible, it is no problem to just run them as independent scripts -- the same way as it is now in the Perl version.
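
The "small custom parser" for matched delimiters mentioned above could start from a simple stack-based scan. The following is only a minimal sketch under stated assumptions: the names find_matching_pairs and remove_pair are hypothetical, the code ignores string literals and comments, and it does not claim to reproduce what the existing Perl passes do.

```python
def find_matching_pairs(text, opener="(", closer=")"):
    """Return sorted (open_index, close_index) tuples for nested pairs.

    Unmatched openers and closers are silently skipped.
    """
    stack = []  # indices of openers that are still waiting for a closer
    pairs = []
    for index, char in enumerate(text):
        if char == opener:
            stack.append(index)
        elif char == closer and stack:
            # The most recent unmatched opener belongs to this closer.
            pairs.append((stack.pop(), index))
    return sorted(pairs)


def remove_pair(text, pair):
    """Delete only the two delimiters of one pair and keep the contents."""
    open_index, close_index = pair
    return (text[:open_index]
            + text[open_index + 1:close_index]
            + text[close_index + 1:])
```

For example, find_matching_pairs("a(b(c)d)e") yields [(1, 7), (3, 5)], and removing the outer pair with remove_pair gives "ab(c)de". Passing opener="{" and closer="}" handles braces in the same way.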
>> 
>> 
>> I think that is all I can report for now. Please let me know what you think about the idea or if you need some more information. I might have missed something in this writeup.
>> 
>> Best regards,
>> 
>> Moritz
>>