
Re: [csmith-dev] Random configuration file for csmith



Haihao,

The configuration file stuff has never been used or tested very much either. Right now it's not a priority for us, but we will definitely try to fix any Csmith bugs (besides hangs or long runtimes) exposed by interesting choices of probability parameters.

The tension between generating code that is more or less like human-written code is interesting. Using Derek's data about real C code would be a fun project. I don't know that any of us at Utah have time to look into this for now, though.

John




On 05/12/2011 05:21 AM, haihao shen wrote:
I see.

Thanks a lot!

Haihao

On Thu, May 12, 2011 at 7:04 PM, Yang Chen <chenyang@cs.utah.edu> wrote:

    On 5/12/11 4:47 AM, haihao shen wrote:
    You mean the items should be sorted from small to large and then
    measure the prob via latter minus former, right?

    The output probabilities are not sorted, but Csmith treats them in
    sorted order. For example, Csmith internally converts

    [statement_prob,statement_assign_prob=100,statement_block_prob=0,statement_for_prob=30,statement_ifelse_prob=15,
    statement_return_prob=35,statement_continue_prob=40,statement_break_prob=45,statement_goto_prob=50,
    statement_arrayop_prob=60]

    to

    [statement_prob,statement_block_prob=0,statement_ifelse_prob=15,statement_for_prob=30,
    statement_return_prob=35,statement_continue_prob=40,statement_break_prob=45,statement_goto_prob=50,
    statement_arrayop_prob=60,statement_assign_prob=100]

    As I said, we might need to generate the sorted probabilities as
    well, to avoid confusion.
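
A minimal Python sketch of the reordering described above, for readers who want to sort a group line themselves. It is not Csmith's own code: the helper names `parse_group` and `sort_by_threshold` are made up for illustration, and the only assumption about the file format is the `[group,name=value,...]` layout shown in the example.

```python
def parse_group(line):
    """Split '[group,name=value,...]' into (group, [(name, value), ...])."""
    body = line.strip().strip("[]")
    fields = [f.strip() for f in body.split(",") if f.strip()]
    group, entries = fields[0], []
    for field in fields[1:]:
        name, value = field.split("=")
        entries.append((name.strip(), int(value)))
    return group, entries


def sort_by_threshold(entries):
    """Order entries by ascending value, i.e. the order they are read in."""
    return sorted(entries, key=lambda entry: entry[1])


# The unsorted sample line from the message above.
line = ("[statement_prob,statement_assign_prob=100,statement_block_prob=0,"
        "statement_for_prob=30,statement_ifelse_prob=15,"
        "statement_return_prob=35,statement_continue_prob=40,"
        "statement_break_prob=45,statement_goto_prob=50,"
        "statement_arrayop_prob=60]")

group, entries = parse_group(line)
ordered = sort_by_threshold(entries)
for name, value in ordered:
    print(f"{name}={value}")
```

The printed order starts with statement_block_prob=0 and ends with statement_assign_prob=100, matching the converted list.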


    Is there any underlying assumption to measure the prob?

    We chose those probabilities heuristically. We don't have a
    clear idea of the relation between probability distributions and
    the bug-finding power of Csmith.

    Why minus?

    It's just a design choice. We set 0 as "don't generate it". For
    example,

    statement_block_prob=0,

    Then, say we need to choose a value to represent the probability to
    generate if-else statements. But how can we represent it? You can
    read the graph below:

    [0-14]    [15-29]  [30-34]
    if-else   for      return

    ...
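
One way to read the ranges is as a lookup on a random draw. The sketch below assumes the cumulative interpretation (a draw r in [0, 100) selects the first statement kind whose value is greater than r); the table and the function `pick_statement` are illustrative only and are not taken from the Csmith sources.

```python
from bisect import bisect_right

# Sorted cumulative values from the example. The block entry (0) is left
# out because a value of 0 means "don't generate it".
THRESHOLDS = [
    (15, "if-else"), (30, "for"), (35, "return"), (40, "continue"),
    (45, "break"), (50, "goto"), (60, "arrayop"), (100, "assign"),
]


def pick_statement(r):
    """Map a draw r in [0, 100) to the first kind whose value exceeds r."""
    if not 0 <= r < 100:
        raise ValueError("r must be in [0, 100)")
    bounds = [bound for bound, _ in THRESHOLDS]
    return THRESHOLDS[bisect_right(bounds, r)][1]


for r in (0, 14, 15, 29, 30, 34, 60, 99):
    print(r, pick_statement(r))
```

Under this reading, draws 0 to 14 give if-else, 15 to 29 give for, 30 to 34 give return, and everything from 60 up gives an assignment.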


    I have a simple idea: for each item, just randomly choose a value
    from 0 to 100. Is that OK?

    You can do that. But if you really want some random probability
    distributions, you can try the --random-random option, which
    automatically gives you random probabilities.
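
If you do want to pick the values by hand, here is a hedged sketch of one way to do it. It does not show how --random-random works internally. It assumes the values are cumulative, so it draws distinct cut points and forces the largest to 100 so the whole range is covered; `random_thresholds` and the `NAMES` list are hypothetical helpers for this example.

```python
import random

NAMES = [
    "statement_assign_prob", "statement_block_prob", "statement_for_prob",
    "statement_ifelse_prob", "statement_return_prob",
    "statement_continue_prob", "statement_break_prob",
    "statement_goto_prob", "statement_arrayop_prob",
]


def random_thresholds(names, rng):
    """Give each name a distinct cumulative value; the largest is 100."""
    # len(names) - 1 distinct cut points in 1..99, plus the final 100.
    cuts = sorted(rng.sample(range(1, 100), len(names) - 1)) + [100]
    shuffled = list(names)
    rng.shuffle(shuffled)  # decide which name gets which cut point
    return dict(zip(shuffled, cuts))


table = random_thresholds(NAMES, random.Random(0))
line = "[statement_prob," + ",".join(f"{n}={table[n]}" for n in NAMES) + "]"
print(line)
```

Note that this variant never assigns 0, so every statement kind keeps a non-zero chance; set an entry to 0 by hand to disable it.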

    - Yang




    Thanks,
    Haihao

    On Thu, May 12, 2011 at 6:43 PM, Yang Chen <chenyang@cs.utah.edu> wrote:

        On 5/12/11 4:34 AM, haihao shen wrote:
        [Haihao] Does "statement_assign_prob=100" mean the total prob
        for statement_prob is 100%? Is it always the latter one minus
        the former? How about
        "statement_for_prob=30,statement_ifelse_prob=15"? Please
        explain more ;)

        I am sorry I missed it.

        The above example means that we have a (100-60)% chance of
        getting assignment statements, a (15-0)% chance of if-else
        statements, and a (30-15)% chance of for statements. If the
        probability value for a statement type is 0, Csmith won't
        generate that kind of statement. For example,
        "statement_block_prob = 0" means that Csmith is not going to
        produce "standalone" block statements (besides the block
        statements belonging to if-else and for).
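
The "latter minus former" arithmetic can be checked with a short Python sketch. It is an illustration of the rule described above, not Csmith code; `statement_chances` is a hypothetical helper and the dictionary keys are shortened names for the nine entries in the sample line.

```python
def statement_chances(thresholds):
    """Turn cumulative values into per-kind percentages (latter minus former)."""
    chances, previous = {}, 0
    for name, value in sorted(thresholds.items(), key=lambda kv: kv[1]):
        if value == 0:
            chances[name] = 0  # 0 means the kind is never generated
            continue
        chances[name] = value - previous
        previous = value
    return chances


example = {
    "assign": 100, "block": 0, "for": 30, "ifelse": 15, "return": 35,
    "continue": 40, "break": 45, "goto": 50, "arrayop": 60,
}
chances = statement_chances(example)
for name, pct in chances.items():
    print(f"{name}: {pct}%")
```

For the sample values this gives 40% for assignments, 15% each for if-else and for, 10% for arrayop, 5% each for return, continue, break and goto, and 0% for standalone blocks, which adds up to 100%.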

        - Yang



        On Thu, May 12, 2011 at 6:33 PM, Yang Chen
        <chenyang@cs.utah.edu> wrote:

            On 5/12/11 4:29 AM, haihao shen wrote:


                However, in the latest sample configuration file
                generated by Csmith, statement_assign_prob=100 and
                statement_block_prob=0. Does that mean there is a -100%
                chance of getting statement_block_prob?

                No. The probabilities are not sorted in the output
                (maybe we should do that). For example, here is a
                sample output:

                [statement_prob,statement_assign_prob=100,statement_block_prob=0,statement_for_prob=30,statement_ifelse_prob=15,
                statement_return_prob=35,statement_continue_prob=40,statement_break_prob=45,statement_goto_prob=50,
                statement_arrayop_prob=60]

                It means that we have (15-0)% = 15% chance to
                generate if-else statements, (35-30)% = 5% chance
                to generate return statements, etc.


            [Haihao]  Next is 40-35? 45-40? 50-45?

            Yes :)

            (40-35)% for continue statements, (45-40)% for break
            statements and (50-45)% for goto statements.

            - Yang