[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xsmith-dev] XSmith questions



Do you have a roadmap for implementing ...

Not at the moment.  Right now I'm focused on getting some of our
mostly-finished fuzzers *really* finished and ready for fuzzing,
running some fuzzing campaigns to find some bugs, and some features
that are immediately helpful for running fuzzing (eg. some stuff to
support feedback-directed fuzzing and choice weighting).


Also, I'm curious about the setup of various fuzzers that you are running.
How many programs are you able to generate and test per minute? Do you have
any particular parallelization setup that works well for you?

There is a harness repository that we use that was mostly written by
Guy Watson:

https://gitlab.flux.utah.edu/xsmith/harness

There is a `harness.py` script that takes a configuration file to run
an xsmith fuzzer, running its output through various implementations
of whatever language, comparing the output, and binning the result
(eg. everything matching, diff, crash, everything crashing (IE bad
program), xsmith itself crashing, ...).  That harness script is
basically for running a single job, and there are 2 ways of doing it
in parallel.

The first one is the `go-button`, which can run fuzzing on multiple
machines at once.  It is fairly dependent on Emulab/Cloudlab
(https://www.emulab.net/) to have things set up just right, and
requires at least 2 machines.  It also generates a bunch of live
graphs about fuzzing.  It uses ansible to run setup scripts to prep
each machine before starting the fuzz campaign.

I mostly use `local-multi-harness` instead, which just runs on a
single machine, but runs multiple harness processes.  It doesn't
automatically run the setup scripts like the `go-button` does, so
eg. if you want to use the `racket-kernel-fuzzer` configuration you
need to run the `racket-install.rkt` script first manually.

As far as speed, xsmith is not particularly fast.  There is a server
mode that at least saves on startup time, but I think the harness
might not run it particularly well (Guy correct me if I'm wrong -- at
any rate I haven't been using it with server mode, but that might be
because I never bothered changing my config after bugs were fixed).
Aside from Racket's slow startup speed (particularly bad with Xsmith
because it loads a lot of stuff), the runtime varies a lot based on
the max-depth configuration, the size of the grammar, etc.

I haven't focused a lot on performance, necessarily.  Basically
whenever I decide Xsmith is too slow for my taste I spend some time
trying to optimize it, and generally make a few gains until I think
“that's enough optimization for now”.  If you can find any ways to
make improvements, that would be great!  I might have taken all of the
low-hanging fruit in my previous optimization passes, however.  In
previous optimization passes I got the most out of optimizing in the
type system, but I made some major improvements to type checking speed
last time so it may not dominate anymore.


As an aside, a question for you: Are there multiple Dafny
implementations?  Or different optimization levels?  Or are you
running different versions of the compiler against each other?



On Mon, Jul 19, 2021 at 04:15:04PM -0700, Sorawee Porncharoenwase wrote:
Also, I'm curious about the setup of various fuzzers that you are running.
How many programs are you able to generate and test per minute? Do you have
any particular parallelization setup that works well for you?

On Fri, Jul 16, 2021 at 8:32 PM Sorawee Porncharoenwase <
sorawee.pwase@gmail.com> wrote:

Do you have a roadmap for implementing "Provide mechanism for precluding
types in `fresh-type-variable` invocations"? (
https://gitlab.flux.utah.edu/xsmith/xsmith/-/issues/56). I also really
want this feature.

On Thu, Jul 15, 2021 at 9:22 PM William G Hatch <william@hatch.uno> wrote:

That would be great, thanks!

On Thu, Jul 15, 2021 at 08:09:13PM -0700, Sorawee Porncharoenwase wrote:
>Yes, this is very helpful. I will let you know if I encounter any further
>issues after updating XSmith.
>
>By the way, do you want any help with Scribble? I notice a lot of typos
and
>link errors and can submit PRs to fix them.
>
>On Thu, Jul 15, 2021 at 8:00 PM William G Hatch <william@hatch.uno>
wrote:
>
>> Sorry for the delay.  I wanted to do the debugging before responding,
>> but then I didn't get to it for a while.
>>
>> Anyway, I now see what's going wrong, and I've fixed some bugs.
>>
>> First, that issue with the bud-node was a bug in the `edit` property
>> which the canned-components `ProcedureApplication` uses.  It's fixed
>> now.
>>
>> >> Hmm, it seems like first-order functions would be encoded as a
>> >> function application node that only allows a reference as its RHS
>> >> rather than allowing arbitrary expressions.  But that said, I'm not
>> >> sure I fully understand what you're trying to do.
>> >>
>> >I’m not sure what you mean by that. But:
>>
>> Erm, I misspoke (mistyped) a bit.  I meant that it would be function
>> application that only allows a reference in the procedure position.  I
>> don't know why I wrote RHS.
>>
>> Anyway, the strategy of having separate definition nodes for functions
>> and everything else is used in Cish.  I don't recommend following
>> anything Cish does.  However, it accomplishes this by having the
>> program node have non-function definitions first, and disallowing use
>> of functions on the RHS of those functions.
>>
>> The problem where it can't find a reference is a bug (that I haven't
>> fixed yet) caused by the lifting mechanics not really being completely
>> in sync with the reference mechanics with respect to nodes that have
>> multiple definition fields.  When there are multiple definition
>> fields, definitions in the lower field can reference the definitions
>> in the higher field, but not vice versa.  The lifting machinery
>> doesn't know about that ordering, it just stuffs the lifted definition
>> into the front of the definitions in whatever field it is targetting.
>> Thus it places a definition in a place that the reference machinery
>> thinks is out of scope.
>>
>> So... if you want to have different binders that get different fields
>> in the program node you can only make it work if you disallow stuff to
>> be sure you never need to reference something that will be lifted into
>> the lower field.
>>
>> That said, I've just fixed some issues to allow definition nodes to be
>> subtypable, so now you can have a definition node that is a subtype of
>> Definition, but has some different rules.  Importantly, though, it can
>> be lifted to the same field.
>>
>> Eg. if I change the MyDefinition declaration in your example to look
>> like this:
>>
>> ```
>>   [MyDefinition Definition ()
>>                 #:prop fresh (hash 'Expression (make-hole
>> 'ProcedureApplication))]
>> ```
>>
>> It can now be lifted among normal definitions, but always have
>> ProcedureApplication as its RHS.
>>
>> However, I'm still not convinced that you actually want
>> ProcedureApplication as the RHS.  Maybe you want to restrict lambda
>> nodes to only be generated on the RHS of a FunctionDefinition node
>> (that's a subtype of Definition), and make FunctionDefinition always
>> have a Lambda node as its RHS.
>>
>> There is an example in the documentation that is like this, in the
>> explanation of the `choice-filters-to-apply` property.  But I'll paste
>> it here:
>>
>> ```
>> (add-choice-method
>>   my-component
>>   no-lambda-except-global-def
>>   [#f (λ () #t)]
>>   [Lambda (λ ()
>>             (and (parent-node current-hole)
>>                  (equal? (ast-node-type (parent-node current-hole))
>>                          'Definition)
>>                  (parent-node (parent-node current-hole))
>>                  (equal? (ast-node-type (parent-node (parent-node
>> current-hole)))
>>                          'Program)))])
>> (add-property my-component
>>            choice-filters-to-apply
>>            [#f (no-lambda-except-global-def)])
>> ```
>>
>> You can use something like that to allow Lambda nodes to only be
>> generated on the RHS of FunctionDefinition nodes, and define
>> FunctionDefinition like MyDefinition above to always generate Lambda
>> as its RHS, and I think it should work for a first-order language.
>>
>>
>> >(Also note that I tried unify! these unrestricted type variables
>> >with some concrete non-function type variables, but down the road, I
get
>> an
>> >internal error about unification failure.). Perhaps I'm not supposed
to
>> >unify! inside add-choice-method?
>>
>> Well, probably not.  Generally the only place you want to `unify!` is
>> inside the type property in the function that determines child types
>> based on the parent type.  Inside that function you can `unify!`, but
>> the unification needs to be consistent (IE not unified based on the
>> result of `random`) OR the choice needs to be saved in a way that it
>> can be looked up when that function is run again.
>>
>> The problem is that the choice method gets its type information based
>> on a hole node that is later replaced, so the type is recomputed
>> without necessarily consulting the type that the hole said it was.
>> You should be able to constrain the type using unification if you save
>> that type to re-unify it consistently later.  But without seeing the
>> exact error and code that generated it, I'm not 100% certain.
>>
>>
>> > In the last email, I asked if lift-type->ast-binder-type could be
used
>> > nondeterministically in this manner, and my experience with it so far
>> seems
>> > to indicate no.
>>
>> I think this should work, barring the issues mentioned above about the
>> lifting vs referencing bug when you have the different definition
>> types in different fields.  Adjusting your example to have
>> MyDefinition as a subtype, and then only having a Definition field in
>> MyProgram, random choice seems to work fine.  But sometimes it
>> generates gargantuan programs (that I kill before it finishes) because
>> it keeps choosing MyDefinition, requiring a function application,
>> which likely lifts a reference to a function, which might choose
>> MyDefinition, etc, in a bad lift chain that generates bigger and
>> bigger types.
>>
>>
>> Anyway, does that all help?
>>
>>
>> On Tue, Jul 13, 2021 at 02:55:10PM -0700, Sorawee Porncharoenwase
wrote:
>> >>
>> >> Hmm, it seems like first-order functions would be encoded as a
>> >> function application node that only allows a reference as its RHS
>> >> rather than allowing arbitrary expressions.  But that said, I'm not
>> >> sure I fully understand what you're trying to do.
>> >>
>> >I’m not sure what you mean by that. But:
>> >
>> >   1. I have no problem with the definition site. That’s easy to deal
>> with.
>> >   2. I also have no problem with restricting function application.
That’s
>> >   easy to deal with as well.
>> >   3. The problem is primarily with a bare variable reference bound
to a
>> >   closure value, which must only occur in function application. For
>> example,
>> >   the following code is invalid.
>> >
>> >(define (f) (void))
>> >
>> >(define g f) ;; this is invalid
>> >(define h (g))
>> >(define a (list f)) ;; this is also invalid
>> >(define b (f)) ;; this is ok
>> >
>> >I have been using choice-filters-to-apply to restrict variable
references
>> >whose type can-unify? with function-type that doesn't directly appear
>> under
>> >function application. But this is obviously not ideal because the
filtered
>> >nodes are mostly those whose type is totally unrestricted. So
filtering
>> >them would prevent them from being concretized to other non-function
types
>> >as well. (Also note that I tried unify! these unrestricted type
variables
>> >with some concrete non-function type variables, but down the road, I
get
>> an
>> >internal error about unification failure.). Perhaps I'm not supposed
to
>> >unify! inside add-choice-method?
>> >
>> >That's probably a bug.  “Lifting” is what I've called the situation
>> >> where a reference is generated that causes a new definition to be
>> >> added such that the variable is in scope at the use site.  IE the
>> >> binding is lifted from the use site to some outer node that contains
>> >> definitions.  `lift_X` is the name Xsmith gives to definitions that
>> >> are created automatically in this way (which is ultimately most
>> >> definitions).  The exception is caused because somehow the lift
ended
>> >> up in a place where it wasn't actually in scope.  That shouldn't
>> >> happen.  Anyway, if you point me to a version of a fuzzer that does
>> >> this, I'll try to debug it.
>> >>
>> >#lang clotho
>> >
>> >(require
>> > xsmith
>> > racr
>> > xsmith/racr-convenience
>> > xsmith/canned-components
>> > racket/string
>> > racket/list
>> > racket/pretty)
>> >
>> >;; We first define a basic component and add a bunch of expressions.
>> >
>> >(define-basic-spec-component somelisp)
>> >
>> >(add-basic-expressions somelisp
>> >                       #:ProgramWithSequence #t
>> >                       #:VoidExpression #t
>> >                       #:AssignmentExpression #t
>> >                       #:VariableReference #t
>> >                       #:ProcedureApplication #t
>> >                       #:IfExpression #t
>> >                       #:ExpressionSequence #t
>> >                       #:LetSequential #t
>> >                       #:LambdaWithExpression #t
>> >                       #:Numbers #t
>> >                       #:Booleans #t
>> >                       #:Strings #t
>> >                       #:ImmutableList #t)
>> >
>> >(add-loop-over-container
>> > somelisp
>> > #:name Map
>> > #:collection-type-constructor (λ (inner) (immutable (list-type
inner)))
>> > #:loop-type-constructor (λ (inner) (immutable (list-type inner))))
>> >
>> >(add-to-grammar
>> > somelisp
>> > [MyProgram #f
>> >            ([decls : Definition *]
>> >             [declsTwo : MyDefinition *]
>> >             [body : Expression])
>> >            #:prop type-info
>> >            [(fresh-type-variable)
>> >             (λ (n t)
>> >               (hash 'decls (λ (c) (fresh-type-variable))
>> >                     'declsTwo (λ (c) (fresh-type-variable))
>> >                     'body (fresh-type-variable)))]]
>> > [MyDefinition #f ([type]
>> >                   [name = (fresh-var-name "b_")]
>> >                   [e : ProcedureApplication])
>> >               #:prop binder-info ()
>> >               #:prop type-info [(fresh-type-variable) (λ (n t) (hash
'e
>> t))]])
>> >
>> >(add-property
>> > somelisp
>> > lift-type->ast-binder-type
>> > [#f (λ (type) (if (zero? (random 2)) 'MyDefinition 'Definition))])
>> >
>> >;; Sometimes we have a type variable that is not concrete but needs
to be.
>> >;; Here we provide a list of options we can pick for an unconstrained
>> >;; type variable.
>> >(define (type-thunks-for-concretization)
>> >  (list
>> >   (λ()int-type)
>> >   (λ()bool-type)
>> >   (λ()string-type)
>> >   (λ()(immutable (list-type (fresh-type-variable))))))
>> >
>> >(define (somelisp-format-render s-exps)
>> >  (define out (open-output-string))
>> >  (for ([symex s-exps])
>> >    (pretty-print symex out 1))
>> >  (get-output-string out))
>> >
>> >(define-xsmith-interface-functions
>> >  [somelisp]
>> >  #:program-node MyProgram
>> >  #:type-thunks type-thunks-for-concretization
>> >  #:comment-wrap (λ (lines)
>> >                   (string-join
>> >                    (map (λ (l) (format ";; ~a" l)) lines)
>> >                    "\n"))
>> >  #:format-render somelisp-format-render)
>> >
>> >(module+ main (somelisp-command-line))
>> >
>> >
>> >> Probably something is going wrong in the `fresh` property.  Have you
>> >> changed that?
>> >>
>> >I don’t think so. See the code at the end of the email.
>> >
>> >> I should change that error message.  Anyway, it seems to think there
>> >> is no outer node that has a `Definition *` child that can be the
>> >> target for lifting a definition.  Generally this happens when you
>> >> forget to have an outer Program node or something that can hold
>> >> definitions.
>> >>
>> >I don’t think that’s the case, but I could be wrong.
>> >
>> >#lang clotho
>> >
>> >(require
>> > xsmith
>> > racr
>> > xsmith/racr-convenience
>> > xsmith/canned-components
>> > racket/string
>> > racket/list
>> > racket/pretty)
>> >
>> >;; We first define a basic component and add a bunch of expressions.
>> >
>> >(define-basic-spec-component somelisp)
>> >
>> >(add-basic-expressions somelisp
>> >                       #:ProgramWithSequence #t
>> >                       #:VoidExpression #t
>> >                       #:AssignmentExpression #t
>> >                       #:VariableReference #t
>> >                       #:ProcedureApplication #t
>> >                       #:IfExpression #t
>> >                       #:ExpressionSequence #t
>> >                       #:LetSequential #t
>> >                       #:LambdaWithExpression #t
>> >                       #:Numbers #t
>> >                       #:Booleans #t
>> >                       #:Strings #t
>> >                       #:ImmutableList #t)
>> >
>> >(add-loop-over-container
>> > somelisp
>> > #:name Map
>> > #:collection-type-constructor (λ (inner) (immutable (list-type
inner)))
>> > #:loop-type-constructor (λ (inner) (immutable (list-type inner))))
>> >
>> >(add-to-grammar
>> > somelisp
>> > [MyProgram #f
>> >            ([decls : Definition *]
>> >             [body : SubProgram])
>> >            #:prop type-info
>> >            [(fresh-type-variable)
>> >             (λ (n t)
>> >               (hash 'decls (λ (c) (fresh-type-variable))
>> >                     'body (fresh-type-variable)))]]
>> > [SubProgram #f
>> >             ([decls : MyDefinition *]
>> >              [body : Expression])
>> >             #:prop type-info
>> >             [(fresh-type-variable)
>> >              (λ (n t)
>> >                (hash 'decls (λ (c) (fresh-type-variable))
>> >                      'body (fresh-type-variable)))]]
>> > [MyDefinition #f ([type]
>> >                   [name = (fresh-var-name "b_")]
>> >                   [e : ProcedureApplication])
>> >               #:prop binder-info ()
>> >               #:prop type-info [(fresh-type-variable) (λ (n t) (hash
'e
>> t))]])
>> >
>> >(add-property
>> > somelisp
>> > lift-type->ast-binder-type
>> > [#f (λ (type) (if (zero? (random 2)) 'MyDefinition 'Definition))])
>> >
>> >;; Sometimes we have a type variable that is not concrete but needs
to be.
>> >;; Here we provide a list of options we can pick for an unconstrained
>> >;; type variable.
>> >(define (type-thunks-for-concretization)
>> >  (list
>> >   (λ()int-type)
>> >   (λ()bool-type)
>> >   (λ()string-type)
>> >   (λ()(immutable (list-type (fresh-type-variable))))))
>> >
>> >(define (somelisp-format-render s-exps)
>> >  (define out (open-output-string))
>> >  (for ([symex s-exps])
>> >    (pretty-print symex out 1))
>> >  (get-output-string out))
>> >
>> >(define-xsmith-interface-functions
>> >  [somelisp]
>> >  #:program-node MyProgram
>> >  #:type-thunks type-thunks-for-concretization
>> >  #:comment-wrap (λ (lines)
>> >                   (string-join
>> >                    (map (λ (l) (format ";; ~a" l)) lines)
>> >                    "\n"))
>> >  #:format-render somelisp-format-render)
>> >
>> >(module+ main (somelisp-command-line))
>>