[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xsmith-dev] XSmith questions



Awesome!  We have a git repo where we are keeping track of bugs we
find with Xsmith.  If you want, we could add you to it and you can add
any bugs you find to it.  Otherwise I will probably put in anything
you mention like this.

On Tue, Jul 20, 2021 at 09:58:55AM -0700, Sorawee Porncharoenwase wrote:
By the way, here are the results so far:

- https://github.com/dafny-lang/dafny/issues/1291
- https://github.com/dafny-lang/dafny/issues/1297

On Tue, Jul 20, 2021 at 9:57 AM Sorawee Porncharoenwase <
sorawee.pwase@gmail.com> wrote:

We are primarily focused on different compiler backends for Dafny (Dafny
can be compiled to Java, C#, Rust, JS, Go, C++, and perhaps more).

On Tue, Jul 20, 2021 at 9:34 AM William G Hatch <william@hatch.uno> wrote:

>> Do you have a roadmap for implementing ...

Not at the moment.  Right now I'm focused on getting some of our
mostly-finished fuzzers *really* finished and ready for fuzzing,
running some fuzzing campaigns to find some bugs, and some features
that are immediately helpful for running fuzzing (eg. some stuff to
support feedback-directed fuzzing and choice weighting).


>Also, I'm curious about the setup of various fuzzers that you are
running.
>How many programs are you able to generate and test per minute? Do you
have
>any particular parallelization setup that works well for you?

There is a harness repository that we use that was mostly written by
Guy Watson:

https://gitlab.flux.utah.edu/xsmith/harness

There is a `harness.py` script that takes a configuration file to run
an xsmith fuzzer, running its output through various implementations
of whatever language, comparing the output, and binning the result
(eg. everything matching, diff, crash, everything crashing (IE bad
program), xsmith itself crashing, ...).  That harness script is
basically for running a single job, and there are 2 ways of doing it
in parallel.

The first one is the `go-button`, which can run fuzzing on multiple
machines at once.  It is fairly dependent on Emulab/Cloudlab
(https://www.emulab.net/) to have things set up just right, and
requires at least 2 machines.  It also generates a bunch of live
graphs about fuzzing.  It uses ansible to run setup scripts to prep
each machine before starting the fuzz campaign.

I mostly use `local-multi-harness` instead, which just runs on a
single machine, but runs multiple harness processes.  It doesn't
automatically run the setup scripts like the `go-button` does, so
eg. if you want to use the `racket-kernel-fuzzer` configuration you
need to run the `racket-install.rkt` script first manually.

As far as speed, xsmith is not particularly fast.  There is a server
mode that at least saves on startup time, but I think the harness
might not run it particularly well (Guy correct me if I'm wrong -- at
any rate I haven't been using it with server mode, but that might be
because I never bothered changing my config after bugs were fixed).
Aside from Racket's slow startup speed (particularly bad with Xsmith
because it loads a lot of stuff), the runtime varies a lot based on
the max-depth configuration, the size of the grammar, etc.

I haven't focused a lot on performance, necessarily.  Basically
whenever I decide Xsmith is too slow for my taste I spend some time
trying to optimize it, and generally make a few gains until I think
“that's enough optimization for now”.  If you can find any ways to
make improvements, that would be great!  I might have taken all of the
low-hanging fruit in my previous optimization passes, however.  In
previous optimization passes I got the most out of optimizing in the
type system, but I made some major improvements to type checking speed
last time so it may not dominate anymore.


As an aside, a question for you: Are there multiple Dafny
implementations?  Or different optimization levels?  Or are you
running different versions of the compiler against each other?



On Mon, Jul 19, 2021 at 04:15:04PM -0700, Sorawee Porncharoenwase wrote:
>Also, I'm curious about the setup of various fuzzers that you are
running.
>How many programs are you able to generate and test per minute? Do you
have
>any particular parallelization setup that works well for you?
>
>On Fri, Jul 16, 2021 at 8:32 PM Sorawee Porncharoenwase <
>sorawee.pwase@gmail.com> wrote:
>
>> Do you have a roadmap for implementing "Provide mechanism for
precluding
>> types in `fresh-type-variable` invocations"? (
>> https://gitlab.flux.utah.edu/xsmith/xsmith/-/issues/56). I also really
>> want this feature.
>>
>> On Thu, Jul 15, 2021 at 9:22 PM William G Hatch <william@hatch.uno>
wrote:
>>
>>> That would be great, thanks!
>>>
>>> On Thu, Jul 15, 2021 at 08:09:13PM -0700, Sorawee Porncharoenwase
wrote:
>>> >Yes, this is very helpful. I will let you know if I encounter any
further
>>> >issues after updating XSmith.
>>> >
>>> >By the way, do you want any help with Scribble? I notice a lot of
typos
>>> and
>>> >link errors and can submit PRs to fix them.
>>> >
>>> >On Thu, Jul 15, 2021 at 8:00 PM William G Hatch <william@hatch.uno>
>>> wrote:
>>> >
>>> >> Sorry for the delay.  I wanted to do the debugging before
responding,
>>> >> but then I didn't get to it for a while.
>>> >>
>>> >> Anyway, I now see what's going wrong, and I've fixed some bugs.
>>> >>
>>> >> First, that issue with the bud-node was a bug in the `edit`
property
>>> >> which the canned-components `ProcedureApplication` uses.  It's
fixed
>>> >> now.
>>> >>
>>> >> >> Hmm, it seems like first-order functions would be encoded as a
>>> >> >> function application node that only allows a reference as its
RHS
>>> >> >> rather than allowing arbitrary expressions.  But that said, I'm
not
>>> >> >> sure I fully understand what you're trying to do.
>>> >> >>
>>> >> >I’m not sure what you mean by that. But:
>>> >>
>>> >> Erm, I misspoke (mistyped) a bit.  I meant that it would be
function
>>> >> application that only allows a reference in the procedure
position.  I
>>> >> don't know why I wrote RHS.
>>> >>
>>> >> Anyway, the strategy of having separate definition nodes for
functions
>>> >> and everything else is used in Cish.  I don't recommend following
>>> >> anything Cish does.  However, it accomplishes this by having the
>>> >> program node have non-function definitions first, and disallowing
use
>>> >> of functions on the RHS of those functions.
>>> >>
>>> >> The problem where it can't find a reference is a bug (that I
haven't
>>> >> fixed yet) caused by the lifting mechanics not really being
completely
>>> >> in sync with the reference mechanics with respect to nodes that
have
>>> >> multiple definition fields.  When there are multiple definition
>>> >> fields, definitions in the lower field can reference the
definitions
>>> >> in the higher field, but not vice versa.  The lifting machinery
>>> >> doesn't know about that ordering, it just stuffs the lifted
definition
>>> >> into the front of the definitions in whatever field it is
targetting.
>>> >> Thus it places a definition in a place that the reference machinery
>>> >> thinks is out of scope.
>>> >>
>>> >> So... if you want to have different binders that get different
fields
>>> >> in the program node you can only make it work if you disallow
stuff to
>>> >> be sure you never need to reference something that will be lifted
into
>>> >> the lower field.
>>> >>
>>> >> That said, I've just fixed some issues to allow definition nodes
to be
>>> >> subtypable, so now you can have a definition node that is a
subtype of
>>> >> Definition, but has some different rules.  Importantly, though, it
can
>>> >> be lifted to the same field.
>>> >>
>>> >> Eg. if I change the MyDefinition declaration in your example to
look
>>> >> like this:
>>> >>
>>> >> ```
>>> >>   [MyDefinition Definition ()
>>> >>                 #:prop fresh (hash 'Expression (make-hole
>>> >> 'ProcedureApplication))]
>>> >> ```
>>> >>
>>> >> It can now be lifted among normal definitions, but always have
>>> >> ProcedureApplication as its RHS.
>>> >>
>>> >> However, I'm still not convinced that you actually want
>>> >> ProcedureApplication as the RHS.  Maybe you want to restrict lambda
>>> >> nodes to only be generated on the RHS of a FunctionDefinition node
>>> >> (that's a subtype of Definition), and make FunctionDefinition
always
>>> >> have a Lambda node as its RHS.
>>> >>
>>> >> There is an example in the documentation that is like this, in the
>>> >> explanation of the `choice-filters-to-apply` property.  But I'll
paste
>>> >> it here:
>>> >>
>>> >> ```
>>> >> (add-choice-method
>>> >>   my-component
>>> >>   no-lambda-except-global-def
>>> >>   [#f (λ () #t)]
>>> >>   [Lambda (λ ()
>>> >>             (and (parent-node current-hole)
>>> >>                  (equal? (ast-node-type (parent-node current-hole))
>>> >>                          'Definition)
>>> >>                  (parent-node (parent-node current-hole))
>>> >>                  (equal? (ast-node-type (parent-node (parent-node
>>> >> current-hole)))
>>> >>                          'Program)))])
>>> >> (add-property my-component
>>> >>            choice-filters-to-apply
>>> >>            [#f (no-lambda-except-global-def)])
>>> >> ```
>>> >>
>>> >> You can use something like that to allow Lambda nodes to only be
>>> >> generated on the RHS of FunctionDefinition nodes, and define
>>> >> FunctionDefinition like MyDefinition above to always generate
Lambda
>>> >> as its RHS, and I think it should work for a first-order language.
>>> >>
>>> >>
>>> >> >(Also note that I tried unify! these unrestricted type variables
>>> >> >with some concrete non-function type variables, but down the
road, I
>>> get
>>> >> an
>>> >> >internal error about unification failure.). Perhaps I'm not
supposed
>>> to
>>> >> >unify! inside add-choice-method?
>>> >>
>>> >> Well, probably not.  Generally the only place you want to `unify!`
is
>>> >> inside the type property in the function that determines child
types
>>> >> based on the parent type.  Inside that function you can `unify!`,
but
>>> >> the unification needs to be consistent (IE not unified based on the
>>> >> result of `random`) OR the choice needs to be saved in a way that
it
>>> >> can be looked up when that function is run again.
>>> >>
>>> >> The problem is that the choice method gets its type information
based
>>> >> on a hole node that is later replaced, so the type is recomputed
>>> >> without necessarily consulting the type that the hole said it was.
>>> >> You should be able to constrain the type using unification if you
save
>>> >> that type to re-unify it consistently later.  But without seeing
the
>>> >> exact error and code that generated it, I'm not 100% certain.
>>> >>
>>> >>
>>> >> > In the last email, I asked if lift-type->ast-binder-type could be
>>> used
>>> >> > nondeterministically in this manner, and my experience with it
so far
>>> >> seems
>>> >> > to indicate no.
>>> >>
>>> >> I think this should work, barring the issues mentioned above about
the
>>> >> lifting vs referencing bug when you have the different definition
>>> >> types in different fields.  Adjusting your example to have
>>> >> MyDefinition as a subtype, and then only having a Definition field
in
>>> >> MyProgram, random choice seems to work fine.  But sometimes it
>>> >> generates gargantuan programs (that I kill before it finishes)
because
>>> >> it keeps choosing MyDefinition, requiring a function application,
>>> >> which likely lifts a reference to a function, which might choose
>>> >> MyDefinition, etc, in a bad lift chain that generates bigger and
>>> >> bigger types.
>>> >>
>>> >>
>>> >> Anyway, does that all help?
>>> >>
>>> >>
>>> >> On Tue, Jul 13, 2021 at 02:55:10PM -0700, Sorawee Porncharoenwase
>>> wrote:
>>> >> >>
>>> >> >> Hmm, it seems like first-order functions would be encoded as a
>>> >> >> function application node that only allows a reference as its
RHS
>>> >> >> rather than allowing arbitrary expressions.  But that said, I'm
not
>>> >> >> sure I fully understand what you're trying to do.
>>> >> >>
>>> >> >I’m not sure what you mean by that. But:
>>> >> >
>>> >> >   1. I have no problem with the definition site. That’s easy to
deal
>>> >> with.
>>> >> >   2. I also have no problem with restricting function
application.
>>> That’s
>>> >> >   easy to deal with as well.
>>> >> >   3. The problem is primarily with a bare variable reference
bound
>>> to a
>>> >> >   closure value, which must only occur in function application.
For
>>> >> example,
>>> >> >   the following code is invalid.
>>> >> >
>>> >> >(define (f) (void))
>>> >> >
>>> >> >(define g f) ;; this is invalid
>>> >> >(define h (g))
>>> >> >(define a (list f)) ;; this is also invalid
>>> >> >(define b (f)) ;; this is ok
>>> >> >
>>> >> >I have been using choice-filters-to-apply to restrict variable
>>> references
>>> >> >whose type can-unify? with function-type that doesn't directly
appear
>>> >> under
>>> >> >function application. But this is obviously not ideal because the
>>> filtered
>>> >> >nodes are mostly those whose type is totally unrestricted. So
>>> filtering
>>> >> >them would prevent them from being concretized to other
non-function
>>> types
>>> >> >as well. (Also note that I tried unify! these unrestricted type
>>> variables
>>> >> >with some concrete non-function type variables, but down the
road, I
>>> get
>>> >> an
>>> >> >internal error about unification failure.). Perhaps I'm not
supposed
>>> to
>>> >> >unify! inside add-choice-method?
>>> >> >
>>> >> >That's probably a bug.  “Lifting” is what I've called the
situation
>>> >> >> where a reference is generated that causes a new definition to
be
>>> >> >> added such that the variable is in scope at the use site.  IE
the
>>> >> >> binding is lifted from the use site to some outer node that
contains
>>> >> >> definitions.  `lift_X` is the name Xsmith gives to definitions
that
>>> >> >> are created automatically in this way (which is ultimately most
>>> >> >> definitions).  The exception is caused because somehow the lift
>>> ended
>>> >> >> up in a place where it wasn't actually in scope.  That shouldn't
>>> >> >> happen.  Anyway, if you point me to a version of a fuzzer that
does
>>> >> >> this, I'll try to debug it.
>>> >> >>
>>> >> >#lang clotho
>>> >> >
>>> >> >(require
>>> >> > xsmith
>>> >> > racr
>>> >> > xsmith/racr-convenience
>>> >> > xsmith/canned-components
>>> >> > racket/string
>>> >> > racket/list
>>> >> > racket/pretty)
>>> >> >
>>> >> >;; We first define a basic component and add a bunch of
expressions.
>>> >> >
>>> >> >(define-basic-spec-component somelisp)
>>> >> >
>>> >> >(add-basic-expressions somelisp
>>> >> >                       #:ProgramWithSequence #t
>>> >> >                       #:VoidExpression #t
>>> >> >                       #:AssignmentExpression #t
>>> >> >                       #:VariableReference #t
>>> >> >                       #:ProcedureApplication #t
>>> >> >                       #:IfExpression #t
>>> >> >                       #:ExpressionSequence #t
>>> >> >                       #:LetSequential #t
>>> >> >                       #:LambdaWithExpression #t
>>> >> >                       #:Numbers #t
>>> >> >                       #:Booleans #t
>>> >> >                       #:Strings #t
>>> >> >                       #:ImmutableList #t)
>>> >> >
>>> >> >(add-loop-over-container
>>> >> > somelisp
>>> >> > #:name Map
>>> >> > #:collection-type-constructor (λ (inner) (immutable (list-type
>>> inner)))
>>> >> > #:loop-type-constructor (λ (inner) (immutable (list-type
inner))))
>>> >> >
>>> >> >(add-to-grammar
>>> >> > somelisp
>>> >> > [MyProgram #f
>>> >> >            ([decls : Definition *]
>>> >> >             [declsTwo : MyDefinition *]
>>> >> >             [body : Expression])
>>> >> >            #:prop type-info
>>> >> >            [(fresh-type-variable)
>>> >> >             (λ (n t)
>>> >> >               (hash 'decls (λ (c) (fresh-type-variable))
>>> >> >                     'declsTwo (λ (c) (fresh-type-variable))
>>> >> >                     'body (fresh-type-variable)))]]
>>> >> > [MyDefinition #f ([type]
>>> >> >                   [name = (fresh-var-name "b_")]
>>> >> >                   [e : ProcedureApplication])
>>> >> >               #:prop binder-info ()
>>> >> >               #:prop type-info [(fresh-type-variable) (λ (n t)
(hash
>>> 'e
>>> >> t))]])
>>> >> >
>>> >> >(add-property
>>> >> > somelisp
>>> >> > lift-type->ast-binder-type
>>> >> > [#f (λ (type) (if (zero? (random 2)) 'MyDefinition
'Definition))])
>>> >> >
>>> >> >;; Sometimes we have a type variable that is not concrete but
needs
>>> to be.
>>> >> >;; Here we provide a list of options we can pick for an
unconstrained
>>> >> >;; type variable.
>>> >> >(define (type-thunks-for-concretization)
>>> >> >  (list
>>> >> >   (λ()int-type)
>>> >> >   (λ()bool-type)
>>> >> >   (λ()string-type)
>>> >> >   (λ()(immutable (list-type (fresh-type-variable))))))
>>> >> >
>>> >> >(define (somelisp-format-render s-exps)
>>> >> >  (define out (open-output-string))
>>> >> >  (for ([symex s-exps])
>>> >> >    (pretty-print symex out 1))
>>> >> >  (get-output-string out))
>>> >> >
>>> >> >(define-xsmith-interface-functions
>>> >> >  [somelisp]
>>> >> >  #:program-node MyProgram
>>> >> >  #:type-thunks type-thunks-for-concretization
>>> >> >  #:comment-wrap (λ (lines)
>>> >> >                   (string-join
>>> >> >                    (map (λ (l) (format ";; ~a" l)) lines)
>>> >> >                    "\n"))
>>> >> >  #:format-render somelisp-format-render)
>>> >> >
>>> >> >(module+ main (somelisp-command-line))
>>> >> >
>>> >> >
>>> >> >> Probably something is going wrong in the `fresh` property.
Have you
>>> >> >> changed that?
>>> >> >>
>>> >> >I don’t think so. See the code at the end of the email.
>>> >> >
>>> >> >> I should change that error message.  Anyway, it seems to think
there
>>> >> >> is no outer node that has a `Definition *` child that can be the
>>> >> >> target for lifting a definition.  Generally this happens when
you
>>> >> >> forget to have an outer Program node or something that can hold
>>> >> >> definitions.
>>> >> >>
>>> >> >I don’t think that’s the case, but I could be wrong.
>>> >> >
>>> >> >#lang clotho
>>> >> >
>>> >> >(require
>>> >> > xsmith
>>> >> > racr
>>> >> > xsmith/racr-convenience
>>> >> > xsmith/canned-components
>>> >> > racket/string
>>> >> > racket/list
>>> >> > racket/pretty)
>>> >> >
>>> >> >;; We first define a basic component and add a bunch of
expressions.
>>> >> >
>>> >> >(define-basic-spec-component somelisp)
>>> >> >
>>> >> >(add-basic-expressions somelisp
>>> >> >                       #:ProgramWithSequence #t
>>> >> >                       #:VoidExpression #t
>>> >> >                       #:AssignmentExpression #t
>>> >> >                       #:VariableReference #t
>>> >> >                       #:ProcedureApplication #t
>>> >> >                       #:IfExpression #t
>>> >> >                       #:ExpressionSequence #t
>>> >> >                       #:LetSequential #t
>>> >> >                       #:LambdaWithExpression #t
>>> >> >                       #:Numbers #t
>>> >> >                       #:Booleans #t
>>> >> >                       #:Strings #t
>>> >> >                       #:ImmutableList #t)
>>> >> >
>>> >> >(add-loop-over-container
>>> >> > somelisp
>>> >> > #:name Map
>>> >> > #:collection-type-constructor (λ (inner) (immutable (list-type
>>> inner)))
>>> >> > #:loop-type-constructor (λ (inner) (immutable (list-type
inner))))
>>> >> >
>>> >> >(add-to-grammar
>>> >> > somelisp
>>> >> > [MyProgram #f
>>> >> >            ([decls : Definition *]
>>> >> >             [body : SubProgram])
>>> >> >            #:prop type-info
>>> >> >            [(fresh-type-variable)
>>> >> >             (λ (n t)
>>> >> >               (hash 'decls (λ (c) (fresh-type-variable))
>>> >> >                     'body (fresh-type-variable)))]]
>>> >> > [SubProgram #f
>>> >> >             ([decls : MyDefinition *]
>>> >> >              [body : Expression])
>>> >> >             #:prop type-info
>>> >> >             [(fresh-type-variable)
>>> >> >              (λ (n t)
>>> >> >                (hash 'decls (λ (c) (fresh-type-variable))
>>> >> >                      'body (fresh-type-variable)))]]
>>> >> > [MyDefinition #f ([type]
>>> >> >                   [name = (fresh-var-name "b_")]
>>> >> >                   [e : ProcedureApplication])
>>> >> >               #:prop binder-info ()
>>> >> >               #:prop type-info [(fresh-type-variable) (λ (n t)
(hash
>>> 'e
>>> >> t))]])
>>> >> >
>>> >> >(add-property
>>> >> > somelisp
>>> >> > lift-type->ast-binder-type
>>> >> > [#f (λ (type) (if (zero? (random 2)) 'MyDefinition
'Definition))])
>>> >> >
>>> >> >;; Sometimes we have a type variable that is not concrete but
needs
>>> to be.
>>> >> >;; Here we provide a list of options we can pick for an
unconstrained
>>> >> >;; type variable.
>>> >> >(define (type-thunks-for-concretization)
>>> >> >  (list
>>> >> >   (λ()int-type)
>>> >> >   (λ()bool-type)
>>> >> >   (λ()string-type)
>>> >> >   (λ()(immutable (list-type (fresh-type-variable))))))
>>> >> >
>>> >> >(define (somelisp-format-render s-exps)
>>> >> >  (define out (open-output-string))
>>> >> >  (for ([symex s-exps])
>>> >> >    (pretty-print symex out 1))
>>> >> >  (get-output-string out))
>>> >> >
>>> >> >(define-xsmith-interface-functions
>>> >> >  [somelisp]
>>> >> >  #:program-node MyProgram
>>> >> >  #:type-thunks type-thunks-for-concretization
>>> >> >  #:comment-wrap (λ (lines)
>>> >> >                   (string-join
>>> >> >                    (map (λ (l) (format ";; ~a" l)) lines)
>>> >> >                    "\n"))
>>> >> >  #:format-render somelisp-format-render)
>>> >> >
>>> >> >(module+ main (somelisp-command-line))
>>> >>
>>>
>>