[xsmith-dev] XSmith questions

Sun Jul 11 15:42:49 MDT 2021

Hi everyone!

I have a couple more questions:

1) Is there a support for multiple bindings, just like define-values in
Scheme?
2) I’m adding first-order functions whose call is limited in some context.
The special call is approximated by how MyDefinition only allows
ProcedureApplication as the RHS in the following code, which is added to
the somelisp example:

(add-to-grammar
 somelisp
 [MyProgram #f
            ([decls : Definition *]
             [body : SubProgram])
            #:prop type-info
            [(fresh-type-variable)
             (λ (n t)
               (hash 'decls (λ (c) (fresh-type-variable))
                     'body (fresh-type-variable)))]]
 [SubProgram #f
             ([decls : MyDefinition *]
              [body : Expression])
             #:prop type-info
             [(fresh-type-variable)
              (λ (n t)
                (hash 'decls (λ (c) (fresh-type-variable))
                      'body (fresh-type-variable)))]]
 [MyDefinition #f ([type]
                   [name = (fresh-var-name "b_")]
                   [e : ProcedureApplication])
               #:prop binder-info ()
               #:prop type-info [(fresh-type-variable) (λ (n t) (hash 'e t))]])

(add-property
 somelisp
 lift-type->ast-binder-type
 [#f (λ (type) (if (zero? (random 2)) 'MyDefinition 'Definition))])

with MyProgram as the top-level node.

(I structured the grammar like that so that the RHS of MyDefinition can be
lifted to Definition. I previously tried lumping Definition and MyDefinition
together in MyProgram and got an error _xsmith_resolve-reference: Unbound
reference: lift_2 — not sure what it means).

This fails in two ways:

   - Sometimes I get an error: RACR exception: "AG evaluator exception;"
   "Cannot access" 'xsmith_no-holes-in-subtree? "attribute - the given node is
   a bud.". Indeed, when I use --s-exp-on-error, I can see that it’s a bud
   node. But this is weird because if I specify Expression, a
   ProcedureApplication can be generated just fine.
   - Sometimes I get an error: xsmith: internal error -- no destinations
   for lift from: XsmithAstHoleExpression,.

In the last email, I asked if lift-type->ast-binder-type could be used
nondeterministically in this manner, and my experience with it so far seems
to indicate no. Perhaps a better example to illustrate this point is to
first adjust MyDefinition to allow an arbitrary Expression as the RHS.
Deterministically returning either MyDefinition or Definition will
successfully generate the program, but that’s not what I want — I want both
to be generated. However, randomizing will result in the above error. Does
that mean lift-type->ast-binder-type is not the right tool for me (though I
am required to specify it, as there are multiple binder-info…)?

Thanks,
Sorawee

On Wed, Jun 23, 2021 at 3:27 PM William G Hatch <william at hatch.uno> wrote:

> On Wed, Jun 23, 2021 at 01:36:08PM -0700, Sorawee Porncharoenwase wrote:
> >Thanks for your help!
> >
> >>Every node in the AST has type information.
> >>
> >Sorry, I should have framed the question better.
> >
> >Every node has type information, but IIUC it is annotated externally,
> >right? What I’m curious about is why Definition requires an explicit type
> >field in the AST node? Can’t the type information be stored externally as
> >an attribute? And my question was that if there’s a reason this can’t be
> >done, are there any other similar situations too? (That is, it will
> require
> >an explicit type field in the AST node).
>
> TL;DR: Storing types on other nodes is generally not necessary.  It's
> done for definitions because definitions/references is where code
> relationships go from being a tree to being a more complicated graph,
> while other nodes don't have that problem.
>
> One reason we store a type in a definition node is because most
> definitions are made by lifting, and thus have a specific type needed
> before creating the right-hand-side of the definition.  If we didn't
> store the type in that case, we would need to search for variable
> use-sites when computing a type later.  Also, for variables that can
> be mutated (or that hold containers), it can be important to have the
> precise type for creating assignment nodes.  Eg. in the face of
> subtyping, assignment needs to have an invariant type relationship.
> And it's just easier to go about that stuff if the exact type is
> always explicitly stored rather than needing to compute it.  If we
> didn't store the type, it could remain only partially constrained for
> a long time, and potentially be used for mutation in multiple places,
> and basically it would just make the type analysis a lot more
> complicated.  Another reason is for the RACR cache.  Computing a
> definition's type would require looking at its references, which would
> make the dependency graph on the type attribute a lot more connected
> and tangled, meaning the caches would be flushed a lot more frequently
> and require more recomputation.
>
> Generally, when you are making your own node types, you need to store
> type information only when you make a type decision that's not fully
> determined by the node's parent and child types.  Eg. there may be
> cases where legally you can have either of type A or B, but you need
> or want to make a decision that requires some data beyond what the
> children have or before generating children.  I think this came up in
> a fuzzer that had type annotations in the printer (or maybe it was
> that there were two different functions/operations but we encoded them
> as one node where the printing just depended on the type...  It's in
> the WASM fuzzer, which I haven't had as much part in so I don't
> remember the details), but where some types at print time were still
> not constrained to a single printable type.  We had to choose one to
> print, and then make that choice consistent.  But I think at that
> point it wasn't actually necessary to store the choice in the node,
> because printing is the last thing that's ever done to the AST and
> there just wasn't another opportunity to make another choice.  But if
> you do need to make a choice about types that won't be consistently
> computed in the future by the normal type checking apparatus, then you
> need to store the type choice and unify with it.  I think it shouldn't
> be necessary.  None of my fuzzers store type fields in anything but
> name binding nodes.
>
> >>The key to using Definitions in a statement language is basically a
> >> >`Block`-like statement that can have a series of statements that
> >> >include definitions needs to have a list of definitions and a list of
> >> >statements separately.  This is a bit of a limitation, since many
> >> >languages can interleave definitions and statements, and Xsmith
> >> >basically can't.  At any rate, the `Block` node from canned-components
> >> >does this.
> >>
> >Ah, I see. Perhaps one can hack by adding another subtype of Block node
> >that, when rendering, will splice its body out. That would give an
> >appearance of interleaving statements and declarations, though I think
> >there could be a problem with variable scope -- if there's a variable
> >shadowing in a block, splicing the body out could create a redefinition,
> >which might be invalid in some languages.
>
> Yes, exactly.  That said, in practice xsmith just chooses a fresh name
> for every variable, so unless/until we change that to actually produce
> duplicate variable names, it will never be an issue.
>
> >>I hope you find Xsmith useful!  What are you intending to fuzz?  I'm
> >> >also happy to look at any code you have on Github or such for some
> >> >debugging help.
> >>
> >We are fuzzing Dafny <https://github.com/dafny-lang/dafny> compilers.
>
> Cool!  Let me know if you put your fuzzer on github or something.
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: </listarchives/xsmith-dev/attachments/20210711/9288c120/attachment.html>