Re: [xsmith-dev] XSmith questions

Hi everyone!

I have a couple more questions:

1) Is there a support for multiple bindings, just like define-values in Scheme?
2) I’m adding first-order functions whose call is limited in some context. The special call is approximated by how MyDefinition only allows ProcedureApplication as the RHS in the following code, which is added to the somelisp example:

(add-to-grammar
 somelisp
 [MyProgram #f
            ([decls : Definition *]
             [body : SubProgram])
            #:prop type-info
            [(fresh-type-variable)
             (λ (n t)
               (hash 'decls (λ (c) (fresh-type-variable))
                     'body (fresh-type-variable)))]]
 [SubProgram #f
             ([decls : MyDefinition *]
              [body : _expression_])
             #:prop type-info
             [(fresh-type-variable)
              (λ (n t)
                (hash 'decls (λ (c) (fresh-type-variable))
                      'body (fresh-type-variable)))]]
 [MyDefinition #f ([type]
                   [name = (fresh-var-name "b_")]
                   [e : ProcedureApplication])
               #:prop binder-info ()
               #:prop type-info [(fresh-type-variable) (λ (n t) (hash 'e t))]])

(add-property
 somelisp
 lift-type->ast-binder-type
 [#f (λ (type) (if (zero? (random 2)) 'MyDefinition 'Definition))])

with MyProgram as the top-level node.

(I structured the grammar like that so that the RHS of MyDefinition can be lifted to Definition. I previously tried lumping Definition and MyDefinition together in MyProgram and got an error _xsmith_resolve-reference: Unbound reference: lift_2 — not sure what it means).

This fails in two ways:

Sometimes I get an error: RACR exception: "AG evaluator exception;" "Cannot access" 'xsmith_no-holes-in-subtree? "attribute - the given node is a bud.". Indeed, when I use --s-exp-on-error, I can see that it’s a bud node. But this is weird because if I specify _expression_, a ProcedureApplication can be generated just fine.
Sometimes I get an error: xsmith: internal error -- no destinations for lift from: XsmithAstHoleExpression,.

In the last email, I asked if lift-type->ast-binder-type could be used nondeterministically in this manner, and my experience with it so far seems to indicate no. Perhaps a better example to illustrate this point is to first adjust MyDefinition to allow an arbitrary _expression_ as the RHS. Deterministically returning either MyDefinition or Definition will successfully generate the program, but that’s not what I want — I want both to be generated. However, randomizing will result in the above error. Does that mean lift-type->ast-binder-type is not the right tool for me (though I am required to specify it, as there are multiple binder-info…)?

Thanks,
Sorawee

On Wed, Jun 23, 2021 at 3:27 PM William G Hatch <william@hatch.uno> wrote:

On Wed, Jun 23, 2021 at 01:36:08PM -0700, Sorawee Porncharoenwase wrote:
>Thanks for your help!
>
>>Every node in the AST has type information.
>>
>Sorry, I should have framed the question better.
>
>Every node has type information, but IIUC it is annotated externally,
>right? What I’m curious about is why Definition requires an explicit type
>field in the AST node? Can’t the type information be stored externally as
>an attribute? And my question was that if there’s a reason this can’t be
>done, are there any other similar situations too? (That is, it will require
>an explicit type field in the AST node).

TL;DR: Storing types on other nodes is generally not necessary. It's
done for definitions because definitions/references is where code
relationships go from being a tree to being a more complicated graph,
while other nodes don't have that problem.

One reason we store a type in a definition node is because most
definitions are made by lifting, and thus have a specific type needed
before creating the right-hand-side of the definition. If we didn't
store the type in that case, we would need to search for variable
use-sites when computing a type later. Also, for variables that can
be mutated (or that hold containers), it can be important to have the
precise type for creating assignment nodes. Eg. in the face of
subtyping, assignment needs to have an invariant type relationship.
And it's just easier to go about that stuff if the exact type is
always explicitly stored rather than needing to compute it. If we
didn't store the type, it could remain only partially constrained for
a long time, and potentially be used for mutation in multiple places,
and basically it would just make the type analysis a lot more
complicated. Another reason is for the RACR cache. Computing a
definition's type would require looking at its references, which would
make the dependency graph on the type attribute a lot more connected
and tangled, meaning the caches would be flushed a lot more frequently
and require more recomputation.

Generally, when you are making your own node types, you need to store
type information only when you make a type decision that's not fully
determined by the node's parent and child types. Eg. there may be
cases where legally you can have either of type A or B, but you need
or want to make a decision that requires some data beyond what the
children have or before generating children. I think this came up in
a fuzzer that had type annotations in the printer (or maybe it was
that there were two different functions/operations but we encoded them
as one node where the printing just depended on the type... It's in
the WASM fuzzer, which I haven't had as much part in so I don't
remember the details), but where some types at print time were still
not constrained to a single printable type. We had to choose one to
print, and then make that choice consistent. But I think at that
point it wasn't actually necessary to store the choice in the node,
because printing is the last thing that's ever done to the AST and
there just wasn't another opportunity to make another choice. But if
you do need to make a choice about types that won't be consistently
computed in the future by the normal type checking apparatus, then you
need to store the type choice and unify with it. I think it shouldn't
be necessary. None of my fuzzers store type fields in anything but
name binding nodes.

>>The key to using Definitions in a statement language is basically a
>> >`Block`-like statement that can have a series of statements that
>> >include definitions needs to have a list of definitions and a list of
>> >statements separately. This is a bit of a limitation, since many
>> >languages can interleave definitions and statements, and Xsmith
>> >basically can't. At any rate, the `Block` node from canned-components
>> >does this.
>>
>Ah, I see. Perhaps one can hack by adding another subtype of Block node
>that, when rendering, will splice its body out. That would give an
>appearance of interleaving statements and declarations, though I think
>there could be a problem with variable scope -- if there's a variable
>shadowing in a block, splicing the body out could create a redefinition,
>which might be invalid in some languages.

Yes, exactly. That said, in practice xsmith just chooses a fresh name
for every variable, so unless/until we change that to actually produce
duplicate variable names, it will never be an issue.

>>I hope you find Xsmith useful! What are you intending to fuzz? I'm
>> >also happy to look at any code you have on Github or such for some
>> >debugging help.
>>
>We are fuzzing Dafny <https://github.com/dafny-lang/dafny> compilers.

Cool! Let me know if you put your fuzzer on github or something.