[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xsmith-dev] XSmith questions



Hello again.

1) Is there a support for multiple bindings, just like define-values in
Scheme?

No, but you can maybe simulate this to some degree.  In
canned-components there is a DefinitionNoRhs node that I made for
adding definition nodes for things like loops and list comprehensions.
The idea is that you can use it in a node that introduces a binding
where the RHS is some implicit operation on some other piece of data
the node has (eg. a list expression).  You could make a subclass of
Block that has a field for multi-value-RHSs, insert DefinitionNoRhs
nodes in the definitions field, and keep track of the relationship
somehow.  It might be a bit of work, but I think it should be doable.
Ideally this is something that could be added to canned-components,
but I don't see myself writing it at the moment.

2) I’m adding first-order functions whose call is limited in some context.
The special call is approximated by how MyDefinition only allows
ProcedureApplication as the RHS in the following code

Hmm, it seems like first-order functions would be encoded as a
function application node that only allows a reference as its RHS
rather than allowing arbitrary expressions.  But that said, I'm not
sure I fully understand what you're trying to do.


I previously tried lumping Definition and MyDefinition
together in MyProgram and got an error _xsmith_resolve-reference: Unbound
reference: lift_2 — not sure what it means).

That's probably a bug.  “Lifting” is what I've called the situation
where a reference is generated that causes a new definition to be
added such that the variable is in scope at the use site.  IE the
binding is lifted from the use site to some outer node that contains
definitions.  `lift_X` is the name Xsmith gives to definitions that
are created automatically in this way (which is ultimately most
definitions).  The exception is caused because somehow the lift ended
up in a place where it wasn't actually in scope.  That shouldn't
happen.  Anyway, if you point me to a version of a fuzzer that does
this, I'll try to debug it.


This fails in two ways:

  - Sometimes I get an error: RACR exception: "AG evaluator exception;"
  "Cannot access" 'xsmith_no-holes-in-subtree? "attribute - the given node is
  a bud.". Indeed, when I use --s-exp-on-error, I can see that it’s a bud
  node. But this is weird because if I specify Expression, a
  ProcedureApplication can be generated just fine.

Probably something is going wrong in the `fresh` property.  Have you
changed that?

  - Sometimes I get an error: xsmith: internal error -- no destinations
  for lift from: XsmithAstHoleExpression,.

I should change that error message.  Anyway, it seems to think there
is no outer node that has a `Definition *` child that can be the
target for lifting a definition.  Generally this happens when you
forget to have an outer Program node or something that can hold
definitions.

Could you point me to a complete example that I can run that has these
problems?  I want to be able to run the code to see the errors and
debug them, but the provided snippet isn't quite enough.

Thanks,
William Hatch

On Sun, Jul 11, 2021 at 02:42:49PM -0700, Sorawee Porncharoenwase wrote:
Hi everyone!

I have a couple more questions:

1) Is there a support for multiple bindings, just like define-values in
Scheme?
2) I’m adding first-order functions whose call is limited in some context.
The special call is approximated by how MyDefinition only allows
ProcedureApplication as the RHS in the following code, which is added to
the somelisp example:

(add-to-grammar
somelisp
[MyProgram #f
           ([decls : Definition *]
            [body : SubProgram])
           #:prop type-info
           [(fresh-type-variable)
            (λ (n t)
              (hash 'decls (λ (c) (fresh-type-variable))
                    'body (fresh-type-variable)))]]
[SubProgram #f
            ([decls : MyDefinition *]
             [body : Expression])
            #:prop type-info
            [(fresh-type-variable)
             (λ (n t)
               (hash 'decls (λ (c) (fresh-type-variable))
                     'body (fresh-type-variable)))]]
[MyDefinition #f ([type]
                  [name = (fresh-var-name "b_")]
                  [e : ProcedureApplication])
              #:prop binder-info ()
              #:prop type-info [(fresh-type-variable) (λ (n t) (hash 'e t))]])

(add-property
somelisp
lift-type->ast-binder-type
[#f (λ (type) (if (zero? (random 2)) 'MyDefinition 'Definition))])

with MyProgram as the top-level node.

(I structured the grammar like that so that the RHS of MyDefinition can be
lifted to Definition. I previously tried lumping Definition and MyDefinition
together in MyProgram and got an error _xsmith_resolve-reference: Unbound
reference: lift_2 — not sure what it means).

This fails in two ways:

  - Sometimes I get an error: RACR exception: "AG evaluator exception;"
  "Cannot access" 'xsmith_no-holes-in-subtree? "attribute - the given node is
  a bud.". Indeed, when I use --s-exp-on-error, I can see that it’s a bud
  node. But this is weird because if I specify Expression, a
  ProcedureApplication can be generated just fine.
  - Sometimes I get an error: xsmith: internal error -- no destinations
  for lift from: XsmithAstHoleExpression,.

In the last email, I asked if lift-type->ast-binder-type could be used
nondeterministically in this manner, and my experience with it so far seems
to indicate no. Perhaps a better example to illustrate this point is to
first adjust MyDefinition to allow an arbitrary Expression as the RHS.
Deterministically returning either MyDefinition or Definition will
successfully generate the program, but that’s not what I want — I want both
to be generated. However, randomizing will result in the above error. Does
that mean lift-type->ast-binder-type is not the right tool for me (though I
am required to specify it, as there are multiple binder-info…)?

Thanks,
Sorawee

On Wed, Jun 23, 2021 at 3:27 PM William G Hatch <william@hatch.uno> wrote:

On Wed, Jun 23, 2021 at 01:36:08PM -0700, Sorawee Porncharoenwase wrote:
>Thanks for your help!
>
>>Every node in the AST has type information.
>>
>Sorry, I should have framed the question better.
>
>Every node has type information, but IIUC it is annotated externally,
>right? What I’m curious about is why Definition requires an explicit type
>field in the AST node? Can’t the type information be stored externally as
>an attribute? And my question was that if there’s a reason this can’t be
>done, are there any other similar situations too? (That is, it will
require
>an explicit type field in the AST node).

TL;DR: Storing types on other nodes is generally not necessary.  It's
done for definitions because definitions/references is where code
relationships go from being a tree to being a more complicated graph,
while other nodes don't have that problem.

One reason we store a type in a definition node is because most
definitions are made by lifting, and thus have a specific type needed
before creating the right-hand-side of the definition.  If we didn't
store the type in that case, we would need to search for variable
use-sites when computing a type later.  Also, for variables that can
be mutated (or that hold containers), it can be important to have the
precise type for creating assignment nodes.  Eg. in the face of
subtyping, assignment needs to have an invariant type relationship.
And it's just easier to go about that stuff if the exact type is
always explicitly stored rather than needing to compute it.  If we
didn't store the type, it could remain only partially constrained for
a long time, and potentially be used for mutation in multiple places,
and basically it would just make the type analysis a lot more
complicated.  Another reason is for the RACR cache.  Computing a
definition's type would require looking at its references, which would
make the dependency graph on the type attribute a lot more connected
and tangled, meaning the caches would be flushed a lot more frequently
and require more recomputation.

Generally, when you are making your own node types, you need to store
type information only when you make a type decision that's not fully
determined by the node's parent and child types.  Eg. there may be
cases where legally you can have either of type A or B, but you need
or want to make a decision that requires some data beyond what the
children have or before generating children.  I think this came up in
a fuzzer that had type annotations in the printer (or maybe it was
that there were two different functions/operations but we encoded them
as one node where the printing just depended on the type...  It's in
the WASM fuzzer, which I haven't had as much part in so I don't
remember the details), but where some types at print time were still
not constrained to a single printable type.  We had to choose one to
print, and then make that choice consistent.  But I think at that
point it wasn't actually necessary to store the choice in the node,
because printing is the last thing that's ever done to the AST and
there just wasn't another opportunity to make another choice.  But if
you do need to make a choice about types that won't be consistently
computed in the future by the normal type checking apparatus, then you
need to store the type choice and unify with it.  I think it shouldn't
be necessary.  None of my fuzzers store type fields in anything but
name binding nodes.

>>The key to using Definitions in a statement language is basically a
>> >`Block`-like statement that can have a series of statements that
>> >include definitions needs to have a list of definitions and a list of
>> >statements separately.  This is a bit of a limitation, since many
>> >languages can interleave definitions and statements, and Xsmith
>> >basically can't.  At any rate, the `Block` node from canned-components
>> >does this.
>>
>Ah, I see. Perhaps one can hack by adding another subtype of Block node
>that, when rendering, will splice its body out. That would give an
>appearance of interleaving statements and declarations, though I think
>there could be a problem with variable scope -- if there's a variable
>shadowing in a block, splicing the body out could create a redefinition,
>which might be invalid in some languages.

Yes, exactly.  That said, in practice xsmith just chooses a fresh name
for every variable, so unless/until we change that to actually produce
duplicate variable names, it will never be an issue.

>>I hope you find Xsmith useful!  What are you intending to fuzz?  I'm
>> >also happy to look at any code you have on Github or such for some
>> >debugging help.
>>
>We are fuzzing Dafny <https://github.com/dafny-lang/dafny> compilers.

Cool!  Let me know if you put your fuzzer on github or something.