[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [xsmith-dev] XSmith questions



On Wed, Jun 23, 2021 at 01:36:08PM -0700, Sorawee Porncharoenwase wrote:
Thanks for your help!

Every node in the AST has type information.

Sorry, I should have framed the question better.

Every node has type information, but IIUC it is annotated externally,
right? What I’m curious about is why Definition requires an explicit type
field in the AST node? Can’t the type information be stored externally as
an attribute? And my question was that if there’s a reason this can’t be
done, are there any other similar situations too? (That is, it will require
an explicit type field in the AST node).

TL;DR: Storing types on other nodes is generally not necessary.  It's
done for definitions because definitions/references is where code
relationships go from being a tree to being a more complicated graph,
while other nodes don't have that problem.

One reason we store a type in a definition node is because most
definitions are made by lifting, and thus have a specific type needed
before creating the right-hand-side of the definition.  If we didn't
store the type in that case, we would need to search for variable
use-sites when computing a type later.  Also, for variables that can
be mutated (or that hold containers), it can be important to have the
precise type for creating assignment nodes.  Eg. in the face of
subtyping, assignment needs to have an invariant type relationship.
And it's just easier to go about that stuff if the exact type is
always explicitly stored rather than needing to compute it.  If we
didn't store the type, it could remain only partially constrained for
a long time, and potentially be used for mutation in multiple places,
and basically it would just make the type analysis a lot more
complicated.  Another reason is for the RACR cache.  Computing a
definition's type would require looking at its references, which would
make the dependency graph on the type attribute a lot more connected
and tangled, meaning the caches would be flushed a lot more frequently
and require more recomputation.

Generally, when you are making your own node types, you need to store
type information only when you make a type decision that's not fully
determined by the node's parent and child types.  Eg. there may be
cases where legally you can have either of type A or B, but you need
or want to make a decision that requires some data beyond what the
children have or before generating children.  I think this came up in
a fuzzer that had type annotations in the printer (or maybe it was
that there were two different functions/operations but we encoded them
as one node where the printing just depended on the type...  It's in
the WASM fuzzer, which I haven't had as much part in so I don't
remember the details), but where some types at print time were still
not constrained to a single printable type.  We had to choose one to
print, and then make that choice consistent.  But I think at that
point it wasn't actually necessary to store the choice in the node,
because printing is the last thing that's ever done to the AST and
there just wasn't another opportunity to make another choice.  But if
you do need to make a choice about types that won't be consistently
computed in the future by the normal type checking apparatus, then you
need to store the type choice and unify with it.  I think it shouldn't
be necessary.  None of my fuzzers store type fields in anything but
name binding nodes.

The key to using Definitions in a statement language is basically a
>`Block`-like statement that can have a series of statements that
>include definitions needs to have a list of definitions and a list of
>statements separately.  This is a bit of a limitation, since many
>languages can interleave definitions and statements, and Xsmith
>basically can't.  At any rate, the `Block` node from canned-components
>does this.

Ah, I see. Perhaps one can hack by adding another subtype of Block node
that, when rendering, will splice its body out. That would give an
appearance of interleaving statements and declarations, though I think
there could be a problem with variable scope -- if there's a variable
shadowing in a block, splicing the body out could create a redefinition,
which might be invalid in some languages.

Yes, exactly.  That said, in practice xsmith just chooses a fresh name
for every variable, so unless/until we change that to actually produce
duplicate variable names, it will never be an issue.

I hope you find Xsmith useful!  What are you intending to fuzz?  I'm
>also happy to look at any code you have on Github or such for some
>debugging help.

We are fuzzing Dafny <https://github.com/dafny-lang/dafny> compilers.

Cool!  Let me know if you put your fuzzer on github or something.