From sorawee.pwase at gmail.com  Tue Jun 22 13:43:50 2021
From: sorawee.pwase at gmail.com (Sorawee Porncharoenwase)
Date: Tue, 22 Jun 2021 12:43:50 -0700
Subject: [xsmith-dev] XSmith questions
Message-ID: <CADcuegsu8obiKFrqXWSR3rzkJem4cFsM=p8QPJ7SxTNdoYu30w@mail.gmail.com>

Hi everyone,

I just started using XSmith. I don't think I grok the framework yet, so I
would really appreciate it if anyone could help me!

1) While I understand that dynamically typed language should specify
type-info to generate programs that don't cause runtime type mismatch, I
find it weird that types are very integral to the framework. For example,
binder-info seems to presuppose that declaration AST nodes will have type
information in it.

On one hand, that's strictly speaking not the case for many languages.

On the other hand, I can understand that the specified grammar here doesn't
need to correspond to the actual language grammar. So let's just add the
type field into the AST node. But this raises another question: how do I
know which other AST nodes will require additional type information as well?

2) As far as I can see, all declaration nodes have types to be that of the
expression it contains. Again, I find this weird since declaration in most
languages is a statement, not an expression. I tried fixing this by
modifying the "Another Small Example with Variables" example to:

(add-property
 arith
 type-info
 [Definition [no-return-type
              (λ (n t) (hash 'Expression (fresh-type-variable)))]]
 [LetStar [(fresh-type-variable)
           (λ (n t) (hash 'definitions (λ (cn) no-return-type)
                          'sideEs (λ (cn) (fresh-type-variable))
                          'Expression t))]]
 ...)

where no-return-type is from xsmith/canned-components. The generation
errors with:

Exception:
unify!: subtype-unify!: can't unify these types: #<type-variable
(#<range:#<no-return-type>-#<no-return-type>>)> and #<int>

In most examples that I saw, declarations are hoisted at the very top of
function / block / program. I think the no-return-type is particularly
going to be a problem when I want to allow declarations and statements to
interleave, since statements will be assigned a unique type (like
no-return-type), while declarations can't do the same.

3) How do I debug when XSmith simply hangs and doesn't output anything? It
looks like the problems I encountered are with types, since if I relax the
constraints, it can generate programs without any problem (though the
generated programs are incorrect, as expected).

4) In xsmith/canned-components, there are several predefined types, but the
module only exports constructors, not accessors. What would be the best way
to extract information from it? unify!?

Thanks!
Sorawee

From william at hatch.uno  Tue Jun 22 15:08:50 2021
From: william at hatch.uno (William G Hatch)
Date: Tue, 22 Jun 2021 15:08:50 -0600
Subject: [xsmith-dev] XSmith questions
In-Reply-To: <YNJQu4xFA7hMugIw@conspirator.d.p.hatch.uno>
References: <CADcuegsu8obiKFrqXWSR3rzkJem4cFsM=p8QPJ7SxTNdoYu30w@mail.gmail.com>
	<YNJQu4xFA7hMugIw@conspirator.d.p.hatch.uno>
Message-ID: <YNJRYgvCldxdkPxR@conspirator.d.p.hatch.uno>

Oops, I hit `reply` instead of `reply-all`.  I'm adding the mailing
list back on now.

On Tue, Jun 22, 2021 at 03:06:05PM -0600, William G Hatch wrote:
>On Tue, Jun 22, 2021 at 12:43:50PM -0700, Sorawee Porncharoenwase wrote:
>>Hi everyone,
>>
>>I just started using XSmith. I don't think I grok the framework yet, so I
>>would really appreciate it if anyone could help me!
>
>I'll do my best.  (I'm the main author of Xsmith, so you can blame
>most of your issues on me.)  Xsmith is research software with lots of
>sharp edges and insufficient documentation about those edges, and has
>been successfully used mostly by its authors so far.  (Though at least
>one other person has found some bugs using the fuzzers we've made.)
>
>[This email is a bit rambly and repetitive, sorry.  I would edit it
>better but I need to do something else now.  If this doesn't answer
>your questions, please ask some more follow-up questions and I'll try
>to write more cohesively.]
>
>>1) While I understand that dynamically typed language should specify
>>type-info to generate programs that don't cause runtime type mismatch, I
>>find it weird that types are very integral to the framework. For example,
>>binder-info seems to presuppose that declaration AST nodes will have type
>>information in it.
>>
>>On one hand, that's strictly speaking not the case for many languages.
>>
>>On the other hand, I can understand that the specified grammar here doesn't
>>need to correspond to the actual language grammar. So let's just add the
>>type field into the AST node. But this raises another question: how do I
>>know which other AST nodes will require additional type information as well?
>
>If you really want to have a language where anything goes, you can
>just create a `dyn` base-type and use it everywhere.  But, as you
>mention, fuzzing will likely be ineffective since things will mostly
>just crash with runtime type errors.  If you want to encode the fact
>that various types can be implicitly coerced, you can add explicit
>conversion AST nodes for Xsmith that go away when you pretty-print the
>program.  Eg. you can define an IntToStringCoersion expression that is
>rendered by just rendering the integer inside without adding anything
>else to it.
>
>Every node in the AST has type information.
>
>Generally, an Xsmith language fuzzer will have 2-3 main AST node
>types: Expression, Statement, and Definition.  Definitions are
>separate to simplify Xsmith itself to do something consistent --
>definitions can be expressions, statements, or something separate in
>various languages, but always having Definition nodes in Xsmith makes
>generic name analysis easier.
>
>Expressions have the types you expect -- string, int, list-of-string,
>etc.
>
>Statement types are a bit of a hack.  Statements have two types:
>no-return-type and return-type, where return-type is a wrapper type that
>includes an expression type inside that corresponds to the return type
>of the function.
>
>Definition nodes should have expression types, IE the type of the
>right-hand-side of the definition.
>
>In practice, you probably want to use canned-components to get
>Definition node definitions and probably Statement node definitions as
>well.  If you do that you don't need to provide type information for
>them.  The main place you need to worry about types is with
>expressions.
>
>As an aside, Function definitions are often a special case in
>programming languages, but we basically encode them as a definition
>with a lambda on the right hand side that we maybe print differently.
>(If you want to have both definitions with a lambda on the RHS and
>function definitions with your language's shorthand syntax, you can
>define a new Definition node subtype that prints differently.)
>
>
>>2) As far as I can see, all declaration nodes have types to be that of the
>>expression it contains. Again, I find this weird since declaration in most
>>languages is a statement, not an expression. I tried fixing this by
>>modifying the "Another Small Example with Variables" example to:
>>
>>(add-property
>>arith
>>type-info
>>[Definition [no-return-type
>>             (λ (n t) (hash 'Expression (fresh-type-variable)))]]
>>[LetStar [(fresh-type-variable)
>>          (λ (n t) (hash 'definitions (λ (cn) no-return-type)
>>                         'sideEs (λ (cn) (fresh-type-variable))
>>                         'Expression t))]]
>>...)
>>
>>where no-return-type is from xsmith/canned-components. The generation
>>errors with:
>>
>>Exception:
>>unify!: subtype-unify!: can't unify these types: #<type-variable
>>(#<range:#<no-return-type>-#<no-return-type>>)> and #<int>
>
>You need the type of the definition to be the type of the expression
>of the RHS of the definition (modulo subtyping).  Despite the fact
>that when printing you may make the definitions look like statements,
>Xsmith just requires Definition nodes to have the same type as
>variable references to that definition, and keeps Definition and
>Statement nodes separate.
>
>The return-type/no-return-type is sort of a hack to use the type
>system to keep track of statements as well as expressions.  In eg. a
>function, you need to track the return type throughout the function to
>be sure that the function does return (potentially in multiple
>branches of a conditional), and that all return statements return the
>right type.  So by annotating which nodes do or don't return
>something, Xsmith can be sure to generate well-formed functions.
>
>The key to using Definitions in a statement language is that a
>`Block`-like statement, which may contain a series of statements that
>include definitions, needs to have a list of definitions and a list of
>statements separately.  This is a bit of a limitation, since many
>languages can interleave definitions and statements, and Xsmith
>basically can't.  At any rate, the `Block` node from canned-components
>does this.
>
>Probably the best way to see how to do it in a full-sized (but still
>relatively simple) example is to look at the simple/javascript.rkt
>file in the xsmith-examples directory.  (We haven't actually used that
>one yet, so I'm not 100% certain it generates actually correct
>javascript, but it should be close if it's wrong.  At any rate, it's
>probably the simplest full-size example to see the correlation between
>the definition of the fuzzer and its output.)
>
>The `Block` statement in canned-components, and that the javascript
>example uses, is the main place to put definitions in a
>statement-based language.  In the javascript example we have a series
>of top-level definitions by using `ProgramWithBlock`, then basically
>unwrapping the "block" and printing the definitions at the top level.
>It also uses `LambdaWithBlock` for function definitions.  So if you
>look at how it's used it should clarify how to have definitions in a
>statement language.
>
>>In most examples that I saw, declarations are hoisted at the very top of
>>function / block / program. I think the no-return-type is particularly
>>going to be a problem when I want to allow declarations and statements to
>>interleave, since statements will be assigned a unique type (like
>>no-return-type), while declarations can't do the same.
>>
>>3) How do I debug when XSmith simply hangs and doesn't output anything? It
>>looks like the problems I encountered are with types, since if I relax the
>>constraints, it can generate programs without any problem (though the
>>generated programs are incorrect, as expected).
>
>Xsmith is probably hanging because it is in a lift loop.  IE it has
>created a Definition node, and it needs to generate a right-hand-side
>for the definition.  When it tries to generate the RHS expression, the
>only legal expression it can find is VariableReference.  So it creates
>a VariableReference, and it can't find a (non-circular) variable of
>that type to reference, so it lifts one.  Then it needs to make an RHS
>for that definition as well, and ...
>
>We really ought to catch this situation somehow, crash, and report an
>error.  But we don't at the moment.  Sorry.  If you hit control-C and
>get the debug output, you can see the type of the lift cycle (by just
>seeing what is being lifted over and over), and then try to figure out
>why nothing else is legal.  You may have forgotten a literal node, or
>maybe the literal node has holes (eg. a lambda is a literal but not
>atomic, it needs statements/expressions inside), meaning that you need
>to add an annotation of `#:prop wont-over-deepen #t` so it will
>generate it even if it's at the max AST depth.
>
>>4) In xsmith/canned-components, there are several predefined types, but the
>>module only exports constructors, not accessors. What would be the best way
>>to extract information from it? unify!?
>
>Yes, use `unify!`. Eg.
>
>```
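>;; `some-function-type` is the type you want to take apart.
>(define my-return-type (fresh-type-variable))
>(define f (function-type (fresh-type-variable) my-return-type))
>(unify! f some-function-type)
>;; my-return-type is now unified with the return type of some-function-type.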
>```
>
>This is something I should also improve but haven't yet.  The basic
>accessors can fail when logically they should succeed because they may
>be passed a type variable instead of the struct for the type you think
>it is.  So if you try to use eg. `(function-type t)` it might fail.  I
>should provide a `function-type!` function that does the obvious
>struct creation and unification, but I haven't yet.
>
>I may add that over the coming weeks.  I've been busy with some other
>things, but am shifting my gears back to Xsmith development, and hope
>to finish a few final features (eg. work on stuff to do
>feedback-directed fuzzing) and do some serious fuzzing with it.
>(We've run fuzz campaigns and found some bugs in the past, but mostly
>switched back to add-more-features mode to try to make it more
>effective, and haven't quite gotten back to actually trying to find
>bugs.)
>
>>Thanks!
>>Sorawee
>
>I hope you find Xsmith useful!  What are you intending to fuzz?  I'm
>also happy to look at any code you have on Github or such for some
>debugging help.
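
To make the lift loop from question 3 concrete, here is a toy model in
plain Racket.  It is only a sketch and does not use Xsmith:
`generate-rhs`, `literal-types`, and `fuel` are made-up names, and the
fuel counter stands in for the cycle detection that Xsmith does not
currently have.

```racket
#lang racket/base
;; Toy model of a "lift loop".  None of these names are Xsmith API.
;; literal-types: the types for which an atomic literal node exists.
;; Returns a log of the steps the generator takes, oldest first.
(define (generate-rhs type literal-types fuel)
  (cond
    [(memq type literal-types)
     (list (format "literal of type ~a" type))]
    [(zero? fuel)
     (list (format "gave up: still lifting ~a" type))]
    [else
     ;; Only a variable reference is legal here, and no variable of this
     ;; type exists, so a new definition is lifted.  Its right-hand side
     ;; needs the same type, which puts us back where we started.
     (cons (format "lift definition of type ~a" type)
           (generate-rhs type literal-types (sub1 fuel)))]))

;; A literal exists for int, so generation finishes immediately.
(define ok (generate-rhs 'int '(int bool) 3))
;; No literal exists for string, so every step lifts another definition.
(define stuck (generate-rhs 'string '(int bool) 3))
```

With a literal available for `int`, `ok` is a single literal step.  For
`string` there is no literal, so `stuck` is three lift steps followed by
the give-up message, which mirrors the repeated lifting described above.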


From sorawee.pwase at gmail.com  Wed Jun 23 14:36:08 2021
From: sorawee.pwase at gmail.com (Sorawee Porncharoenwase)
Date: Wed, 23 Jun 2021 13:36:08 -0700
Subject: [xsmith-dev] XSmith questions
In-Reply-To: <YNJRYgvCldxdkPxR@conspirator.d.p.hatch.uno>
References: <CADcuegsu8obiKFrqXWSR3rzkJem4cFsM=p8QPJ7SxTNdoYu30w@mail.gmail.com>
	<YNJQu4xFA7hMugIw@conspirator.d.p.hatch.uno>
	<YNJRYgvCldxdkPxR@conspirator.d.p.hatch.uno>
Message-ID: <CADcuegsyjqiu6XejSyaM0EGVrRimQwQ_VHR88VpMYW2Ye-ek_w@mail.gmail.com>

Thanks for your help!

>Every node in the AST has type information.
>
Sorry, I should have framed the question better.

Every node has type information, but IIUC it is annotated externally,
right? What I'm curious about is why Definition requires an explicit type
field in the AST node? Can't the type information be stored externally as
an attribute? And my question was that if there's a reason this can't be
done, are there any other similar situations too? (That is, it will require
an explicit type field in the AST node).

>The key to using Definitions in a statement language is that a
>`Block`-like statement, which may contain a series of statements that
>include definitions, needs to have a list of definitions and a list of
>statements separately.  This is a bit of a limitation, since many
>languages can interleave definitions and statements, and Xsmith
>basically can't.  At any rate, the `Block` node from canned-components
>does this.
>
Ah, I see. Perhaps one can hack by adding another subtype of Block node
that, when rendering, will splice its body out. That would give an
appearance of interleaving statements and declarations, though I think
there could be a problem with variable scope -- if there's a variable
shadowing in a block, splicing the body out could create a redefinition,
which might be invalid in some languages.

>I hope you find Xsmith useful!  What are you intending to fuzz?  I'm
>also happy to look at any code you have on Github or such for some
>debugging help.
>
We are fuzzing Dafny <https://github.com/dafny-lang/dafny> compilers.

From william at hatch.uno  Wed Jun 23 16:27:09 2021
From: william at hatch.uno (William G Hatch)
Date: Wed, 23 Jun 2021 16:27:09 -0600
Subject: [xsmith-dev] XSmith questions
In-Reply-To: <CADcuegsyjqiu6XejSyaM0EGVrRimQwQ_VHR88VpMYW2Ye-ek_w@mail.gmail.com>
References: <CADcuegsu8obiKFrqXWSR3rzkJem4cFsM=p8QPJ7SxTNdoYu30w@mail.gmail.com>
	<YNJQu4xFA7hMugIw@conspirator.d.p.hatch.uno>
	<YNJRYgvCldxdkPxR@conspirator.d.p.hatch.uno>
	<CADcuegsyjqiu6XejSyaM0EGVrRimQwQ_VHR88VpMYW2Ye-ek_w@mail.gmail.com>
Message-ID: <YNO1PbE33bhI96f+@conspirator.d.p.hatch.uno>

On Wed, Jun 23, 2021 at 01:36:08PM -0700, Sorawee Porncharoenwase wrote:
>Thanks for your help!
>
>>Every node in the AST has type information.
>>
>Sorry, I should have framed the question better.
>
>Every node has type information, but IIUC it is annotated externally,
>right? What I'm curious about is why Definition requires an explicit type
>field in the AST node? Can't the type information be stored externally as
>an attribute? And my question was that if there's a reason this can't be
>done, are there any other similar situations too? (That is, it will require
>an explicit type field in the AST node).

TL;DR: Storing types on other nodes is generally not necessary.  It's
done for definitions because definitions/references is where code
relationships go from being a tree to being a more complicated graph,
while other nodes don't have that problem.

One reason we store a type in a definition node is because most
definitions are made by lifting, and thus have a specific type needed
before creating the right-hand-side of the definition.  If we didn't
store the type in that case, we would need to search for variable
use-sites when computing a type later.  Also, for variables that can
be mutated (or that hold containers), it can be important to have the
precise type for creating assignment nodes.  Eg. in the face of
subtyping, assignment needs to have an invariant type relationship.
And it's just easier to go about that stuff if the exact type is
always explicitly stored rather than needing to compute it.  If we
didn't store the type, it could remain only partially constrained for
a long time, and potentially be used for mutation in multiple places,
and basically it would just make the type analysis a lot more
complicated.  Another reason is for the RACR cache.  Computing a
definition's type would require looking at its references, which would
make the dependency graph on the type attribute a lot more connected
and tangled, meaning the caches would be flushed a lot more frequently
and require more recomputation.
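
As a rough illustration of the invariance point, here is a toy subtype
check in plain Racket.  Nothing here is Xsmith API: the three-type
hierarchy and the helper names are assumptions made up for the example.

```racket
#lang racket/base
;; Toy subtype relation (hypothetical): int <: number <: any.
(define parent (hash 'int 'number 'number 'any))

(define (subtype? a b)
  (cond [(eq? a b) #t]
        [(hash-ref parent a #f) => (lambda (p) (subtype? p b))]
        [else #f]))

;; Reading a variable declared as `declared` where `expected` is wanted.
(define (read-ok? declared expected) (subtype? declared expected))
;; Storing a value of type `value` into a variable declared as `declared`.
(define (write-ok? declared value) (subtype? value declared))

;; Which declared types allow both reading the variable as a `number`
;; and assigning a `number` to it?
(define candidates
  (for/list ([t '(int number any)]
             #:when (and (read-ok? t 'number) (write-ok? t 'number)))
    t))
```

Only `number` passes both checks, so `candidates` is `'(number)`.  A
variable that is both read and assigned has no slack in either
direction, which is why having the exact type stored on the definition
is convenient.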

Generally, when you are making your own node types, you need to store
type information only when you make a type decision that's not fully
determined by the node's parent and child types.  Eg. there may be
cases where legally you can have either of type A or B, but you need
or want to make a decision that requires some data beyond what the
children have or before generating children.  I think this came up in
a fuzzer that had type annotations in the printer (or maybe it was
that there were two different functions/operations but we encoded them
as one node where the printing just depended on the type...  It's in
the WASM fuzzer, which I haven't had as much part in so I don't
remember the details), but where some types at print time were still
not constrained to a single printable type.  We had to choose one to
print, and then make that choice consistent.  But I think at that
point it wasn't actually necessary to store the choice in the node,
because printing is the last thing that's ever done to the AST and
there just wasn't another opportunity to make another choice.  But if
you do need to make a choice about types that won't be consistently
computed in the future by the normal type checking apparatus, then you
need to store the type choice and unify with it.  I think it shouldn't
be necessary.  None of my fuzzers store type fields in anything but
name binding nodes.

>>The key to using Definitions in a statement language is that a
>>`Block`-like statement, which may contain a series of statements that
>>include definitions, needs to have a list of definitions and a list of
>>statements separately.  This is a bit of a limitation, since many
>>languages can interleave definitions and statements, and Xsmith
>>basically can't.  At any rate, the `Block` node from canned-components
>>does this.
>>
>Ah, I see. Perhaps one can hack by adding another subtype of Block node
>that, when rendering, will splice its body out. That would give an
>appearance of interleaving statements and declarations, though I think
>there could be a problem with variable scope -- if there's a variable
>shadowing in a block, splicing the body out could create a redefinition,
>which might be invalid in some languages.

Yes, exactly.  That said, in practice xsmith just chooses a fresh name
for every variable, so unless/until we change that to actually produce
duplicate variable names, it will never be an issue.
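
A toy sketch of the splicing idea in plain Racket, to show where fresh
names matter.  The `block` struct and both helpers are hypothetical and
much simpler than the `Block` node in canned-components.

```racket
#lang racket/base
(require racket/list)
;; Hypothetical toy AST: a block has a list of definition names and a
;; body whose items are either statement strings or nested blocks.
(struct block (defs body))

;; Print a block with every nested block spliced into the parent scope.
(define (splice b)
  (append (for/list ([d (block-defs b)]) (format "var ~a;" d))
          (append* (for/list ([item (block-body b)])
                     (if (block? item) (splice item) (list item))))))

;; All names that end up defined in the single spliced scope.
(define (spliced-defs b)
  (append (block-defs b)
          (append* (for/list ([item (block-body b)])
                     (if (block? item) (spliced-defs item) '())))))

;; Fresh names everywhere: splicing is safe.
(define fresh (block '(a) (list "use(a);" (block '(b) '("use(b);")))))
;; The inner block shadows `a`: splicing creates a redefinition.
(define shadow (block '(a) (list "use(a);" (block '(a) '("use(a);")))))
```

`(splice fresh)` yields the lines `var a;`, `use(a);`, `var b;`,
`use(b);`, so definitions and statements look interleaved.
`(check-duplicates (spliced-defs shadow))` returns `'a`, which is the
redefinition that fresh names avoid.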

>>I hope you find Xsmith useful!  What are you intending to fuzz?  I'm
>>also happy to look at any code you have on Github or such for some
>>debugging help.
>>
>We are fuzzing Dafny <https://github.com/dafny-lang/dafny> compilers.

Cool!  Let me know if you put your fuzzer on github or something.