From sorawee.pwase at gmail.com Tue Jun 22 13:43:50 2021
From: sorawee.pwase at gmail.com (Sorawee Porncharoenwase)
Date: Tue, 22 Jun 2021 12:43:50 -0700
Subject: [xsmith-dev] XSmith questions
Message-ID:

Hi everyone,

I just started using XSmith. I don't think I grok the framework yet, so I would really appreciate it if anyone could help me!

1) While I understand that dynamically typed language should specify type-info to generate programs that don't cause runtime type mismatch, I find it weird that types are very integral to the framework. For example, binder-info seems to presuppose that declaration AST nodes will have type information in it.

On one hand, that's strictly speaking not the case for many languages.

On the other hand, I can understand that the specified grammar here doesn't need to correspond to the actual language grammar. So let's just add the type field into the AST node. But this raises another question: how do I know which other AST nodes will require additional type information as well?

2) As far as I can see, all declaration nodes have types to be that of the expression it contains. Again, I find this weird since declaration in most languages is a statement, not an expression. I tried fixing this by modifying the "Another Small Example with Variables" example to:

(add-property
 arith
 type-info
 [Definition [no-return-type (λ (n t) (hash 'Expression (fresh-type-variable)))]]
 [LetStar [(fresh-type-variable)
           (λ (n t) (hash 'definitions (λ (cn) no-return-type)
                          'sideEs (λ (cn) (fresh-type-variable))
                          'Expression t))]]
 ...)

where no-return-type is from xsmith/canned-components. The generation errors with:

Exception:
unify!: subtype-unify!: can't unify these types: #-#>)> and #

In most examples that I saw, declarations are hoisted at the very top of function / block / program.
I think the no-return-type is particularly going to be a problem when I want to allow declarations and statements to interleave, since statements will be assigned a unique type (like no-return-type), while declarations can't do the same.

3) How do I debug when XSmith simply hangs and doesn't output anything? It looks like the problems I encountered are with types, since if I relax the constraints, it can generate programs without any problem (though the generated programs are incorrect, as expected).

4) In xsmith/canned-components, there are several predefined types, but the module only exports constructors, not accessors. What would be the best way to extract information from it? unify!?

Thanks!
Sorawee

From william at hatch.uno Tue Jun 22 15:08:50 2021
From: william at hatch.uno (William G Hatch)
Date: Tue, 22 Jun 2021 15:08:50 -0600
Subject: [xsmith-dev] XSmith questions
In-Reply-To:
References:
Message-ID:

Oops, I hit `reply` instead of `reply-all`. I'm adding the mailing list back on now.

On Tue, Jun 22, 2021 at 03:06:05PM -0600, William G Hatch wrote:
>On Tue, Jun 22, 2021 at 12:43:50PM -0700, Sorawee Porncharoenwase wrote:
>>Hi everyone,
>>
>>I just started using XSmith. I don't think I grok the framework yet, so I
>>would really appreciate it if anyone could help me!
>
>I'll do my best. (I'm the main author of Xsmith, so you can blame
>most of your issues on me.) Xsmith is research software with lots of
>sharp edges and insufficient documentation about those edges, and has
>been successfully used mostly by its authors so far. (Though at least
>one other person has found some bugs using the fuzzers we've made.)
>
>[This email is a bit rambly and repetitive, sorry. I would edit it
>better but I need to do something else now. If this doesn't answer
>your questions, please ask some more follow-up questions and I'll try
>to write more cohesively.]
>
>>1) While I understand that dynamically typed language should specify
>>type-info to generate programs that don't cause runtime type mismatch, I
>>find it weird that types are very integral to the framework. For example,
>>binder-info seems to presuppose that declaration AST nodes will have type
>>information in it.
>>
>>On one hand, that's strictly speaking not the case for many languages.
>>
>>On the other hand, I can understand that the specified grammar here doesn't
>>need to correspond to the actual language grammar. So let's just add the
>>type field into the AST node. But this raises another question: how do I
>>know which other AST nodes will require additional type information as well?
>
>If you really want to have a language where anything goes, you can
>just create a `dyn` base-type and use it everywhere. But, as you
>mention, fuzzing will likely be ineffective since things will mostly
>just crash with runtime type errors. If you want to encode the fact
>that various types can be implicitly coerced, you can add explicit
>conversion AST nodes for Xsmith that go away when you pretty-print the
>program. Eg. you can define an IntToStringCoersion expression that is
>rendered by just rendering the integer inside without adding anything
>else to it.
>
>Every node in the AST has type information.
>
>Generally, an Xsmith language fuzzer will have 2-3 main AST node
>types: Expression, Statement, and Definition. Definitions are kept
>separate so that Xsmith itself can handle them consistently --
>definitions can be expressions, statements, or something separate in
>various languages, but always having Definition nodes in Xsmith makes
>generic name analysis easier.
>
>Expressions have the types you expect -- string, int, list-of-string,
>etc.
>
>Statement types are a bit of a hack.
>Statements have two types:
>no-return-statement and return-statement, which is a wrapper type that
>includes an expression type inside that corresponds to the return type
>of the function.
>
>Definition nodes should have expression types, IE the type of the
>right-hand-side of the definition.
>
>In practice, you probably want to use canned-components to get
>Definition node definitions and probably Statement node definitions as
>well. If you do that you don't need to provide type information for
>them. The main place you need to worry about types is with
>expressions.
>
>As an aside, Function definitions are often a special case in
>programming languages, but we basically encode them as a definition
>with a lambda on the right hand side that we maybe print differently.
>(If you want to have both definitions with a lambda on the RHS and
>function definitions with your language's shorthand syntax, you can
>define a new Definition node subtype that prints differently.)
>
>
>>2) As far as I can see, all declaration nodes have types to be that of the
>>expression it contains. Again, I find this weird since declaration in most
>>languages is a statement, not an expression. I tried fixing this by
>>modifying the "Another Small Example with Variables" example to:
>>
>>(add-property
>>arith
>>type-info
>>[Definition [no-return-type (λ (n t) (hash 'Expression
>>(fresh-type-variable)))]]
>>[LetStar [(fresh-type-variable)
>>          (λ (n t) (hash 'definitions (λ (cn) no-return-type)
>>                         'sideEs (λ (cn) (fresh-type-variable))
>>                         'Expression t))]]
>>...)
>>
>>where no-return-type is from xsmith/canned-components. The generation
>>errors with:
>>
>>Exception:
>>unify!: subtype-unify!: can't unify these types: #>(#-#>)> and #
>
>You need the type of the definition to be the type of the expression
>of the RHS of the definition (modulo subtyping).
>Despite the fact
>that when printing you may make the definitions look like statements,
>Xsmith just requires Definition nodes to have the same type as
>variable references to that definition, and keeps Definition and
>Statement nodes separate.
>
>The return-type/no-return-type is sort of a hack to use the type
>system to keep track of statements as well as expressions. In eg. a
>function, you need to track the return type throughout the function to
>be sure that the function does return (potentially in multiple
>branches of a conditional), and that all return statements return the
>right type. So by annotating which nodes do or don't return
>something, Xsmith can be sure to generate well-formed functions.
>
>The key to using Definitions in a statement language is basically that
>a `Block`-like statement, which can have a series of statements that
>include definitions, needs to have a list of definitions and a list of
>statements separately. This is a bit of a limitation, since many
>languages can interleave definitions and statements, and Xsmith
>basically can't. At any rate, the `Block` node from canned-components
>does this.
>
>Probably the best way to see how to do it in a full-sized (but still
>relatively simple) example is to look at the simple/javascript.rkt
>file in the xsmith-examples directory. (We haven't actually used that
>one yet, so I'm not 100% certain it generates actually correct
>javascript, but it should be close if it's wrong. At any rate, it's
>probably the simplest full-size example to see the correlation between
>the definition of the fuzzer and its output.)
>
>The `Block` statement in canned-components, which the javascript
>example uses, is the main place to put definitions in a
>statement-based language. In the javascript example we have a series
>of top-level definitions by using `ProgramWithBlock`, then basically
>unwrapping the "block" and printing the definitions in the top level.
>It also uses `LambdaWithBlock` for function definitions.
>So if you
>look at how it's used it should clarify how to have definitions in a
>statement language.
>
>>In most examples that I saw, declarations are hoisted at the very top of
>>function / block / program. I think the no-return-type is particularly
>>going to be a problem when I want to allow declarations and statements to
>>interleave, since statements will be assigned a unique type (like
>>no-return-type), while declarations can't do the same.
>>
>>3) How do I debug when XSmith simply hangs and doesn't output anything? It
>>looks like the problems I encountered are with types, since if I relax the
>>constraints, it can generate programs without any problem (though the
>>generated programs are incorrect, as expected).
>
>Xsmith is probably hanging because it is in a lift loop. IE it has
>created a Definition node, and it needs to generate a right-hand-side
>for the definition. When it tries to generate the RHS expression, the
>only legal expression it can find is VariableReference. So it creates
>a VariableReference, and it can't find a (non-circular) variable of
>that type to reference, so it lifts one. Then it needs to make an RHS
>for that definition as well, and ...
>
>We really ought to catch this situation somehow, crash, and report an
>error. But we don't at the moment. Sorry. If you hit control-C and
>get the debug output, you can see the type of the lift cycle (by just
>seeing what is being lifted over and over), and then try to figure out
>why nothing else is legal. You may have forgotten a literal node, or
>maybe the literal node has holes (eg. a lambda is a literal but not
>atomic, it needs statements/expressions inside), meaning that you need
>to add an annotation of `#:prop wont-over-deepen #t` so it will
>generate it even if it's at the max AST depth.
>
>>4) In xsmith/canned-components, there are several predefined types, but the
>>module only exports constructors, not accessors. What would be the best way
>>to extract information from it? unify!?
>
>Yes, use `unify!`. Eg.
>
>```
>(define my-return-type (fresh-type-variable))
>(define f (function-type (fresh-type-variable) my-return-type))
>(unify! f some-function-type)
>```
>
>This is something I should also improve but haven't yet. The basic
>accessors can fail when logically they should succeed because they may
>be passed a type variable instead of the struct for the type you think
>it is. So if you try to use eg. `(function-type t)` it might fail. I
>should provide a `function-type!` function that does the obvious
>struct creation and unification, but I haven't yet.
>
>I may add that over the coming weeks. I've been busy with some other
>things, but am shifting my gears back to Xsmith development, and hope
>to finish a few final features (eg. work on stuff to do
>feedback-directed fuzzing) and do some serious fuzzing with it.
>(We've run fuzz campaigns and found some bugs in the past, but mostly
>switched back to add-more-features mode to try to make it more
>effective, and haven't quite gotten back to actually trying to find
>bugs.)
>
>>Thanks!
>>Sorawee
>
>I hope you find Xsmith useful! What are you intending to fuzz? I'm
>also happy to look at any code you have on Github or such for some
>debugging help.

From sorawee.pwase at gmail.com Wed Jun 23 14:36:08 2021
From: sorawee.pwase at gmail.com (Sorawee Porncharoenwase)
Date: Wed, 23 Jun 2021 13:36:08 -0700
Subject: [xsmith-dev] XSmith questions
In-Reply-To:
References:
Message-ID:

Thanks for your help!

>Every node in the AST has type information.
>
Sorry, I should have framed the question better.

Every node has type information, but IIUC it is annotated externally, right? What I'm curious about is why Definition requires an explicit type field in the AST node? Can't the type information be stored externally as an attribute? And my question was that if there's a reason this can't be done, are there any other similar situations too? (That is, it will require an explicit type field in the AST node).
>The key to using Definitions in a statement language is basically that
>a `Block`-like statement, which can have a series of statements that
>include definitions, needs to have a list of definitions and a list of
>statements separately. This is a bit of a limitation, since many
>languages can interleave definitions and statements, and Xsmith
>basically can't. At any rate, the `Block` node from canned-components
>does this.
>
Ah, I see. Perhaps one can hack by adding another subtype of Block node that, when rendering, will splice its body out. That would give an appearance of interleaving statements and declarations, though I think there could be a problem with variable scope -- if there's a variable shadowing in a block, splicing the body out could create a redefinition, which might be invalid in some languages.

>I hope you find Xsmith useful! What are you intending to fuzz? I'm
>also happy to look at any code you have on Github or such for some
>debugging help.
>
We are fuzzing Dafny compilers.

From william at hatch.uno Wed Jun 23 16:27:09 2021
From: william at hatch.uno (William G Hatch)
Date: Wed, 23 Jun 2021 16:27:09 -0600
Subject: [xsmith-dev] XSmith questions
In-Reply-To:
References:
Message-ID:

On Wed, Jun 23, 2021 at 01:36:08PM -0700, Sorawee Porncharoenwase wrote:
>Thanks for your help!
>
>>Every node in the AST has type information.
>>
>Sorry, I should have framed the question better.
>
>Every node has type information, but IIUC it is annotated externally,
>right? What I'm curious about is why Definition requires an explicit type
>field in the AST node? Can't the type information be stored externally as
>an attribute? And my question was that if there's a reason this can't be
>done, are there any other similar situations too? (That is, it will require
>an explicit type field in the AST node).

TL;DR: Storing types on other nodes is generally not necessary.
It's done for definitions because definitions/references is where code relationships go from being a tree to being a more complicated graph, while other nodes don't have that problem.

One reason we store a type in a definition node is because most definitions are made by lifting, and thus have a specific type needed before creating the right-hand-side of the definition. If we didn't store the type in that case, we would need to search for variable use-sites when computing a type later. Also, for variables that can be mutated (or that hold containers), it can be important to have the precise type for creating assignment nodes. Eg. in the face of subtyping, assignment needs to have an invariant type relationship. And it's just easier to go about that stuff if the exact type is always explicitly stored rather than needing to compute it. If we didn't store the type, it could remain only partially constrained for a long time, and potentially be used for mutation in multiple places, and basically it would just make the type analysis a lot more complicated.

Another reason is for the RACR cache. Computing a definition's type would require looking at its references, which would make the dependency graph on the type attribute a lot more connected and tangled, meaning the caches would be flushed a lot more frequently and require more recomputation.

Generally, when you are making your own node types, you need to store type information only when you make a type decision that's not fully determined by the node's parent and child types. Eg. there may be cases where legally you can have either of type A or B, but you need or want to make a decision that requires some data beyond what the children have or before generating children. I think this came up in a fuzzer that had type annotations in the printer (or maybe it was that there were two different functions/operations but we encoded them as one node where the printing just depended on the type...
It's in the WASM fuzzer, which I haven't had as much part in so I don't remember the details), but where some types at print time were still not constrained to a single printable type. We had to choose one to print, and then make that choice consistent. But I think at that point it wasn't actually necessary to store the choice in the node, because printing is the last thing that's ever done to the AST and there just wasn't another opportunity to make another choice. But if you do need to make a choice about types that won't be consistently computed in the future by the normal type checking apparatus, then you need to store the type choice and unify with it.

I think it shouldn't be necessary. None of my fuzzers store type fields in anything but name binding nodes.

>>The key to using Definitions in a statement language is basically that
>>a `Block`-like statement, which can have a series of statements that
>>include definitions, needs to have a list of definitions and a list of
>>statements separately. This is a bit of a limitation, since many
>>languages can interleave definitions and statements, and Xsmith
>>basically can't. At any rate, the `Block` node from canned-components
>>does this.
>>
>Ah, I see. Perhaps one can hack by adding another subtype of Block node
>that, when rendering, will splice its body out. That would give an
>appearance of interleaving statements and declarations, though I think
>there could be a problem with variable scope -- if there's a variable
>shadowing in a block, splicing the body out could create a redefinition,
>which might be invalid in some languages.

Yes, exactly. That said, in practice xsmith just chooses a fresh name for every variable, so unless/until we change that to actually produce duplicate variable names, it will never be an issue.

>>I hope you find Xsmith useful! What are you intending to fuzz? I'm
>>also happy to look at any code you have on Github or such for some
>>debugging help.
>>
>We are fuzzing Dafny compilers.

Cool! Let me know if you put your fuzzer on github or something.
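
On question 4, the `unify!` approach from the first reply can be wrapped in a small helper, roughly in the spirit of the `function-type!` idea mentioned there. This is only a sketch: the name `function-type-parts!` is hypothetical and not part of Xsmith, and it assumes that `fresh-type-variable`, `function-type`, and `unify!` are exported by `(require xsmith)` and take the arguments shown in the thread's own example.

```racket
#lang racket/base
;; Sketch only: `function-type-parts!` is a hypothetical helper, not Xsmith API.
;; Assumption: the main `xsmith` module exports fresh-type-variable,
;; function-type, and unify!, with the shapes used in the thread's example.
(require xsmith)

;; Rather than calling struct accessors on `t` (which may still be a type
;; variable and not yet a function-type struct), build a function type out
;; of fresh type variables and unify it with `t`. After unification the
;; fresh variables stand for the argument and return types of `t`.
(define (function-type-parts! t)
  (define arg-type (fresh-type-variable))
  (define return-type (fresh-type-variable))
  (unify! (function-type arg-type return-type) t)
  (values arg-type return-type))
```

It might be used as `(define-values (arg ret) (function-type-parts! some-function-type))`. Note that `unify!` mutates: calling the helper constrains `t` to be a function type, and it raises an error like the one quoted earlier in the thread if `t` cannot be one.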