1. How do I investigate these programs for UB, and separate the good ones?
Use tools that detect undefined behavior. If none of them reports any, then it is at least conceivable that the program doesn't execute any undefined behavior.
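One way to automate this is sketched below. It is only a sketch under assumptions: it assumes the tools in question are the UBSan and ASan sanitizers built into Clang and GCC (the `-fsanitize=undefined,address` flag), and the names `check_program`, `looks_clean`, and `include_dir` are hypothetical helpers, not part of Csmith or any script in this thread. Sanitizers only see behavior on the path actually executed and do not cover every kind of UB, so a "clean" result means "nothing detected", not "proven UB-free".

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: the compiler is Clang or GCC, both of which accept these flags.
# -fno-sanitize-recover=all makes the program exit non-zero on the first report.
SANITIZE_FLAGS = ["-fsanitize=undefined,address", "-fno-sanitize-recover=all", "-O0", "-w"]

# Substrings that appear in UBSan / ASan diagnostics on stderr.
REPORT_MARKERS = ("runtime error:", "AddressSanitizer", "UndefinedBehaviorSanitizer")


def looks_clean(returncode, stderr):
    """Return True if a sanitized run exited with 0 and printed no sanitizer report."""
    return returncode == 0 and not any(m in stderr for m in REPORT_MARKERS)


def check_program(source, compiler="clang", include_dir=None, timeout=30):
    """Build one C file with sanitizers, run it, and classify the result.

    Returns 'clean', 'ub', 'timeout', or 'build-failed'.
    include_dir is a hypothetical parameter for the directory holding csmith.h.
    """
    with tempfile.TemporaryDirectory() as tmp:
        exe = Path(tmp) / "prog"
        cmd = [compiler, *SANITIZE_FLAGS, str(source), "-o", str(exe)]
        if include_dir:
            cmd.insert(1, f"-I{include_dir}")
        build = subprocess.run(cmd, capture_output=True, text=True)
        if build.returncode != 0:
            return "build-failed"
        try:
            run = subprocess.run([str(exe)], capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return "timeout"
        return "clean" if looks_clean(run.returncode, run.stderr) else "ub"


if __name__ == "__main__":
    # Usage: python check_ub.py prog1.c prog2.c ...
    for path in sys.argv[1:]:
        print(path, check_program(path))
```

Programs classified as anything other than 'clean' can be set aside; the 'clean' ones are the candidates worth keeping, subject to the caveat above.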
2. Does the huge number of wrong programs generated indicate that there is some UB in the code of extended_csmith, or is this an error related to the changes to the script?
This is a question that you should determine the answer to.

John