"Could not SNAPSHOT the trace agents!"

Where did you see this error?

He sees this as a result of calling loghole.
Well, what does this say:

    ops> tevc -w -t 30 -e $pid/$eid now __all_tracemon snapshot

But I think the problem is that one of the nodes is not reporting back that it has completed the snapshot within the 30-second timeout. Are all of the nodes up and responsive? The snapshot should be very fast, since it is just a rename/reopen operation on each node. You might be able to figure out which node is not reporting back by looking at the event log in the experiment's log directory, /proj/$pid/exp/$eid.

Lbs
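To show why a rename/reopen snapshot should finish almost instantly, here is a minimal Python sketch of the general technique. It is an illustration under assumptions, not the tracemon implementation: the class name `SnapshotLog` and the numbered-suffix naming of saved files (`trace.log.1`, `trace.log.2`, ...) are hypothetical. The point is that no log data is copied; the live file is simply renamed aside and a fresh file is opened at the original path.

```python
import os


class SnapshotLog:
    """Hypothetical trace log supporting a rename/reopen snapshot."""

    def __init__(self, path):
        self.path = path
        self.count = 0                    # number of snapshots taken so far
        self.fh = open(path, "a")

    def write(self, line):
        self.fh.write(line + "\n")

    def snapshot(self):
        # Close the live file so buffered data reaches disk, rename it aside
        # (a metadata-only operation, no data copy), then reopen a fresh,
        # empty file at the original path. Returns the saved file's name.
        self.fh.close()
        self.count += 1
        saved = f"{self.path}.{self.count}"
        os.rename(self.path, saved)
        self.fh = open(self.path, "a")
        return saved

    def close(self):
        self.fh.close()
```

Because each node only does a close, a rename, and an open, a node that fails to report back within the timeout is more likely to be down or unresponsive than slow at the snapshot itself.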