This sounds like a very good idea, but I'm confused about the statement about using ops. Also, I would include a checksum for OS images, so that the client can tell if the miniboss somehow has a stale image and fall back to the real one.

Oops, I meant boss (which, at least in our testbed, serves the images it sees on ops/fs via NFS and hands them out via frisbee/http). Although it might be worth considering, for widearea support, running an ftpd or small httpd on fs/ops to hand out these images. It seems like an odd choice to involve boss in handing out those huge files if boss has to access those same images via NFS behind the scenes anyhow (although I guess the shared images are actually on boss).

I agree that it might be a good idea to have ops hand out non-shared images, but you have to control access somehow. We would not want to hand out an image unless the requester had a "credential" for that image.
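The checksum idea mentioned above could be sketched roughly as follows. This is only an illustration: the choice of SHA-256, the function names `sha256_of` and `choose_image`, and the `fetch_from_boss` callback are all assumptions, not anything that exists in the testbed code.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a (possibly huge) image file in chunks to keep memory use flat."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def choose_image(miniboss_copy, expected_digest, fetch_from_boss):
    """Use the miniboss copy if it matches the digest boss advertises.

    fetch_from_boss is a hypothetical callback that retrieves the
    authoritative image and returns its local path.
    """
    if sha256_of(miniboss_copy) == expected_digest:
        return miniboss_copy
    # Stale or corrupt copy on the miniboss: fall back to the real one.
    return fetch_from_boss()
```

The client would get `expected_digest` from boss along with the image location, so a stale miniboss cannot vouch for its own copy.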
So perhaps our code path could look like this:

1) Client boots into the maintenance image, things happen, and it is ready to be imaged.
2) Client asks boss what to do.
3) a) If there's a miniboss for the client, boss hands out a URL to that.
   b) If the client is otherwise a widearea node, boss provides a URL telling it to fetch the image from ops/fs via FTP/HTTPS.
   c) If the client is local, boss tells the client to run frisbee via a swarm managed by ops/fs.
4) Client starts the request.
5) (a/b) ops/fs gets the request and passes the credentials to boss to ask if it can proceed. Boss presumably says yes.
6) Client gets the image from ops/fs/frisbee swarm.

I'm not sure how best to handle that permissions check in an FTP server. I imagine FTP would be very good if we could get over that barrier, because it's quite bandwidth-efficient (HTTP to ops could be a first-level fallback). Doing this kind of thing via HTTP is very easy because we could pass the data with a CGI.

Maybe we'd also want to move the standard images from boss to ops/fs to handle this in a unified way (and take some more load off of boss).

As an added bonus, I think you're right about the checksum idea (although an alternative would be to have the miniboss know about the version of each image it has, something already pretty useful for nodes that should not reimage very often, like homenet nodes).

-- 
Pat Gunn
Principal Research Programmer/Analyst, CMCL
School of Computer Science, Carnegie Mellon University
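The decision boss makes in steps 2 and 3 of the code path above might look something like this minimal sketch. The `Client` fields, the `imaging_instruction` function, the reply format, and the host name `ops.example.net` are all made up for illustration; none of them come from the existing code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Client:
    name: str
    is_widearea: bool
    miniboss_url: Optional[str] = None  # set only if a miniboss serves this client


def imaging_instruction(client, image, ops_host="ops.example.net"):
    """Return what boss would tell the client to do (hypothetical reply format)."""
    if client.miniboss_url is not None:
        # 3a: a miniboss exists, so point the client at it.
        return {"method": "url", "url": f"{client.miniboss_url}/{image}"}
    if client.is_widearea:
        # 3b: widearea node without a miniboss, fetch from ops/fs directly.
        return {"method": "url", "url": f"https://{ops_host}/images/{image}"}
    # 3c: local node, join the frisbee swarm managed by ops/fs.
    return {"method": "frisbee", "server": ops_host, "image": image}
```

The credential check in step 5 is left out on purpose, since how to do it for FTP is still the open question.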