
Re: [Testbed-admins] Emulab Rebuild Problems



I think I am over this hump -- thanks.

Your sql queries all checked out as expected (interface 0 was the 'control_network' for the node type we created and the MAC address was as expected). The newnode MFS correctly determined that 'bge0' was the control net interface.

But poking around in the node_type_attributes table led me to the 'control_interface' attribute. I had it set to 'bge0' for our new node type, but a careful re-read of the install instructions related to defining a new node type led me to try 'eth0' instead. Voila! All better. Well, almost all better except for the steam coming out of my ears... (I presume this non-intuitive setting is related to interface "normalization" between BSD and Linux?)
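For anyone hitting the same problem, a minimal sketch of a query that shows both attributes side by side in "mysql tbdb". It uses only the table and column names that appear in the quoted queries below; 'XXX' is a placeholder for your node type, and whether 'eth0' is the right value for 'control_interface' on other hardware is an assumption based solely on what worked here:

```sql
-- Show the two control-net related attributes for one node type.
-- 'XXX' is a placeholder; substitute the type reported by new_nodes.
select attrkey, attrvalue
  from node_type_attributes
 where type='XXX'
   and attrkey in ('control_network', 'control_interface');
```

In this case 'control_network' was 0 as expected, and the node was only recognized once 'control_interface' was changed from 'bge0' to 'eth0'.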

Thanks for all your help on this one!

PS - Now it appears I've got to tweak the FreeBSD MFS to boot successfully on these nodes. You may be hearing from me :)

On 6/1/2010 11:56 PM, Mike Hibler wrote:
Get into "mysql tbdb" on boss and do:

  select i.new_node_id,n.type,i.card,i.mac from new_interfaces as i, \
   new_nodes as n where i.new_node_id=n.new_node_id;

For pc20, there should be one row for each interface.  Look at the "card"
column for the row which corresponds to the control interface (i.e., the
one that has the MAC you expect).  That value should be between 0 and 5
if you have 6 IFs per machine, and probably '0' if you are using the first
builtin interface as the control net.

Anyway, that value should match what you get for:

  select * from node_type_attributes where \
   type='XXX' and attrkey='control_network';

where 'XXX' is whatever the "type" column showed in the first query.
If "attrvalue" is not the same, then you need to fix it in
node_type_attributes, unless...

...the "type" value shown in the first query is not what you were expecting
in which case the newnode script running on the node may have mis-guessed
the node type.  That script uses CPU speed, RAM and disk size measured on
the machine and compared against values from the node_type_attributes table
to make a guess at what the node type is.  Look at the console of the node
and see what values it measured and make sure they agree with the appropriate
fields in the DB.
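
The two checks above can also be done in one pass. This is only a sketch, assuming the same tables and columns used in the two queries (new_interfaces, new_nodes, node_type_attributes) and nothing else about the schema; it lists each interface next to the 'control_network' value of its guessed node type so the two can be compared by eye:

```sql
-- For every interface of every new node, show the "card" number and MAC
-- alongside the control_network attribute of the node's guessed type.
-- A left join keeps rows even when the type has no such attribute,
-- in which case control_network shows up as NULL.
select i.new_node_id, n.type, i.card, i.mac,
       a.attrvalue as control_network
  from new_interfaces as i
  join new_nodes as n
    on i.new_node_id = n.new_node_id
  left join node_type_attributes as a
    on a.type = n.type
   and a.attrkey = 'control_network';
```

The row whose MAC is the control interface should have "card" equal to the control_network column. A NULL there, or an unexpected "type", points at the node-type guess rather than the interface numbering.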

On Tue, Jun 01, 2010 at 08:23:08PM -0500, Barry Trent wrote:
Yes -- that is exactly the test I've been running -- switchmac from a
boss shell (from the boss console, in fact, su'd to user 'elabman')
after the newnode MFS has booted and is sitting at a login prompt.

The MAC address does show up in the switchmac output, is on the correct
switch and port, and shows the "ctrl" tag at the end. This is the only
instance of that MAC address in the switchmac output (unlike before,
when "leakage" was causing 2 occurrences of that MAC). And yet the
control port address isn't identified via the web page at all. Does
switchmac have any debug/diagnostic options? I was planning on diving
into the switchmac script tomorrow (Wednesday)...

Leigh Stoller wrote:
on the correct ports of the experimental switch, but the control
interface for the new node comes up blank (that is, the mac address is
found but no corresponding switch/port is set). I still don't understand
why switchmac can't identify this ethernet interface as the control
interface and assign the correct port on my control switch. I presume I
could do this manually, but I wonder what the problem is.

I feel like I'm getting really close, but I'm not there yet!

Hmm, when you have a node sitting in the newnode MFS, and the web
interface is reporting the experimental interfaces okay, you should be
able to run switchmac directly from a shell on boss.

boss>  wap /usr/testbed/libexec/switchmac

Does this output list the interface with a 'ctrl' tag on the end?
Or does the interface say 'expt'? (Use the MAC address to find it.)

Lbs