
Re: [Testbed-admins] Emulab Rebuild Problems



I think this helps -- a little. Background: we currently have two separate switches, one for control (2950) and one for experimental (2948). We may expand in the future, but we're keeping it simple for now. I have a dozen 1U servers with six Ethernet interfaces each that I'm trying to set up as experimental nodes, but I'm working with just one for now.

I've now got a setup much as you described. On the experimental switch I have 2 ports in a vlan named control-hardware (vlan 2). I also have the addressable (management) interface for the experimental switch in this vlan. I've got another small (4-port) vlan 2 (also named control-hardware) on the control switch, and a cable connects these two vlans together (not set as a trunk). Boss has an interface in this vlan on the control switch and can ping the experimental switch's management address. The output of switchmac seems sane in terms of the ports/addresses it detects.

Note that all other ports on the switches are just in the default vlan 1 configuration.

The other interface of boss and one interface from each experimental node hook into the control switch. My experimental nodes can DHCP, boot the newnode image, and register to be added to the Emulab. When I scan for ports from the web interface, all 5 experimental interfaces are found on the correct ports of the experimental switch, but the control interface for the new node comes up blank (that is, the MAC address is found but no corresponding switch/port is set). I still don't understand why switchmac can't identify this Ethernet interface as the control interface and assign the correct port on my control switch. I presume I could do this manually, but I wonder what the problem is.
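
One way to picture the "MAC found, but no switch/port" symptom is the toy lookup below. It is only an illustration and is not how switchmac is actually implemented; the function name, switch names, ports, and MAC addresses are all made up. It assumes a MAC is attributed to a port only when it is learned on a port that does not lead to another switch, so a MAC seen solely across the inter-switch cable ends up with no port at all.

```python
# Toy model only: this is NOT switchmac's actual code.
# All switch names, ports, and MAC addresses are hypothetical.

def locate_mac(mac, fdb_by_switch, interswitch_ports):
    """Return (switch, port) for the edge port where `mac` was learned, or None.

    fdb_by_switch:     {switch: {port: set of MACs learned on that port}}
    interswitch_ports: set of (switch, port) pairs that lead to another switch
    """
    for switch, table in fdb_by_switch.items():
        for port, macs in table.items():
            # Skip sightings on ports that merely connect two switches.
            if mac in macs and (switch, port) not in interswitch_ports:
                return (switch, port)
    return None


CTRL_MAC = "00:11:22:33:44:55"  # hypothetical control interface of a node

fdb = {
    "control": {5: {CTRL_MAC}},
    "experimental": {48: {CTRL_MAC}},  # seen only via the inter-switch cable
}
links = {("control", 24), ("experimental", 48)}

# Both switches searched: the MAC resolves to its edge port on the control switch.
both = locate_mac(CTRL_MAC, fdb, links)

# Only the experimental switch searched: the MAC has no edge port, so no match.
expt_only = locate_mac(CTRL_MAC, {"experimental": fdb["experimental"]}, links)

print(both, expt_only)
```

If the real tool behaves anything like this, a blank switch/port would suggest that the control switch's table is not being searched for that MAC, but that is a guess rather than a diagnosis.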

I feel like I'm getting really close, but I'm not there yet!

On 5/29/2010 1:04 PM, Mike Hibler wrote:
You will have to refresh me on your HW configuration.  You do have separate
control and experimental switches, correct?  Just one of each?

Since you aren't sub-segmenting your control network, the config should be
that boss has two interfaces connected to your control net switch, one in
the "node control network" VLAN (VLAN 3 for us) and one in the "hardware
control network" VLAN (VLAN 10 for us).  There should be a wire between
the control and experimental switches, but it doesn't need to be a trunk
link, both ports just need to be in VLAN 10.

This should prevent the switchmac process from seeing the control interfaces
via the experiment net.  But even when it is a trunk, switchmac should ignore
it, if the ends are marked as a trunk in the DB "wires" table, ala:

mysql>  select * from wires where type='Trunk' and (node_id1='cisco2' or node_id2='cisco2');
+-------+-----+-------+----------+-------+-------+----------+-------+-------+
| cable | len | type  | node_id1 | card1 | port1 | node_id2 | card2 | port2 |
+-------+-----+-------+----------+-------+-------+----------+-------+-------+
| 2646  | 90  | Trunk | cisco14  | 1     | 1     | cisco2   | 5     | 12    |
+-------+-----+-------+----------+-------+-------+----------+-------+-------+
1 row in set (0.00 sec)

But you probably don't want that wire to be a trunk, unless there is more
going on in your config than I remember.
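
As a rough sketch of the check described above, the snippet below rebuilds the "wires" table in an in-memory SQLite database (a stand-in for the MySQL tbdb) and collects the ports on a switch that are marked as trunks. The column names and the trunk row are taken from the query output above; the column types, the second row, and the helper name trunk_ports are assumptions for illustration only.

```python
import sqlite3

# In-memory stand-in for the real database; column types are assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE wires (
           cable INTEGER, len INTEGER, type TEXT,
           node_id1 TEXT, card1 INTEGER, port1 INTEGER,
           node_id2 TEXT, card2 INTEGER, port2 INTEGER)"""
)
# Row copied from the query output above.
conn.execute(
    "INSERT INTO wires VALUES (2646, 90, 'Trunk', 'cisco14', 1, 1, 'cisco2', 5, 12)"
)
# Hypothetical non-trunk wire, present only to show that it is filtered out.
conn.execute(
    "INSERT INTO wires VALUES (1, 10, 'Node', 'pc20', 0, 1, 'cisco2', 1, 3)"
)


def trunk_ports(conn, switch):
    """Return the set of (card, port) pairs on `switch` that are trunk ends."""
    rows = conn.execute(
        "SELECT node_id1, card1, port1, node_id2, card2, port2 "
        "FROM wires WHERE type = 'Trunk' AND (node_id1 = ? OR node_id2 = ?)",
        (switch, switch),
    ).fetchall()
    ports = set()
    for n1, c1, p1, n2, c2, p2 in rows:
        if n1 == switch:
            ports.add((c1, p1))
        if n2 == switch:
            ports.add((c2, p2))
    return ports


print(trunk_ports(conn, "cisco2"))
print(trunk_ports(conn, "cisco14"))
```

A switch with no row of type 'Trunk' yields an empty set, which is what one would expect when the inter-switch wire is a plain access link rather than a trunk.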

On Sat, May 29, 2010 at 11:11:11AM -0500, Barry Trent wrote:
The control interface for pc20 wound up in the interfaces table as an
"expt" interface. I can see from the archive that this problem has come
up before. It appears that I haven't managed to isolate the control
switch and the experimental switch adequately?

I played around with various vlan and trunkport settings on the
switches, but it seems that I either block ALL traffic from boss to the
experimental switch or the switchmac script reports that it finds the
control interface of pc20 on the trunk port (port 48) of the
experimental switch. (That is, all the configurations in which I can
ping from boss to the experimental switch seem to wind up confusing
switchmac.)

Note that we have not divided up our control switch into vlans as
discussed in the network design portion of the install instructions. The
instructions make this sound like an option -- perhaps it's not? Or have
I just not hit on the right vlan and trunkport settings on the
experimental switch yet?

As for power controllers -- we aren't using any at present.

On 5/28/2010 6:17 PM, Mike Hibler wrote:
Did the control net information for that machine wind up in your DB?
You can do "mysql tbdb" and then:
     select * from interfaces where node_id='pc20';
and see if there is an entry for the control network.
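
The same check can be scripted. The sketch below uses an in-memory SQLite table in place of the real "interfaces" table; only the table name and the node_id column come from the query above. The card, mac, and role columns, the 'ctrl' role value, and the sample rows are assumptions (the 'expt' value is the one reported later in this thread), so adjust them to whatever the real schema shows.

```python
import sqlite3

# In-memory stand-in; every column except node_id is assumed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE interfaces (node_id TEXT, card INTEGER, mac TEXT, role TEXT)"
)
# Hypothetical rows: pc20 has only experimental entries, pc21 has a control one.
conn.executemany(
    "INSERT INTO interfaces VALUES (?, ?, ?, ?)",
    [
        ("pc20", 0, "001122334455", "expt"),
        ("pc20", 1, "001122334456", "expt"),
        ("pc21", 0, "0011223344aa", "ctrl"),
    ],
)


def control_interfaces(conn, node_id):
    """Return (card, mac) rows recorded as control-network interfaces."""
    return conn.execute(
        "SELECT card, mac FROM interfaces WHERE node_id = ? AND role = 'ctrl'",
        (node_id,),
    ).fetchall()


for node in ("pc20", "pc21"):
    found = control_interfaces(conn, node)
    print(node, found if found else "no control-network entry")
```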

What do you have for power controllers?

On Fri, May 28, 2010 at 05:50:56PM -0500, Barry Trent wrote:
I'm working on re-building our small emulab with new hardware here at
Architecture Technology in Minnesota and I'm having some trouble.

The first big problem I think I have solved: We are using Cisco 2948
switches for the experimental network. I figured the type field in the
node_types and nodes tables should be 'cisco2948'. Wrong. The 2948 is
actually in the 4000 class of Catalyst switches! The difference appears
to be that it uses "community string indexing" for some of its MIBs.
(Described here:
ftp://ftp-sj.cisco.com/pub/mibs/supportlists/wsc4000/wsc4000-communityIndexing.html).


So -- note for posterity: For the Cisco 2948 switch, set its type value
to 'cisco4000' in the type fields of the node_types and nodes tables.
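
A minimal sketch of that change, run against an in-memory SQLite stand-in rather than the real database. The table names and the "type" field come from the note above; the node_id column, the switch name 'exptswitch', and the helper retype_switch are hypothetical, and a real install may need a new node_types row rather than an in-place update.

```python
import sqlite3

# In-memory stand-in with only the columns needed for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE node_types (type TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE nodes (node_id TEXT PRIMARY KEY, type TEXT)")
conn.execute("INSERT INTO node_types VALUES ('cisco2948')")
conn.execute("INSERT INTO nodes VALUES ('exptswitch', 'cisco2948')")  # hypothetical name


def retype_switch(conn, node_id, old_type, new_type):
    """Change a switch's type in both tables inside one transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE node_types SET type = ? WHERE type = ?", (new_type, old_type)
        )
        conn.execute(
            "UPDATE nodes SET type = ? WHERE node_id = ? AND type = ?",
            (new_type, node_id, old_type),
        )


retype_switch(conn, "exptswitch", "cisco2948", "cisco4000")
print(conn.execute("SELECT node_id, type FROM nodes").fetchall())
print(conn.execute("SELECT type FROM node_types").fetchall())
```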

Now the new problem I'm up against:

We PXE boot our testbed machines and they load the freebsd.newnode and
appear in the "New Testbed Nodes" page of the web interface. We "Search
switch ports for selected nodes". Our 5 experimental interfaces are
properly discovered but the control interface isn't. I figure this
shouldn't be a big problem(?) -- we enter this manually.

When we try to actually "Create" the node the operation appears to
succeed:

-----
/usr/testbed/www
pc20 succesfully added!
Re-generating dhcpd.conf
Restarting dhcpd: /usr/local/bin/sudo -S /usr/local/etc/rc.d/2.dhcpd.sh stop
Restarting dhcpd: /usr/local/bin/sudo -S /usr/local/etc/rc.d/2.dhcpd.sh start
   dhcpd wrapperSetting up nameserver
Running exports_setup
Rebooting nodes...
Rebooting 192.168.10.109


Finished - when you are satisifed that the nodes are working
correctly, use nfree on boss to free them from the emulab-ops/hwdown
experiment.
-----

BUT, the node never reboots. It just sits there at the login prompt of
the freebsd.newnode boot image. It never "reports in" to the hwdown
experiment (although we can see that there is now a machine in the
hwdown experiment, the idle time just keeps going up). No entry for the
machine actually gets placed into the re-generated dhcpd.conf file
either -- I presume that one should. If we reboot manually we go right
back to the newnode image.

Any ideas/suggestions on how to troubleshoot this problem?