Re: [Testbed-admins] Nodes Stuck in reloading
Mike,
Same results after I restarted mrouted on BOSS.
-----Original Message-----
From: Mike Hibler [mailto:mike@flux.utah.edu]
Sent: Thursday, January 28, 2010 3:31 PM
To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
Subject: Re: [Testbed-admins] Nodes Stuck in reloading
Ah, maybe your mrouted died. Run "/etc/rc.d/mrouted restart" and see if
things get better.
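
A quick way to confirm whether a daemon such as mrouted appears in saved "ps axww" output is sketched below. This is only an illustration: the helper name daemon_running is made up for this example, the sample lines are copied from the listing quoted further down, and it assumes the FreeBSD column layout shown there (PID TT STAT TIME COMMAND).

```python
import os

def daemon_running(ps_output, name):
    """Return True if any process line in `ps axww` output runs `name`.

    Assumes the column layout seen in this thread:
    PID TT STAT TIME COMMAND...
    """
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID, TT, STAT, TIME, COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header, blank line, or wrapped continuation
        command = fields[4].split()[0]
        # Kernel threads and swapped-out processes are shown as [name]
        command = command.strip("[]")
        # Titles such as "nfsd: master" carry a trailing colon
        if os.path.basename(command).rstrip(":") == name:
            return True
    return False

# Sample lines taken from the listing in this thread (spacing is illustrative).
SAMPLE = """\
  PID  TT  STAT      TIME COMMAND
 1313  ??  S      3:11.29 [mysqld]
 1425  ??  S      0:01.16 /usr/local/sbin/dhcpd -f bce0
28433  ??  S      0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14
"""

for name in ("mysqld", "frisbeed", "mrouted"):
    print(name, daemon_running(SAMPLE, name))
```

Against the sample, mysqld and frisbeed are found and mrouted is not, which is consistent with the guess that mrouted had died.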
On Thu, Jan 28, 2010 at 03:06:32PM -0500, Korrie, Donna M CTR USAF AFMC
AFRL/RYRD wrote:
> I can get into mysql.
> I have a mrouted.conf in /etc:
> #
> # Taken from Utah:
> #
> # this is the "other" interface
> # Do everything we can to stop traffic on it
> # We cannot just disable it or mrouted won't run
> # (since there would only be a single active interface)
> #
> phyint 10.10.200.14 force_leaf passive deny 0/0
>
>
> ________________________________
>
> From: Mike Hibler [mailto:mike@flux.utah.edu]
> Sent: Thu 1/28/2010 3:02 PM
> To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
> Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
> Subject: Re: [Testbed-admins] Nodes Stuck in reloading
>
>
>
> The frisbeelauncher messages may be a false alarm (i.e., they may be
> old); it looks like everything is running okay. Try connecting to mysql
> interactively (on boss):
>
> mysql tbdb
>
> and see if you get a mysql> command prompt. You'll see in the ps listing
> below that there are actually frisbeed's running (really just one with
> multiple threads). Compare the command line -m and -p info with what the
> client thinks.
>
> Maybe you did need to run "mrouted". Do you have a mrouted.conf in either
> /etc or /usr/local/etc on boss?
>
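
The -m/-p comparison suggested above can be sketched roughly as follows. This is a hedged illustration, not part of the Emulab tools: the function name frisbeed_endpoint is hypothetical, only the -m (multicast address) and -p (port) options mentioned in this thread are read, and the client-side values are placeholders to be replaced with whatever the node actually reports.

```python
import shlex

def frisbeed_endpoint(command_line):
    """Return (multicast_address, port) from a frisbeed command line.

    Only the -m and -p options discussed in this thread are read;
    every other option is skipped rather than interpreted.
    """
    args = shlex.split(command_line)
    mcast = port = None
    for i, arg in enumerate(args[:-1]):
        if arg == "-m":
            mcast = args[i + 1]
        elif arg == "-p":
            port = int(args[i + 1])
    return mcast, port

# Server command line as it appears in the ps listing in this thread.
SERVER_CMD = (
    "/usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 "
    "-m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz"
)

server = frisbeed_endpoint(SERVER_CMD)
# Placeholder values: substitute the address and port the client reports.
client = ("234.5.15.107", 7511)

print("server:", server)
print("match" if server == client else "MISMATCH")
```

A mismatch would mean the client is listening on a different group or port than the one the server is sending to.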
> On Thu, Jan 28, 2010 at 02:47:07PM -0500, Korrie, Donna M CTR USAF AFMC
> AFRL/RYRD wrote:
> > I rebooted boss yesterday and ops today...that did not seem to help.
> > Should I reboot boss again?
> >
> > [root@boss:/usr/testbed/log](2:43pm)#ps axww
> > PID TT STAT TIME COMMAND
> > 0 ?? WLs 0:00.00 [swapper]
> > 1 ?? ILs 0:00.01 /sbin/init --
> > 2 ?? DL 0:03.16 [g_event]
> > 3 ?? DL 0:04.92 [g_up]
> > 4 ?? DL 0:05.53 [g_down]
> > 5 ?? DL 0:00.00 [thread taskq]
> > 6 ?? DL 0:00.00 [kqueue taskq]
> > 7 ?? DL 0:00.00 [acpi_task_0]
> > 8 ?? DL 0:00.00 [acpi_task_1]
> > 9 ?? DL 0:00.00 [acpi_task_2]
> > 10 ?? RL 2838:03.75 [idle]
> > 11 ?? WL 2:06.87 [swi4: clock sio]
> > 12 ?? WL 0:00.00 [swi3: vm]
> > 13 ?? WL 0:04.19 [swi1: net]
> > 14 ?? DL 0:03.60 [yarrow]
> > 15 ?? WL 0:00.00 [swi6: Giant taskq]
> > 16 ?? WL 0:00.00 [swi5: +]
> > 17 ?? DL 0:00.00 [xpt_thrd]
> > 18 ?? WL 0:00.00 [swi2: cambio]
> > 19 ?? WL 0:00.02 [swi6: task queue]
> > 20 ?? WL 0:00.00 [irq9: acpi0]
> > 21 ?? WL 0:13.63 [irq16: bce0 em1++]
> > 22 ?? WL 0:00.00 [irq19: em0]
> > 23 ?? WL 0:00.00 [irq17: em2]
> > 24 ?? WL 0:00.00 [irq18: em3]
> > 25 ?? WL 0:00.00 [irq21: uhci0 uhci+]
> > 26 ?? DL 0:00.01 [usb0]
> > 27 ?? DL 0:00.00 [usbtask]
> > 28 ?? WL 0:03.44 [irq20: uhci1]
> > 29 ?? DL 0:00.01 [usb1]
> > 30 ?? DL 0:00.01 [usb2]
> > 31 ?? DL 0:00.01 [usb3]
> > 32 ?? WL 0:00.00 [irq23: atapci0]
> > 33 ?? WL 0:00.01 [swi0: sio]
> > 34 ?? WL 0:00.00 [irq14: ata0]
> > 35 ?? WL 0:00.00 [irq15: ata1]
> > 36 ?? WL 0:00.00 [irq1: atkbd0]
> > 37 ?? DL 0:00.14 [pagedaemon]
> > 38 ?? DL 0:00.00 [vmdaemon]
> > 39 ?? DL 0:11.51 [pagezero]
> > 40 ?? DL 0:00.37 [bufdaemon]
> > 41 ?? DL 0:00.41 [vnlru]
> > 42 ?? DL 1:39.88 [syncer]
> > 43 ?? DL 0:00.78 [softdepflush]
> > 44 ?? DL 0:04.03 [schedcpu]
> > 135 ?? Is 0:00.00 adjkerntz -i
> > 764 ?? Is 0:00.00 /usr/sbin/moused -p /dev/ums0 -t auto -I
> > /var/run/moused.ums0.pid
> > 821 ?? Is 0:00.00 /sbin/devd
> > 918 ?? Ss 0:02.06 /usr/sbin/syslogd
> > 929 ?? Ss 0:00.83 /usr/sbin/named -u root
> > 1080 ?? Ss 0:00.08 /usr/sbin/rpcbind
> > 1208 ?? Is 0:00.01 nfsd: master (nfsd)
> > 1210 ?? I 0:00.00 nfsd: server (nfsd)
> > 1211 ?? I 0:00.00 nfsd: server (nfsd)
> > 1212 ?? I 0:00.00 nfsd: server (nfsd)
> > 1213 ?? I 0:00.00 nfsd: server (nfsd)
> > 1214 ?? I 0:00.00 nfsd: server (nfsd)
> > 1215 ?? I 0:00.00 nfsd: server (nfsd)
> > 1216 ?? I 0:00.00 nfsd: server (nfsd)
> > 1217 ?? I 0:00.00 nfsd: server (nfsd)
> > 1218 ?? I 0:00.00 nfsd: server (nfsd)
> > 1219 ?? I 0:00.00 nfsd: server (nfsd)
> > 1220 ?? I 0:00.00 nfsd: server (nfsd)
> > 1221 ?? I 0:00.00 nfsd: server (nfsd)
> > 1222 ?? I 0:00.00 nfsd: server (nfsd)
> > 1223 ?? I 0:00.00 nfsd: server (nfsd)
> > 1224 ?? I 0:00.00 nfsd: server (nfsd)
> > 1225 ?? I 0:00.00 nfsd: server (nfsd)
> > 1242 ?? Is 0:00.00 [sh]
> > 1313 ?? S 3:11.29 [mysqld]
> > 1344 ?? Ss 0:01.83 /usr/sbin/ntpd -c /etc/ntp.conf -p
> > /var/run/ntpd.pid -f /var/db/ntpd.drift
> > 1364 ?? Ss 0:00.11 /usr/sbin/usbd
> > 1371 ?? Ss 0:02.01 /usr/local/sbin/httpd -DSSL
> > 1379 ?? Ss 0:00.80 /usr/local/libexec/pubsubd
> > 1392 ?? Is 0:00.00 /usr/sbin/sshd
> > 1398 ?? Ss 0:01.53 sendmail: accepting connections (sendmail)
> > 1402 ?? Is 0:00.03 sendmail: Queue runner@00:30:00 for
> > /var/spool/clientmqueue (sendmail)
> > 1408 ?? Is 0:00.32 /usr/sbin/cron -s
> > 1423 ?? Is 0:00.00 /usr/bin/perl -w
> > /usr/testbed/sbin/daemon_wrapper /usr/local/sbin/dhcpd -f bce0
> > (perl5.8.8)
> > 1425 ?? S 0:01.16 /usr/local/sbin/dhcpd -f bce0
> > 1426 ?? I 0:01.84 /usr/local/sbin/httpd -DSSL
> > 1427 ?? I 0:01.61 /usr/local/sbin/httpd -DSSL
> > 1428 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
> > 1429 ?? I 0:01.79 /usr/local/sbin/httpd -DSSL
> > 1430 ?? I 0:01.57 /usr/local/sbin/httpd -DSSL
> > 1431 ?? I 0:01.94 /usr/local/sbin/httpd -DSSL
> > 1432 ?? I 0:01.68 /usr/local/sbin/httpd -DSSL
> > 1433 ?? I 0:01.60 /usr/local/sbin/httpd -DSSL
> > 1434 ?? I 0:02.22 /usr/local/sbin/httpd -DSSL
> > 1435 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
> > 1436 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
> > 1437 ?? I 0:01.90 /usr/local/sbin/httpd -DSSL
> > 1438 ?? I 0:01.38 /usr/local/sbin/httpd -DSSL
> > 1439 ?? I 0:02.04 /usr/local/sbin/httpd -DSSL
> > 1440 ?? I 0:02.17 /usr/local/sbin/httpd -DSSL
> > 1441 ?? I 0:01.44 /usr/local/sbin/httpd -DSSL
> > 1442 ?? I 0:02.39 /usr/local/sbin/httpd -DSSL
> > 1443 ?? I 0:01.59 /usr/local/sbin/httpd -DSSL
> > 1444 ?? I 0:02.05 /usr/local/sbin/httpd -DSSL
> > 1445 ?? S 0:01.68 /usr/local/sbin/httpd -DSSL
> > 1446 ?? I 0:01.46 /usr/local/sbin/httpd -DSSL
> > 1447 ?? I 0:02.48 /usr/local/sbin/httpd -DSSL
> > 1448 ?? I 0:01.64 /usr/local/sbin/httpd -DSSL
> > 1449 ?? I 0:01.48 /usr/local/sbin/httpd -DSSL
> > 1450 ?? I 0:01.74 /usr/local/sbin/httpd -DSSL
> > 1451 ?? I 0:01.66 /usr/local/sbin/httpd -DSSL
> > 1452 ?? I 0:01.49 /usr/local/sbin/httpd -DSSL
> > 1453 ?? I 0:01.26 /usr/local/sbin/httpd -DSSL
> > 1454 ?? I 0:01.72 /usr/local/sbin/httpd -DSSL
> > 1455 ?? I 0:01.79 /usr/local/sbin/httpd -DSSL
> > 1463 ?? Is 0:00.01 /usr/testbed/sbin/bootinfo
> > 1467 ?? Is 0:00.00 /usr/testbed/sbin/tmcd -i 192.168.0.14
> > 1469 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> > 1470 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> > 1471 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> > 1472 ?? I 0:00.00 tmcd: UDP 14447: 0 done (tmcd)
> > 1473 ?? I 0:00.00 tmcd: TCP 14447: 0 done (tmcd)
> > 1474 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> > 1475 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> > 1476 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> > 1477 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> > 1478 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> > 1479 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> > 1480 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> > 1481 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> > 1482 ?? Ss 0:01.12 /usr/testbed/sbin/capserver
> > 1484 ?? Is 0:00.62 /usr/bin/perl -wT
> > /usr/testbed/sbin/lastlog_daemon (perl5.8.8)
> > 1490 ?? Is 0:00.00 /usr/testbed/sbin/sdcollectd
> > 1492 ?? Is 0:00.65 /usr/testbed/sbin/stated (perl5.8.8)
> > 1499 ?? Is 0:00.04 /usr/local/bin/python
> > /usr/testbed/sbin/sslxmlrpc_server.py
> > 1515 ?? Ss 0:10.17 /usr/bin/perl -w
> > /usr/testbed/sbin/mysqld_watchdog (perl5.8.8)
> > 1524 ?? Is 0:00.00 /usr/bin/perl -w
> > /usr/testbed/sbin/daemon_wrapper -i 30 -l /usr/testbed/log/batchlog
> > /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
> > 1531 ?? S 0:07.73 /usr/bin/perl -wT
> > /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
> > 1540 ?? Is 0:00.02 /usr/local/libexec/tftpd -m
> > /usr/local/etc/tftpd.rules -lvvvv -C 40 -s /tftpboot
> > 1560 ?? Is 0:00.00 /usr/sbin/inetd -wW -R 0
> > 1643 ?? I 0:02.66 /usr/local/sbin/httpd -DSSL
> > 1644 ?? I 0:01.66 /usr/local/sbin/httpd -DSSL
> > 1645 ?? I 0:01.60 /usr/local/sbin/httpd -DSSL
> > 1646 ?? I 0:01.32 /usr/local/sbin/httpd -DSSL
> > 1647 ?? I 0:01.70 /usr/local/sbin/httpd -DSSL
> > 1648 ?? I 0:01.44 /usr/local/sbin/httpd -DSSL
> > 1649 ?? I 0:01.61 /usr/local/sbin/httpd -DSSL
> > 14122 ?? Z 0:00.06 <defunct>
> > 27297 ?? Is 0:00.07 sshd: root@ttyp0 (sshd)
> > 27472 ?? Is 0:00.01 /usr/bin/perl -wT
> > /usr/testbed/sbin/frisbeelauncher 10035 (perl5.8.8)
> > 28285 ?? Ss 0:00.11 sshd: root@ttyp1 (sshd)
> > 28433 ?? S 0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W
> > 72000000 -K 15 -m 234.5.15.107 -p 7511
> > /usr/testbed/images/FBSD63+FC8-STD.ndz
> > 28434 ?? S 0:00.01 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W
> > 72000000 -K 15 -m 234.5.15.107 -p 7511
> > /usr/testbed/images/FBSD63+FC8-STD.ndz
> > 28435 ?? S 0:00.29 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W
> > 72000000 -K 15 -m 234.5.15.107 -p 7511
> > /usr/testbed/images/FBSD63+FC8-STD.ndz
> > 29124 ?? SL 0:00.00 [nfsiod 0]
> > 1502 d0- S 0:54.98 /usr/bin/perl -wT
> > /usr/testbed/sbin/reload_daemon (perl5.8.8)
> > 1509 d0- S 0:11.61 /usr/bin/perl -wT
> > /usr/testbed/sbin/checkup_daemon (perl5.8.8)
> > 1578 d0 Is+ 0:00.00 /usr/libexec/getty std.115200 console
> > 1579 v0 Is+ 0:00.00 /usr/libexec/getty Pc ttyv0
> > 1580 v1 Is+ 0:00.00 /usr/libexec/getty Pc ttyv1
> > 1581 v2 Is+ 0:00.00 /usr/libexec/getty Pc ttyv2
> > 1582 v3 Is+ 0:00.00 /usr/libexec/getty Pc ttyv3
> > 1583 v4 Is+ 0:00.00 /usr/libexec/getty Pc ttyv4
> > 1584 v5 Is+ 0:00.00 /usr/libexec/getty Pc ttyv5
> > 1585 v6 Is+ 0:00.00 /usr/libexec/getty Pc ttyv6
> > 1586 v7 Is+ 0:00.00 /usr/libexec/getty Pc ttyv7
> > 27302 p0 Is 0:00.01 -csh (csh)
> > 28133 p0 I+ 0:00.06 ssh tips
> > 28290 p1 Ss 0:00.03 -csh (csh)
> > 29126 p1 R+ 0:00.00 ps axww
> >
> > -----Original Message-----
> > From: Mike Hibler [mailto:mike@flux.utah.edu]
> > Sent: Thursday, January 28, 2010 2:44 PM
> > To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
> > Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
> > Subject: Re: [Testbed-admins] Nodes Stuck in reloading
> >
> > On Thu, Jan 28, 2010 at 02:22:22PM -0500, Korrie, Donna M CTR USAF AFMC
> > AFRL/RYRD wrote:
> > > ...
> > > Do I need to restart anything?
> > >
> > >
> >
> > What does "ps axww" show? Just "ps" won't show all the processes.
> > At the very least it seems like mysqld isn't running. There is supposed
> > to be a watchdog running to make sure that mysqld is running and
> > responding, but maybe it isn't running either.
> >
> > You may be best off just rebooting your boss, but let me see the ps info
> > first.
>
>