Re: [Testbed-admins] Nodes Stuck in reloading
The frisbeelauncher messages may be a false alarm (i.e., they may be old);
it looks like everything is running okay. Try connecting to mysql
interactively (on boss):
mysql tbdb
and see if you get a mysql> command prompt. You'll see in the ps listing
below that there are actually frisbeed processes running (really just one
with multiple threads). Compare the -m (multicast address) and -p (port)
values on its command line with what the client thinks it should be using.
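
For the -m/-p comparison, a small Python sketch along these lines can pull the
values out of saved "ps axww" output. The function name frisbeed_endpoints is
made up for illustration, and the pattern assumes -m is immediately followed
by -p, as it is in the listing below:

```python
import re

def frisbeed_endpoints(ps_text):
    """Return the set of (multicast address, port) pairs used by frisbeed."""
    # Drop mail quoting ("> ") and rejoin wrapped command lines into one string.
    flat = " ".join(line.lstrip("> ").strip() for line in ps_text.splitlines())
    # Assumption: "-m ADDR" is directly followed by "-p PORT" on the command line.
    pattern = re.compile(
        r"frisbeed\b(?:(?!frisbeed\b).)*?-m\s+(\S+)\s+-p\s+(\d+)"
    )
    return {(addr, int(port)) for addr, port in pattern.findall(flat)}
```

Feed it the ps text and compare the result with the address and port the
client reports; the three frisbeed entries in the listing collapse to a
single pair because they are threads of one server.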
Maybe you did need to run "mrouted". Do you have a mrouted.conf in either
/etc or /usr/local/etc on boss?
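
If it helps, the check for mrouted.conf can be scripted. This is only a
sketch; find_mrouted_conf is a made-up helper name, and the two default
directories are just the ones mentioned above:

```python
import os

def find_mrouted_conf(dirs=("/etc", "/usr/local/etc")):
    """Return the paths of any mrouted.conf files found in the given directories."""
    candidates = (os.path.join(d, "mrouted.conf") for d in dirs)
    return [path for path in candidates if os.path.isfile(path)]
```

Calling find_mrouted_conf() on boss returns an empty list if neither
directory has the file.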
On Thu, Jan 28, 2010 at 02:47:07PM -0500, Korrie, Donna M CTR USAF AFMC AFRL/RYRD wrote:
> I rebooted boss yesterday and ops today...that did not seem to help.
> Should I reboot boss again?
>
> [root@boss:/usr/testbed/log](2:43pm)#ps axww
> PID TT STAT TIME COMMAND
> 0 ?? WLs 0:00.00 [swapper]
> 1 ?? ILs 0:00.01 /sbin/init --
> 2 ?? DL 0:03.16 [g_event]
> 3 ?? DL 0:04.92 [g_up]
> 4 ?? DL 0:05.53 [g_down]
> 5 ?? DL 0:00.00 [thread taskq]
> 6 ?? DL 0:00.00 [kqueue taskq]
> 7 ?? DL 0:00.00 [acpi_task_0]
> 8 ?? DL 0:00.00 [acpi_task_1]
> 9 ?? DL 0:00.00 [acpi_task_2]
> 10 ?? RL 2838:03.75 [idle]
> 11 ?? WL 2:06.87 [swi4: clock sio]
> 12 ?? WL 0:00.00 [swi3: vm]
> 13 ?? WL 0:04.19 [swi1: net]
> 14 ?? DL 0:03.60 [yarrow]
> 15 ?? WL 0:00.00 [swi6: Giant taskq]
> 16 ?? WL 0:00.00 [swi5: +]
> 17 ?? DL 0:00.00 [xpt_thrd]
> 18 ?? WL 0:00.00 [swi2: cambio]
> 19 ?? WL 0:00.02 [swi6: task queue]
> 20 ?? WL 0:00.00 [irq9: acpi0]
> 21 ?? WL 0:13.63 [irq16: bce0 em1++]
> 22 ?? WL 0:00.00 [irq19: em0]
> 23 ?? WL 0:00.00 [irq17: em2]
> 24 ?? WL 0:00.00 [irq18: em3]
> 25 ?? WL 0:00.00 [irq21: uhci0 uhci+]
> 26 ?? DL 0:00.01 [usb0]
> 27 ?? DL 0:00.00 [usbtask]
> 28 ?? WL 0:03.44 [irq20: uhci1]
> 29 ?? DL 0:00.01 [usb1]
> 30 ?? DL 0:00.01 [usb2]
> 31 ?? DL 0:00.01 [usb3]
> 32 ?? WL 0:00.00 [irq23: atapci0]
> 33 ?? WL 0:00.01 [swi0: sio]
> 34 ?? WL 0:00.00 [irq14: ata0]
> 35 ?? WL 0:00.00 [irq15: ata1]
> 36 ?? WL 0:00.00 [irq1: atkbd0]
> 37 ?? DL 0:00.14 [pagedaemon]
> 38 ?? DL 0:00.00 [vmdaemon]
> 39 ?? DL 0:11.51 [pagezero]
> 40 ?? DL 0:00.37 [bufdaemon]
> 41 ?? DL 0:00.41 [vnlru]
> 42 ?? DL 1:39.88 [syncer]
> 43 ?? DL 0:00.78 [softdepflush]
> 44 ?? DL 0:04.03 [schedcpu]
> 135 ?? Is 0:00.00 adjkerntz -i
> 764 ?? Is 0:00.00 /usr/sbin/moused -p /dev/ums0 -t auto -I /var/run/moused.ums0.pid
> 821 ?? Is 0:00.00 /sbin/devd
> 918 ?? Ss 0:02.06 /usr/sbin/syslogd
> 929 ?? Ss 0:00.83 /usr/sbin/named -u root
> 1080 ?? Ss 0:00.08 /usr/sbin/rpcbind
> 1208 ?? Is 0:00.01 nfsd: master (nfsd)
> 1210 ?? I 0:00.00 nfsd: server (nfsd)
> 1211 ?? I 0:00.00 nfsd: server (nfsd)
> 1212 ?? I 0:00.00 nfsd: server (nfsd)
> 1213 ?? I 0:00.00 nfsd: server (nfsd)
> 1214 ?? I 0:00.00 nfsd: server (nfsd)
> 1215 ?? I 0:00.00 nfsd: server (nfsd)
> 1216 ?? I 0:00.00 nfsd: server (nfsd)
> 1217 ?? I 0:00.00 nfsd: server (nfsd)
> 1218 ?? I 0:00.00 nfsd: server (nfsd)
> 1219 ?? I 0:00.00 nfsd: server (nfsd)
> 1220 ?? I 0:00.00 nfsd: server (nfsd)
> 1221 ?? I 0:00.00 nfsd: server (nfsd)
> 1222 ?? I 0:00.00 nfsd: server (nfsd)
> 1223 ?? I 0:00.00 nfsd: server (nfsd)
> 1224 ?? I 0:00.00 nfsd: server (nfsd)
> 1225 ?? I 0:00.00 nfsd: server (nfsd)
> 1242 ?? Is 0:00.00 [sh]
> 1313 ?? S 3:11.29 [mysqld]
> 1344 ?? Ss 0:01.83 /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
> 1364 ?? Ss 0:00.11 /usr/sbin/usbd
> 1371 ?? Ss 0:02.01 /usr/local/sbin/httpd -DSSL
> 1379 ?? Ss 0:00.80 /usr/local/libexec/pubsubd
> 1392 ?? Is 0:00.00 /usr/sbin/sshd
> 1398 ?? Ss 0:01.53 sendmail: accepting connections (sendmail)
> 1402 ?? Is 0:00.03 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
> 1408 ?? Is 0:00.32 /usr/sbin/cron -s
> 1423 ?? Is 0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper /usr/local/sbin/dhcpd -f bce0 (perl5.8.8)
> 1425 ?? S 0:01.16 /usr/local/sbin/dhcpd -f bce0
> 1426 ?? I 0:01.84 /usr/local/sbin/httpd -DSSL
> 1427 ?? I 0:01.61 /usr/local/sbin/httpd -DSSL
> 1428 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
> 1429 ?? I 0:01.79 /usr/local/sbin/httpd -DSSL
> 1430 ?? I 0:01.57 /usr/local/sbin/httpd -DSSL
> 1431 ?? I 0:01.94 /usr/local/sbin/httpd -DSSL
> 1432 ?? I 0:01.68 /usr/local/sbin/httpd -DSSL
> 1433 ?? I 0:01.60 /usr/local/sbin/httpd -DSSL
> 1434 ?? I 0:02.22 /usr/local/sbin/httpd -DSSL
> 1435 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
> 1436 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
> 1437 ?? I 0:01.90 /usr/local/sbin/httpd -DSSL
> 1438 ?? I 0:01.38 /usr/local/sbin/httpd -DSSL
> 1439 ?? I 0:02.04 /usr/local/sbin/httpd -DSSL
> 1440 ?? I 0:02.17 /usr/local/sbin/httpd -DSSL
> 1441 ?? I 0:01.44 /usr/local/sbin/httpd -DSSL
> 1442 ?? I 0:02.39 /usr/local/sbin/httpd -DSSL
> 1443 ?? I 0:01.59 /usr/local/sbin/httpd -DSSL
> 1444 ?? I 0:02.05 /usr/local/sbin/httpd -DSSL
> 1445 ?? S 0:01.68 /usr/local/sbin/httpd -DSSL
> 1446 ?? I 0:01.46 /usr/local/sbin/httpd -DSSL
> 1447 ?? I 0:02.48 /usr/local/sbin/httpd -DSSL
> 1448 ?? I 0:01.64 /usr/local/sbin/httpd -DSSL
> 1449 ?? I 0:01.48 /usr/local/sbin/httpd -DSSL
> 1450 ?? I 0:01.74 /usr/local/sbin/httpd -DSSL
> 1451 ?? I 0:01.66 /usr/local/sbin/httpd -DSSL
> 1452 ?? I 0:01.49 /usr/local/sbin/httpd -DSSL
> 1453 ?? I 0:01.26 /usr/local/sbin/httpd -DSSL
> 1454 ?? I 0:01.72 /usr/local/sbin/httpd -DSSL
> 1455 ?? I 0:01.79 /usr/local/sbin/httpd -DSSL
> 1463 ?? Is 0:00.01 /usr/testbed/sbin/bootinfo
> 1467 ?? Is 0:00.00 /usr/testbed/sbin/tmcd -i 192.168.0.14
> 1469 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> 1470 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> 1471 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> 1472 ?? I 0:00.00 tmcd: UDP 14447: 0 done (tmcd)
> 1473 ?? I 0:00.00 tmcd: TCP 14447: 0 done (tmcd)
> 1474 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> 1475 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> 1476 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> 1477 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> 1478 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> 1479 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> 1480 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> 1481 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> 1482 ?? Ss 0:01.12 /usr/testbed/sbin/capserver
> 1484 ?? Is 0:00.62 /usr/bin/perl -wT /usr/testbed/sbin/lastlog_daemon (perl5.8.8)
> 1490 ?? Is 0:00.00 /usr/testbed/sbin/sdcollectd
> 1492 ?? Is 0:00.65 /usr/testbed/sbin/stated (perl5.8.8)
> 1499 ?? Is 0:00.04 /usr/local/bin/python /usr/testbed/sbin/sslxmlrpc_server.py
> 1515 ?? Ss 0:10.17 /usr/bin/perl -w /usr/testbed/sbin/mysqld_watchdog (perl5.8.8)
> 1524 ?? Is 0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper -i 30 -l /usr/testbed/log/batchlog /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
> 1531 ?? S 0:07.73 /usr/bin/perl -wT /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
> 1540 ?? Is 0:00.02 /usr/local/libexec/tftpd -m /usr/local/etc/tftpd.rules -lvvvv -C 40 -s /tftpboot
> 1560 ?? Is 0:00.00 /usr/sbin/inetd -wW -R 0
> 1643 ?? I 0:02.66 /usr/local/sbin/httpd -DSSL
> 1644 ?? I 0:01.66 /usr/local/sbin/httpd -DSSL
> 1645 ?? I 0:01.60 /usr/local/sbin/httpd -DSSL
> 1646 ?? I 0:01.32 /usr/local/sbin/httpd -DSSL
> 1647 ?? I 0:01.70 /usr/local/sbin/httpd -DSSL
> 1648 ?? I 0:01.44 /usr/local/sbin/httpd -DSSL
> 1649 ?? I 0:01.61 /usr/local/sbin/httpd -DSSL
> 14122 ?? Z 0:00.06 <defunct>
> 27297 ?? Is 0:00.07 sshd: root@ttyp0 (sshd)
> 27472 ?? Is 0:00.01 /usr/bin/perl -wT /usr/testbed/sbin/frisbeelauncher 10035 (perl5.8.8)
> 28285 ?? Ss 0:00.11 sshd: root@ttyp1 (sshd)
> 28433 ?? S 0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
> 28434 ?? S 0:00.01 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
> 28435 ?? S 0:00.29 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
> 29124 ?? SL 0:00.00 [nfsiod 0]
> 1502 d0- S 0:54.98 /usr/bin/perl -wT /usr/testbed/sbin/reload_daemon (perl5.8.8)
> 1509 d0- S 0:11.61 /usr/bin/perl -wT /usr/testbed/sbin/checkup_daemon (perl5.8.8)
> 1578 d0 Is+ 0:00.00 /usr/libexec/getty std.115200 console
> 1579 v0 Is+ 0:00.00 /usr/libexec/getty Pc ttyv0
> 1580 v1 Is+ 0:00.00 /usr/libexec/getty Pc ttyv1
> 1581 v2 Is+ 0:00.00 /usr/libexec/getty Pc ttyv2
> 1582 v3 Is+ 0:00.00 /usr/libexec/getty Pc ttyv3
> 1583 v4 Is+ 0:00.00 /usr/libexec/getty Pc ttyv4
> 1584 v5 Is+ 0:00.00 /usr/libexec/getty Pc ttyv5
> 1585 v6 Is+ 0:00.00 /usr/libexec/getty Pc ttyv6
> 1586 v7 Is+ 0:00.00 /usr/libexec/getty Pc ttyv7
> 27302 p0 Is 0:00.01 -csh (csh)
> 28133 p0 I+ 0:00.06 ssh tips
> 28290 p1 Ss 0:00.03 -csh (csh)
> 29126 p1 R+ 0:00.00 ps axww
>
> -----Original Message-----
> From: Mike Hibler [mailto:mike@flux.utah.edu]
> Sent: Thursday, January 28, 2010 2:44 PM
> To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
> Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
> Subject: Re: [Testbed-admins] Nodes Stuck in reloading
>
> On Thu, Jan 28, 2010 at 02:22:22PM -0500, Korrie, Donna M CTR USAF AFMC AFRL/RYRD wrote:
> > ...
> > Do I need to restart anything?
> >
> >
>
> What does "ps axww" show? Just "ps" won't show all the processes.
> At the very least it seems like mysqld isn't running. There is supposed to
> be a watchdog running to make sure that mysqld is running and responding,
> but maybe it isn't running either.
>
> You may be best off just rebooting your boss, but let me see the ps info
> first.