Re: [Testbed-admins] Nodes Stuck in reloading
I rebooted boss yesterday and ops today, but that did not seem to help.
Should I reboot boss again?
[root@boss:/usr/testbed/log](2:43pm)#ps axww
PID TT STAT TIME COMMAND
0 ?? WLs 0:00.00 [swapper]
1 ?? ILs 0:00.01 /sbin/init --
2 ?? DL 0:03.16 [g_event]
3 ?? DL 0:04.92 [g_up]
4 ?? DL 0:05.53 [g_down]
5 ?? DL 0:00.00 [thread taskq]
6 ?? DL 0:00.00 [kqueue taskq]
7 ?? DL 0:00.00 [acpi_task_0]
8 ?? DL 0:00.00 [acpi_task_1]
9 ?? DL 0:00.00 [acpi_task_2]
10 ?? RL 2838:03.75 [idle]
11 ?? WL 2:06.87 [swi4: clock sio]
12 ?? WL 0:00.00 [swi3: vm]
13 ?? WL 0:04.19 [swi1: net]
14 ?? DL 0:03.60 [yarrow]
15 ?? WL 0:00.00 [swi6: Giant taskq]
16 ?? WL 0:00.00 [swi5: +]
17 ?? DL 0:00.00 [xpt_thrd]
18 ?? WL 0:00.00 [swi2: cambio]
19 ?? WL 0:00.02 [swi6: task queue]
20 ?? WL 0:00.00 [irq9: acpi0]
21 ?? WL 0:13.63 [irq16: bce0 em1++]
22 ?? WL 0:00.00 [irq19: em0]
23 ?? WL 0:00.00 [irq17: em2]
24 ?? WL 0:00.00 [irq18: em3]
25 ?? WL 0:00.00 [irq21: uhci0 uhci+]
26 ?? DL 0:00.01 [usb0]
27 ?? DL 0:00.00 [usbtask]
28 ?? WL 0:03.44 [irq20: uhci1]
29 ?? DL 0:00.01 [usb1]
30 ?? DL 0:00.01 [usb2]
31 ?? DL 0:00.01 [usb3]
32 ?? WL 0:00.00 [irq23: atapci0]
33 ?? WL 0:00.01 [swi0: sio]
34 ?? WL 0:00.00 [irq14: ata0]
35 ?? WL 0:00.00 [irq15: ata1]
36 ?? WL 0:00.00 [irq1: atkbd0]
37 ?? DL 0:00.14 [pagedaemon]
38 ?? DL 0:00.00 [vmdaemon]
39 ?? DL 0:11.51 [pagezero]
40 ?? DL 0:00.37 [bufdaemon]
41 ?? DL 0:00.41 [vnlru]
42 ?? DL 1:39.88 [syncer]
43 ?? DL 0:00.78 [softdepflush]
44 ?? DL 0:04.03 [schedcpu]
135 ?? Is 0:00.00 adjkerntz -i
764 ?? Is 0:00.00 /usr/sbin/moused -p /dev/ums0 -t auto -I /var/run/moused.ums0.pid
821 ?? Is 0:00.00 /sbin/devd
918 ?? Ss 0:02.06 /usr/sbin/syslogd
929 ?? Ss 0:00.83 /usr/sbin/named -u root
1080 ?? Ss 0:00.08 /usr/sbin/rpcbind
1208 ?? Is 0:00.01 nfsd: master (nfsd)
1210 ?? I 0:00.00 nfsd: server (nfsd)
1211 ?? I 0:00.00 nfsd: server (nfsd)
1212 ?? I 0:00.00 nfsd: server (nfsd)
1213 ?? I 0:00.00 nfsd: server (nfsd)
1214 ?? I 0:00.00 nfsd: server (nfsd)
1215 ?? I 0:00.00 nfsd: server (nfsd)
1216 ?? I 0:00.00 nfsd: server (nfsd)
1217 ?? I 0:00.00 nfsd: server (nfsd)
1218 ?? I 0:00.00 nfsd: server (nfsd)
1219 ?? I 0:00.00 nfsd: server (nfsd)
1220 ?? I 0:00.00 nfsd: server (nfsd)
1221 ?? I 0:00.00 nfsd: server (nfsd)
1222 ?? I 0:00.00 nfsd: server (nfsd)
1223 ?? I 0:00.00 nfsd: server (nfsd)
1224 ?? I 0:00.00 nfsd: server (nfsd)
1225 ?? I 0:00.00 nfsd: server (nfsd)
1242 ?? Is 0:00.00 [sh]
1313 ?? S 3:11.29 [mysqld]
1344 ?? Ss 0:01.83 /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
1364 ?? Ss 0:00.11 /usr/sbin/usbd
1371 ?? Ss 0:02.01 /usr/local/sbin/httpd -DSSL
1379 ?? Ss 0:00.80 /usr/local/libexec/pubsubd
1392 ?? Is 0:00.00 /usr/sbin/sshd
1398 ?? Ss 0:01.53 sendmail: accepting connections (sendmail)
1402 ?? Is 0:00.03 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
1408 ?? Is 0:00.32 /usr/sbin/cron -s
1423 ?? Is 0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper /usr/local/sbin/dhcpd -f bce0 (perl5.8.8)
1425 ?? S 0:01.16 /usr/local/sbin/dhcpd -f bce0
1426 ?? I 0:01.84 /usr/local/sbin/httpd -DSSL
1427 ?? I 0:01.61 /usr/local/sbin/httpd -DSSL
1428 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
1429 ?? I 0:01.79 /usr/local/sbin/httpd -DSSL
1430 ?? I 0:01.57 /usr/local/sbin/httpd -DSSL
1431 ?? I 0:01.94 /usr/local/sbin/httpd -DSSL
1432 ?? I 0:01.68 /usr/local/sbin/httpd -DSSL
1433 ?? I 0:01.60 /usr/local/sbin/httpd -DSSL
1434 ?? I 0:02.22 /usr/local/sbin/httpd -DSSL
1435 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
1436 ?? I 0:01.52 /usr/local/sbin/httpd -DSSL
1437 ?? I 0:01.90 /usr/local/sbin/httpd -DSSL
1438 ?? I 0:01.38 /usr/local/sbin/httpd -DSSL
1439 ?? I 0:02.04 /usr/local/sbin/httpd -DSSL
1440 ?? I 0:02.17 /usr/local/sbin/httpd -DSSL
1441 ?? I 0:01.44 /usr/local/sbin/httpd -DSSL
1442 ?? I 0:02.39 /usr/local/sbin/httpd -DSSL
1443 ?? I 0:01.59 /usr/local/sbin/httpd -DSSL
1444 ?? I 0:02.05 /usr/local/sbin/httpd -DSSL
1445 ?? S 0:01.68 /usr/local/sbin/httpd -DSSL
1446 ?? I 0:01.46 /usr/local/sbin/httpd -DSSL
1447 ?? I 0:02.48 /usr/local/sbin/httpd -DSSL
1448 ?? I 0:01.64 /usr/local/sbin/httpd -DSSL
1449 ?? I 0:01.48 /usr/local/sbin/httpd -DSSL
1450 ?? I 0:01.74 /usr/local/sbin/httpd -DSSL
1451 ?? I 0:01.66 /usr/local/sbin/httpd -DSSL
1452 ?? I 0:01.49 /usr/local/sbin/httpd -DSSL
1453 ?? I 0:01.26 /usr/local/sbin/httpd -DSSL
1454 ?? I 0:01.72 /usr/local/sbin/httpd -DSSL
1455 ?? I 0:01.79 /usr/local/sbin/httpd -DSSL
1463 ?? Is 0:00.01 /usr/testbed/sbin/bootinfo
1467 ?? Is 0:00.00 /usr/testbed/sbin/tmcd -i 192.168.0.14
1469 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
1470 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
1471 ?? I 0:00.01 tmcd: UDP 7777: 23 done (tmcd)
1472 ?? I 0:00.00 tmcd: UDP 14447: 0 done (tmcd)
1473 ?? I 0:00.00 tmcd: TCP 14447: 0 done (tmcd)
1474 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
1475 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
1476 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
1477 ?? I 0:00.01 tmcd: TCP 7777: 18 done (tmcd)
1478 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
1479 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
1480 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
1481 ?? I 0:00.01 tmcd: TCP 7777: 17 done (tmcd)
1482 ?? Ss 0:01.12 /usr/testbed/sbin/capserver
1484 ?? Is 0:00.62 /usr/bin/perl -wT /usr/testbed/sbin/lastlog_daemon (perl5.8.8)
1490 ?? Is 0:00.00 /usr/testbed/sbin/sdcollectd
1492 ?? Is 0:00.65 /usr/testbed/sbin/stated (perl5.8.8)
1499 ?? Is 0:00.04 /usr/local/bin/python /usr/testbed/sbin/sslxmlrpc_server.py
1515 ?? Ss 0:10.17 /usr/bin/perl -w /usr/testbed/sbin/mysqld_watchdog (perl5.8.8)
1524 ?? Is 0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper -i 30 -l /usr/testbed/log/batchlog /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
1531 ?? S 0:07.73 /usr/bin/perl -wT /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
1540 ?? Is 0:00.02 /usr/local/libexec/tftpd -m /usr/local/etc/tftpd.rules -lvvvv -C 40 -s /tftpboot
1560 ?? Is 0:00.00 /usr/sbin/inetd -wW -R 0
1643 ?? I 0:02.66 /usr/local/sbin/httpd -DSSL
1644 ?? I 0:01.66 /usr/local/sbin/httpd -DSSL
1645 ?? I 0:01.60 /usr/local/sbin/httpd -DSSL
1646 ?? I 0:01.32 /usr/local/sbin/httpd -DSSL
1647 ?? I 0:01.70 /usr/local/sbin/httpd -DSSL
1648 ?? I 0:01.44 /usr/local/sbin/httpd -DSSL
1649 ?? I 0:01.61 /usr/local/sbin/httpd -DSSL
14122 ?? Z 0:00.06 <defunct>
27297 ?? Is 0:00.07 sshd: root@ttyp0 (sshd)
27472 ?? Is 0:00.01 /usr/bin/perl -wT /usr/testbed/sbin/frisbeelauncher 10035 (perl5.8.8)
28285 ?? Ss 0:00.11 sshd: root@ttyp1 (sshd)
28433 ?? S 0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
28434 ?? S 0:00.01 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
28435 ?? S 0:00.29 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
29124 ?? SL 0:00.00 [nfsiod 0]
1502 d0- S 0:54.98 /usr/bin/perl -wT /usr/testbed/sbin/reload_daemon (perl5.8.8)
1509 d0- S 0:11.61 /usr/bin/perl -wT /usr/testbed/sbin/checkup_daemon (perl5.8.8)
1578 d0 Is+ 0:00.00 /usr/libexec/getty std.115200 console
1579 v0 Is+ 0:00.00 /usr/libexec/getty Pc ttyv0
1580 v1 Is+ 0:00.00 /usr/libexec/getty Pc ttyv1
1581 v2 Is+ 0:00.00 /usr/libexec/getty Pc ttyv2
1582 v3 Is+ 0:00.00 /usr/libexec/getty Pc ttyv3
1583 v4 Is+ 0:00.00 /usr/libexec/getty Pc ttyv4
1584 v5 Is+ 0:00.00 /usr/libexec/getty Pc ttyv5
1585 v6 Is+ 0:00.00 /usr/libexec/getty Pc ttyv6
1586 v7 Is+ 0:00.00 /usr/libexec/getty Pc ttyv7
27302 p0 Is 0:00.01 -csh (csh)
28133 p0 I+ 0:00.06 ssh tips
28290 p1 Ss 0:00.03 -csh (csh)
29126 p1 R+ 0:00.00 ps axww
-----Original Message-----
From: Mike Hibler [mailto:mike@flux.utah.edu]
Sent: Thursday, January 28, 2010 2:44 PM
To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
Subject: Re: [Testbed-admins] Nodes Stuck in reloading
On Thu, Jan 28, 2010 at 02:22:22PM -0500, Korrie, Donna M CTR USAF AFMC AFRL/RYRD wrote:
> ...
> Do I need to restart anything?
>
>
What does "ps axww" show? Just "ps" won't show all the processes.
At the very least it seems like mysqld isn't running. There is supposed to
be a watchdog running to make sure that mysqld is running and responding,
but maybe it isn't running either.
You may be best off just rebooting your boss, but let me see the ps info
first.
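
For anyone repeating this check, here is a rough sketch of how a "ps axww" listing could be scanned for the daemons discussed above. It is only an illustration: the EXPECTED list is an assumption built from names that appear in this thread (mysqld, mysqld_watchdog, reload_daemon), and the function names are made up for the example. It matches whole command words rather than substrings, so mysqld_watchdog is not mistaken for mysqld.

```python
import os
import subprocess

# Assumption: daemon names taken from this thread, not an official list.
EXPECTED = ("mysqld", "mysqld_watchdog", "reload_daemon")


def running_names(ps_output):
    """Collect the basename of every word in the COMMAND column."""
    names = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TT STAT TIME COMMAND
        if len(fields) < 5 or not fields[0].isdigit():
            continue  # header or malformed line
        for token in fields[4].split():
            # "[mysqld]" -> "mysqld", "/usr/testbed/sbin/stated" -> "stated"
            names.add(os.path.basename(token.strip("[]")))
    return names


def missing_daemons(ps_output, expected=EXPECTED):
    """Return the expected names that do not show up in the listing."""
    names = running_names(ps_output)
    return [name for name in expected if name not in names]


def current_listing():
    """Run "ps axww" on the local machine and return its output."""
    result = subprocess.run(["ps", "axww"], capture_output=True,
                            text=True, check=True)
    return result.stdout
```

On boss this would be used as missing_daemons(current_listing()); an empty list means every name in EXPECTED was found. It says nothing about whether mysqld is actually responding, only whether the process exists.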