Re: [Testbed-admins] Nodes Stuck in reloading



I rebooted boss yesterday and ops today, but that did not seem to help.
Should I reboot boss again?

[root@boss:/usr/testbed/log](2:43pm)#ps axww
  PID  TT  STAT      TIME COMMAND
    0  ??  WLs    0:00.00 [swapper]
    1  ??  ILs    0:00.01 /sbin/init --
    2  ??  DL     0:03.16 [g_event]
    3  ??  DL     0:04.92 [g_up]
    4  ??  DL     0:05.53 [g_down]
    5  ??  DL     0:00.00 [thread taskq]
    6  ??  DL     0:00.00 [kqueue taskq]
    7  ??  DL     0:00.00 [acpi_task_0]
    8  ??  DL     0:00.00 [acpi_task_1]
    9  ??  DL     0:00.00 [acpi_task_2]
   10  ??  RL   2838:03.75 [idle]
   11  ??  WL     2:06.87 [swi4: clock sio]
   12  ??  WL     0:00.00 [swi3: vm]
   13  ??  WL     0:04.19 [swi1: net]
   14  ??  DL     0:03.60 [yarrow]
   15  ??  WL     0:00.00 [swi6: Giant taskq]
   16  ??  WL     0:00.00 [swi5: +]
   17  ??  DL     0:00.00 [xpt_thrd]
   18  ??  WL     0:00.00 [swi2: cambio]
   19  ??  WL     0:00.02 [swi6: task queue]
   20  ??  WL     0:00.00 [irq9: acpi0]
   21  ??  WL     0:13.63 [irq16: bce0 em1++]
   22  ??  WL     0:00.00 [irq19: em0]
   23  ??  WL     0:00.00 [irq17: em2]
   24  ??  WL     0:00.00 [irq18: em3]
   25  ??  WL     0:00.00 [irq21: uhci0 uhci+]
   26  ??  DL     0:00.01 [usb0]
   27  ??  DL     0:00.00 [usbtask]
   28  ??  WL     0:03.44 [irq20: uhci1]
   29  ??  DL     0:00.01 [usb1]
   30  ??  DL     0:00.01 [usb2]
   31  ??  DL     0:00.01 [usb3]
   32  ??  WL     0:00.00 [irq23: atapci0]
   33  ??  WL     0:00.01 [swi0: sio]
   34  ??  WL     0:00.00 [irq14: ata0]
   35  ??  WL     0:00.00 [irq15: ata1]
   36  ??  WL     0:00.00 [irq1: atkbd0]
   37  ??  DL     0:00.14 [pagedaemon]
   38  ??  DL     0:00.00 [vmdaemon]
   39  ??  DL     0:11.51 [pagezero]
   40  ??  DL     0:00.37 [bufdaemon]
   41  ??  DL     0:00.41 [vnlru]
   42  ??  DL     1:39.88 [syncer]
   43  ??  DL     0:00.78 [softdepflush]
   44  ??  DL     0:04.03 [schedcpu]
  135  ??  Is     0:00.00 adjkerntz -i
  764  ??  Is     0:00.00 /usr/sbin/moused -p /dev/ums0 -t auto -I /var/run/moused.ums0.pid
  821  ??  Is     0:00.00 /sbin/devd
  918  ??  Ss     0:02.06 /usr/sbin/syslogd
  929  ??  Ss     0:00.83 /usr/sbin/named -u root
 1080  ??  Ss     0:00.08 /usr/sbin/rpcbind
 1208  ??  Is     0:00.01 nfsd: master (nfsd)
 1210  ??  I      0:00.00 nfsd: server (nfsd)
 1211  ??  I      0:00.00 nfsd: server (nfsd)
 1212  ??  I      0:00.00 nfsd: server (nfsd)
 1213  ??  I      0:00.00 nfsd: server (nfsd)
 1214  ??  I      0:00.00 nfsd: server (nfsd)
 1215  ??  I      0:00.00 nfsd: server (nfsd)
 1216  ??  I      0:00.00 nfsd: server (nfsd)
 1217  ??  I      0:00.00 nfsd: server (nfsd)
 1218  ??  I      0:00.00 nfsd: server (nfsd)
 1219  ??  I      0:00.00 nfsd: server (nfsd)
 1220  ??  I      0:00.00 nfsd: server (nfsd)
 1221  ??  I      0:00.00 nfsd: server (nfsd)
 1222  ??  I      0:00.00 nfsd: server (nfsd)
 1223  ??  I      0:00.00 nfsd: server (nfsd)
 1224  ??  I      0:00.00 nfsd: server (nfsd)
 1225  ??  I      0:00.00 nfsd: server (nfsd)
 1242  ??  Is     0:00.00 [sh]
 1313  ??  S      3:11.29 [mysqld]
 1344  ??  Ss     0:01.83 /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
 1364  ??  Ss     0:00.11 /usr/sbin/usbd
 1371  ??  Ss     0:02.01 /usr/local/sbin/httpd -DSSL
 1379  ??  Ss     0:00.80 /usr/local/libexec/pubsubd
 1392  ??  Is     0:00.00 /usr/sbin/sshd
 1398  ??  Ss     0:01.53 sendmail: accepting connections (sendmail)
 1402  ??  Is     0:00.03 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
 1408  ??  Is     0:00.32 /usr/sbin/cron -s
 1423  ??  Is     0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper /usr/local/sbin/dhcpd -f bce0 (perl5.8.8)
 1425  ??  S      0:01.16 /usr/local/sbin/dhcpd -f bce0
 1426  ??  I      0:01.84 /usr/local/sbin/httpd -DSSL
 1427  ??  I      0:01.61 /usr/local/sbin/httpd -DSSL
 1428  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
 1429  ??  I      0:01.79 /usr/local/sbin/httpd -DSSL
 1430  ??  I      0:01.57 /usr/local/sbin/httpd -DSSL
 1431  ??  I      0:01.94 /usr/local/sbin/httpd -DSSL
 1432  ??  I      0:01.68 /usr/local/sbin/httpd -DSSL
 1433  ??  I      0:01.60 /usr/local/sbin/httpd -DSSL
 1434  ??  I      0:02.22 /usr/local/sbin/httpd -DSSL
 1435  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
 1436  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
 1437  ??  I      0:01.90 /usr/local/sbin/httpd -DSSL
 1438  ??  I      0:01.38 /usr/local/sbin/httpd -DSSL
 1439  ??  I      0:02.04 /usr/local/sbin/httpd -DSSL
 1440  ??  I      0:02.17 /usr/local/sbin/httpd -DSSL
 1441  ??  I      0:01.44 /usr/local/sbin/httpd -DSSL
 1442  ??  I      0:02.39 /usr/local/sbin/httpd -DSSL
 1443  ??  I      0:01.59 /usr/local/sbin/httpd -DSSL
 1444  ??  I      0:02.05 /usr/local/sbin/httpd -DSSL
 1445  ??  S      0:01.68 /usr/local/sbin/httpd -DSSL
 1446  ??  I      0:01.46 /usr/local/sbin/httpd -DSSL
 1447  ??  I      0:02.48 /usr/local/sbin/httpd -DSSL
 1448  ??  I      0:01.64 /usr/local/sbin/httpd -DSSL
 1449  ??  I      0:01.48 /usr/local/sbin/httpd -DSSL
 1450  ??  I      0:01.74 /usr/local/sbin/httpd -DSSL
 1451  ??  I      0:01.66 /usr/local/sbin/httpd -DSSL
 1452  ??  I      0:01.49 /usr/local/sbin/httpd -DSSL
 1453  ??  I      0:01.26 /usr/local/sbin/httpd -DSSL
 1454  ??  I      0:01.72 /usr/local/sbin/httpd -DSSL
 1455  ??  I      0:01.79 /usr/local/sbin/httpd -DSSL
 1463  ??  Is     0:00.01 /usr/testbed/sbin/bootinfo
 1467  ??  Is     0:00.00 /usr/testbed/sbin/tmcd -i 192.168.0.14
 1469  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
 1470  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
 1471  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
 1472  ??  I      0:00.00 tmcd: UDP 14447: 0 done (tmcd)
 1473  ??  I      0:00.00 tmcd: TCP 14447: 0 done (tmcd)
 1474  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
 1475  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
 1476  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
 1477  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
 1478  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
 1479  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
 1480  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
 1481  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
 1482  ??  Ss     0:01.12 /usr/testbed/sbin/capserver
 1484  ??  Is     0:00.62 /usr/bin/perl -wT /usr/testbed/sbin/lastlog_daemon (perl5.8.8)
 1490  ??  Is     0:00.00 /usr/testbed/sbin/sdcollectd
 1492  ??  Is     0:00.65 /usr/testbed/sbin/stated (perl5.8.8)
 1499  ??  Is     0:00.04 /usr/local/bin/python /usr/testbed/sbin/sslxmlrpc_server.py
 1515  ??  Ss     0:10.17 /usr/bin/perl -w /usr/testbed/sbin/mysqld_watchdog (perl5.8.8)
 1524  ??  Is     0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper -i 30 -l /usr/testbed/log/batchlog /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
 1531  ??  S      0:07.73 /usr/bin/perl -wT /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
 1540  ??  Is     0:00.02 /usr/local/libexec/tftpd -m /usr/local/etc/tftpd.rules -lvvvv -C 40 -s /tftpboot
 1560  ??  Is     0:00.00 /usr/sbin/inetd -wW -R 0
 1643  ??  I      0:02.66 /usr/local/sbin/httpd -DSSL
 1644  ??  I      0:01.66 /usr/local/sbin/httpd -DSSL
 1645  ??  I      0:01.60 /usr/local/sbin/httpd -DSSL
 1646  ??  I      0:01.32 /usr/local/sbin/httpd -DSSL
 1647  ??  I      0:01.70 /usr/local/sbin/httpd -DSSL
 1648  ??  I      0:01.44 /usr/local/sbin/httpd -DSSL
 1649  ??  I      0:01.61 /usr/local/sbin/httpd -DSSL
14122  ??  Z      0:00.06 <defunct>
27297  ??  Is     0:00.07 sshd: root@ttyp0 (sshd)
27472  ??  Is     0:00.01 /usr/bin/perl -wT /usr/testbed/sbin/frisbeelauncher 10035 (perl5.8.8)
28285  ??  Ss     0:00.11 sshd: root@ttyp1 (sshd)
28433  ??  S      0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
28434  ??  S      0:00.01 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
28435  ??  S      0:00.29 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
29124  ??  SL     0:00.00 [nfsiod 0]
 1502  d0- S      0:54.98 /usr/bin/perl -wT /usr/testbed/sbin/reload_daemon (perl5.8.8)
 1509  d0- S      0:11.61 /usr/bin/perl -wT /usr/testbed/sbin/checkup_daemon (perl5.8.8)
 1578  d0  Is+    0:00.00 /usr/libexec/getty std.115200 console
 1579  v0  Is+    0:00.00 /usr/libexec/getty Pc ttyv0
 1580  v1  Is+    0:00.00 /usr/libexec/getty Pc ttyv1
 1581  v2  Is+    0:00.00 /usr/libexec/getty Pc ttyv2
 1582  v3  Is+    0:00.00 /usr/libexec/getty Pc ttyv3
 1583  v4  Is+    0:00.00 /usr/libexec/getty Pc ttyv4
 1584  v5  Is+    0:00.00 /usr/libexec/getty Pc ttyv5
 1585  v6  Is+    0:00.00 /usr/libexec/getty Pc ttyv6
 1586  v7  Is+    0:00.00 /usr/libexec/getty Pc ttyv7
27302  p0  Is     0:00.01 -csh (csh)
28133  p0  I+     0:00.06 ssh tips
28290  p1  Ss     0:00.03 -csh (csh)
29126  p1  R+     0:00.00 ps axww

-----Original Message-----
From: Mike Hibler [mailto:mike@flux.utah.edu] 
Sent: Thursday, January 28, 2010 2:44 PM
To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
Subject: Re: [Testbed-admins] Nodes Stuck in reloading

On Thu, Jan 28, 2010 at 02:22:22PM -0500, Korrie, Donna M CTR USAF AFMC AFRL/RYRD wrote:
> ...
> Do I need to restart anything?
> 
> 

What does "ps axww" show?  Just "ps" won't show all the processes.
At the very least it seems like mysqld isn't running.  There is supposed
to be a watchdog running to make sure that mysqld is running and
responding, but maybe it isn't running either.

You may be best off just rebooting your boss, but let me see the ps
info first.
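
The check described above (is mysqld running, and is its watchdog
running too) can be done by eye on the "ps axww" output, or with a small
script. What follows is only a minimal sketch in Python, not part of the
testbed software. The function names and the EXPECTED watch list are
hypothetical; the daemon names themselves are copied from the listing in
this thread. It compares whole words, so "mysqld_watchdog" on its own
does not count as "mysqld".

```python
import os
import subprocess

# Daemon names copied from the "ps axww" listing in this thread. Which ones
# to watch is an assumption made for illustration, not an official list.
EXPECTED = ("mysqld", "mysqld_watchdog", "reload_daemon", "stated", "tmcd")


def running_names(ps_text):
    """Collect the basename of every word in the ps output.

    Brackets, parentheses and colons that ps puts around some names are
    stripped, so "[mysqld]" and "tmcd:" count as "mysqld" and "tmcd".
    Splitting on whitespace also copes with listings that a mail client
    has wrapped across lines.
    """
    names = set()
    for word in ps_text.split():
        names.add(os.path.basename(word.strip("[]():")))
    return names


def missing_daemons(ps_text, expected=EXPECTED):
    """Return the expected daemon names that do not appear in ps_text."""
    seen = running_names(ps_text)
    return [name for name in expected if name not in seen]


def ps_listing():
    """Run "ps axww" (the command used in this thread) and return its output."""
    result = subprocess.run(
        ["ps", "axww"], capture_output=True, text=True, check=True
    )
    return result.stdout
```

On boss, something like `print(missing_daemons(ps_listing()))` would
list whichever of the names are absent. Because every word of the output
is considered, a daemon name that only shows up as an argument to
another command would also count as present, so treat the result as a
hint rather than proof.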