
Re: [Testbed-admins] Nodes Stuck in reloading



I can get into mysql.
I have an mrouted.conf in /etc:
#
# Taken from Utah:
#
# this is the "other" interface
# Do everything we can to stop traffic on it
# We cannot just disable it or mrouted won't run
# (since there would only be a single active interface)
#
phyint 10.10.200.14 force_leaf passive deny 0/0


From: Mike Hibler [mailto:mike@flux.utah.edu]
Sent: Thu 1/28/2010 3:02 PM
To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
Subject: Re: [Testbed-admins] Nodes Stuck in reloading

The frisbeelauncher messages may be a false alarm (i.e., they may be old);
it looks like everything is running okay.  Try connecting to mysql
interactively (on boss):

        mysql tbdb

and see if you get a mysql> command prompt.  You'll see in the ps listing
below that there are actually frisbeed processes running (really just one
with multiple threads).  Compare the -m (multicast address) and -p (port)
arguments on its command line with the ones the client is using.
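If you'd rather not eyeball it, here is a rough Python sketch that pulls the -m and -p values out of `ps axww` output. It is not part of the testbed software, and the function names are made up for illustration. It assumes each process is on one line, as ps prints it (the listing below was wrapped by the mailer), and that the image file is the last argument, as it is in that listing.

```python
def frisbeed_sessions(ps_output):
    """Return the distinct (image, mcast_addr, port) triples for frisbeed
    processes in `ps axww` output, one process per line (assumed)."""
    sessions = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TT STAT TIME COMMAND
        if len(fields) < 5:
            continue
        argv = fields[4].split()
        if not argv or not argv[0].endswith("/frisbeed"):
            continue
        mcast = port = None
        # Look at each option and the argument that follows it.
        for i, arg in enumerate(argv[:-1]):
            if arg == "-m":
                mcast = argv[i + 1]
            elif arg == "-p":
                port = int(argv[i + 1])
        # Assumption: the image path is the last word on the command line.
        sessions.add((argv[-1], mcast, port))
    return sessions


def client_matches(sessions, mcast_addr, port):
    """True if some running frisbeed uses this multicast address and port."""
    return any(m == mcast_addr and p == port for _, m, p in sessions)
```

With the three frisbeed lines from the listing below this yields a single session, 234.5.15.107 port 7511 serving FBSD63+FC8-STD.ndz, which is what you'd then compare against the client side.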

Maybe you did need to run "mrouted".  Do you have an mrouted.conf in either
/etc or /usr/local/etc on boss?
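A small Python sketch for checking both locations and listing the phyint lines in whatever you find. The helper names are hypothetical, and which of the two files mrouted actually reads depends on how it was built and started, so treat the path list as an assumption.

```python
from pathlib import Path

# The two locations asked about above (assumption: mrouted reads one of these).
CANDIDATE_PATHS = ("/etc/mrouted.conf", "/usr/local/etc/mrouted.conf")


def find_mrouted_conf(candidates=CANDIDATE_PATHS):
    """Return the candidate paths that exist as regular files."""
    return [p for p in candidates if Path(p).is_file()]


def phyint_lines(config_text):
    """Return the phyint directives in mrouted.conf text, '#' comments stripped."""
    found = []
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line.split()[:1] == ["phyint"]:
            found.append(line)
    return found
```

For example, `phyint_lines(Path(p).read_text())` for each `p` returned by `find_mrouted_conf()` prints the interface lines without the comment block.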

On Thu, Jan 28, 2010 at 02:47:07PM -0500, Korrie, Donna M CTR USAF AFMC AFRL/RYRD wrote:
> I rebooted boss yesterday and ops today...that did not seem to help.
> Should I reboot boss again?
>
> [root@boss:/usr/testbed/log](2:43pm)#ps axww
>   PID  TT  STAT      TIME COMMAND
>     0  ??  WLs    0:00.00 [swapper]
>     1  ??  ILs    0:00.01 /sbin/init --
>     2  ??  DL     0:03.16 [g_event]
>     3  ??  DL     0:04.92 [g_up]
>     4  ??  DL     0:05.53 [g_down]
>     5  ??  DL     0:00.00 [thread taskq]
>     6  ??  DL     0:00.00 [kqueue taskq]
>     7  ??  DL     0:00.00 [acpi_task_0]
>     8  ??  DL     0:00.00 [acpi_task_1]
>     9  ??  DL     0:00.00 [acpi_task_2]
>    10  ??  RL   2838:03.75 [idle]
>    11  ??  WL     2:06.87 [swi4: clock sio]
>    12  ??  WL     0:00.00 [swi3: vm]
>    13  ??  WL     0:04.19 [swi1: net]
>    14  ??  DL     0:03.60 [yarrow]
>    15  ??  WL     0:00.00 [swi6: Giant taskq]
>    16  ??  WL     0:00.00 [swi5: +]
>    17  ??  DL     0:00.00 [xpt_thrd]
>    18  ??  WL     0:00.00 [swi2: cambio]
>    19  ??  WL     0:00.02 [swi6: task queue]
>    20  ??  WL     0:00.00 [irq9: acpi0]
>    21  ??  WL     0:13.63 [irq16: bce0 em1++]
>    22  ??  WL     0:00.00 [irq19: em0]
>    23  ??  WL     0:00.00 [irq17: em2]
>    24  ??  WL     0:00.00 [irq18: em3]
>    25  ??  WL     0:00.00 [irq21: uhci0 uhci+]
>    26  ??  DL     0:00.01 [usb0]
>    27  ??  DL     0:00.00 [usbtask]
>    28  ??  WL     0:03.44 [irq20: uhci1]
>    29  ??  DL     0:00.01 [usb1]
>    30  ??  DL     0:00.01 [usb2]
>    31  ??  DL     0:00.01 [usb3]
>    32  ??  WL     0:00.00 [irq23: atapci0]
>    33  ??  WL     0:00.01 [swi0: sio]
>    34  ??  WL     0:00.00 [irq14: ata0]
>    35  ??  WL     0:00.00 [irq15: ata1]
>    36  ??  WL     0:00.00 [irq1: atkbd0]
>    37  ??  DL     0:00.14 [pagedaemon]
>    38  ??  DL     0:00.00 [vmdaemon]
>    39  ??  DL     0:11.51 [pagezero]
>    40  ??  DL     0:00.37 [bufdaemon]
>    41  ??  DL     0:00.41 [vnlru]
>    42  ??  DL     1:39.88 [syncer]
>    43  ??  DL     0:00.78 [softdepflush]
>    44  ??  DL     0:04.03 [schedcpu]
>   135  ??  Is     0:00.00 adjkerntz -i
>   764  ??  Is     0:00.00 /usr/sbin/moused -p /dev/ums0 -t auto -I /var/run/moused.ums0.pid
>   821  ??  Is     0:00.00 /sbin/devd
>   918  ??  Ss     0:02.06 /usr/sbin/syslogd
>   929  ??  Ss     0:00.83 /usr/sbin/named -u root
>  1080  ??  Ss     0:00.08 /usr/sbin/rpcbind
>  1208  ??  Is     0:00.01 nfsd: master (nfsd)
>  1210  ??  I      0:00.00 nfsd: server (nfsd)
>  1211  ??  I      0:00.00 nfsd: server (nfsd)
>  1212  ??  I      0:00.00 nfsd: server (nfsd)
>  1213  ??  I      0:00.00 nfsd: server (nfsd)
>  1214  ??  I      0:00.00 nfsd: server (nfsd)
>  1215  ??  I      0:00.00 nfsd: server (nfsd)
>  1216  ??  I      0:00.00 nfsd: server (nfsd)
>  1217  ??  I      0:00.00 nfsd: server (nfsd)
>  1218  ??  I      0:00.00 nfsd: server (nfsd)
>  1219  ??  I      0:00.00 nfsd: server (nfsd)
>  1220  ??  I      0:00.00 nfsd: server (nfsd)
>  1221  ??  I      0:00.00 nfsd: server (nfsd)
>  1222  ??  I      0:00.00 nfsd: server (nfsd)
>  1223  ??  I      0:00.00 nfsd: server (nfsd)
>  1224  ??  I      0:00.00 nfsd: server (nfsd)
>  1225  ??  I      0:00.00 nfsd: server (nfsd)
>  1242  ??  Is     0:00.00 [sh]
>  1313  ??  S      3:11.29 [mysqld]
>  1344  ??  Ss     0:01.83 /usr/sbin/ntpd -c /etc/ntp.conf -p /var/run/ntpd.pid -f /var/db/ntpd.drift
>  1364  ??  Ss     0:00.11 /usr/sbin/usbd
>  1371  ??  Ss     0:02.01 /usr/local/sbin/httpd -DSSL
>  1379  ??  Ss     0:00.80 /usr/local/libexec/pubsubd
>  1392  ??  Is     0:00.00 /usr/sbin/sshd
>  1398  ??  Ss     0:01.53 sendmail: accepting connections (sendmail)
>  1402  ??  Is     0:00.03 sendmail: Queue runner@00:30:00 for /var/spool/clientmqueue (sendmail)
>  1408  ??  Is     0:00.32 /usr/sbin/cron -s
>  1423  ??  Is     0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper /usr/local/sbin/dhcpd -f bce0 (perl5.8.8)
>  1425  ??  S      0:01.16 /usr/local/sbin/dhcpd -f bce0
>  1426  ??  I      0:01.84 /usr/local/sbin/httpd -DSSL
>  1427  ??  I      0:01.61 /usr/local/sbin/httpd -DSSL
>  1428  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
>  1429  ??  I      0:01.79 /usr/local/sbin/httpd -DSSL
>  1430  ??  I      0:01.57 /usr/local/sbin/httpd -DSSL
>  1431  ??  I      0:01.94 /usr/local/sbin/httpd -DSSL
>  1432  ??  I      0:01.68 /usr/local/sbin/httpd -DSSL
>  1433  ??  I      0:01.60 /usr/local/sbin/httpd -DSSL
>  1434  ??  I      0:02.22 /usr/local/sbin/httpd -DSSL
>  1435  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
>  1436  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
>  1437  ??  I      0:01.90 /usr/local/sbin/httpd -DSSL
>  1438  ??  I      0:01.38 /usr/local/sbin/httpd -DSSL
>  1439  ??  I      0:02.04 /usr/local/sbin/httpd -DSSL
>  1440  ??  I      0:02.17 /usr/local/sbin/httpd -DSSL
>  1441  ??  I      0:01.44 /usr/local/sbin/httpd -DSSL
>  1442  ??  I      0:02.39 /usr/local/sbin/httpd -DSSL
>  1443  ??  I      0:01.59 /usr/local/sbin/httpd -DSSL
>  1444  ??  I      0:02.05 /usr/local/sbin/httpd -DSSL
>  1445  ??  S      0:01.68 /usr/local/sbin/httpd -DSSL
>  1446  ??  I      0:01.46 /usr/local/sbin/httpd -DSSL
>  1447  ??  I      0:02.48 /usr/local/sbin/httpd -DSSL
>  1448  ??  I      0:01.64 /usr/local/sbin/httpd -DSSL
>  1449  ??  I      0:01.48 /usr/local/sbin/httpd -DSSL
>  1450  ??  I      0:01.74 /usr/local/sbin/httpd -DSSL
>  1451  ??  I      0:01.66 /usr/local/sbin/httpd -DSSL
>  1452  ??  I      0:01.49 /usr/local/sbin/httpd -DSSL
>  1453  ??  I      0:01.26 /usr/local/sbin/httpd -DSSL
>  1454  ??  I      0:01.72 /usr/local/sbin/httpd -DSSL
>  1455  ??  I      0:01.79 /usr/local/sbin/httpd -DSSL
>  1463  ??  Is     0:00.01 /usr/testbed/sbin/bootinfo
>  1467  ??  Is     0:00.00 /usr/testbed/sbin/tmcd -i 192.168.0.14
>  1469  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
>  1470  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
>  1471  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
>  1472  ??  I      0:00.00 tmcd: UDP 14447: 0 done (tmcd)
>  1473  ??  I      0:00.00 tmcd: TCP 14447: 0 done (tmcd)
>  1474  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
>  1475  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
>  1476  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
>  1477  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
>  1478  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
>  1479  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
>  1480  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
>  1481  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
>  1482  ??  Ss     0:01.12 /usr/testbed/sbin/capserver
>  1484  ??  Is     0:00.62 /usr/bin/perl -wT /usr/testbed/sbin/lastlog_daemon (perl5.8.8)
>  1490  ??  Is     0:00.00 /usr/testbed/sbin/sdcollectd
>  1492  ??  Is     0:00.65 /usr/testbed/sbin/stated (perl5.8.8)
>  1499  ??  Is     0:00.04 /usr/local/bin/python /usr/testbed/sbin/sslxmlrpc_server.py
>  1515  ??  Ss     0:10.17 /usr/bin/perl -w /usr/testbed/sbin/mysqld_watchdog (perl5.8.8)
>  1524  ??  Is     0:00.00 /usr/bin/perl -w /usr/testbed/sbin/daemon_wrapper -i 30 -l /usr/testbed/log/batchlog /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
>  1531  ??  S      0:07.73 /usr/bin/perl -wT /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
>  1540  ??  Is     0:00.02 /usr/local/libexec/tftpd -m /usr/local/etc/tftpd.rules -lvvvv -C 40 -s /tftpboot
>  1560  ??  Is     0:00.00 /usr/sbin/inetd -wW -R 0
>  1643  ??  I      0:02.66 /usr/local/sbin/httpd -DSSL
>  1644  ??  I      0:01.66 /usr/local/sbin/httpd -DSSL
>  1645  ??  I      0:01.60 /usr/local/sbin/httpd -DSSL
>  1646  ??  I      0:01.32 /usr/local/sbin/httpd -DSSL
>  1647  ??  I      0:01.70 /usr/local/sbin/httpd -DSSL
>  1648  ??  I      0:01.44 /usr/local/sbin/httpd -DSSL
>  1649  ??  I      0:01.61 /usr/local/sbin/httpd -DSSL
> 14122  ??  Z      0:00.06 <defunct>
> 27297  ??  Is     0:00.07 sshd: root@ttyp0 (sshd)
> 27472  ??  Is     0:00.01 /usr/bin/perl -wT /usr/testbed/sbin/frisbeelauncher 10035 (perl5.8.8)
> 28285  ??  Ss     0:00.11 sshd: root@ttyp1 (sshd)
> 28433  ??  S      0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
> 28434  ??  S      0:00.01 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
> 28435  ??  S      0:00.29 /usr/testbed/sbin/frisbeed -i 192.168.0.14 -W 72000000 -K 15 -m 234.5.15.107 -p 7511 /usr/testbed/images/FBSD63+FC8-STD.ndz
> 29124  ??  SL     0:00.00 [nfsiod 0]
>  1502  d0- S      0:54.98 /usr/bin/perl -wT /usr/testbed/sbin/reload_daemon (perl5.8.8)
>  1509  d0- S      0:11.61 /usr/bin/perl -wT /usr/testbed/sbin/checkup_daemon (perl5.8.8)
>  1578  d0  Is+    0:00.00 /usr/libexec/getty std.115200 console
>  1579  v0  Is+    0:00.00 /usr/libexec/getty Pc ttyv0
>  1580  v1  Is+    0:00.00 /usr/libexec/getty Pc ttyv1
>  1581  v2  Is+    0:00.00 /usr/libexec/getty Pc ttyv2
>  1582  v3  Is+    0:00.00 /usr/libexec/getty Pc ttyv3
>  1583  v4  Is+    0:00.00 /usr/libexec/getty Pc ttyv4
>  1584  v5  Is+    0:00.00 /usr/libexec/getty Pc ttyv5
>  1585  v6  Is+    0:00.00 /usr/libexec/getty Pc ttyv6
>  1586  v7  Is+    0:00.00 /usr/libexec/getty Pc ttyv7
> 27302  p0  Is     0:00.01 -csh (csh)
> 28133  p0  I+     0:00.06 ssh tips
> 28290  p1  Ss     0:00.03 -csh (csh)
> 29126  p1  R+     0:00.00 ps axww
>
> -----Original Message-----
> From: Mike Hibler [mailto:mike@flux.utah.edu]
> Sent: Thursday, January 28, 2010 2:44 PM
> To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
> Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
> Subject: Re: [Testbed-admins] Nodes Stuck in reloading
>
> On Thu, Jan 28, 2010 at 02:22:22PM -0500, Korrie, Donna M CTR USAF AFMC AFRL/RYRD wrote:
> > ...
> > Do I need to restart anything?
> >
> >
>
> What does "ps axww" show?  Just "ps" won't show all the processes.
> At the very least it seems like mysqld isn't running.  There is supposed
> to be a watchdog running to make sure that mysqld is running and
> responding, but maybe it isn't running either.
>
> You may be best off just rebooting your boss, but let me see the ps info
> first.
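To answer the mysqld question without reading the whole listing, here is a minimal Python sketch that scans `ps axww` output for mysqld and the mysqld_watchdog script. The helper is hypothetical, not part of the testbed software. It only shows that the processes exist; whether mysqld actually answers is what connecting with "mysql tbdb" checks.

```python
def missing_daemons(ps_output, required=("mysqld", "mysqld_watchdog")):
    """Return the names in `required` not found among live processes.

    A name counts as present if it is the basename of any word in a
    process's command, so both "[mysqld]" and
    "/usr/bin/perl -w /usr/testbed/sbin/mysqld_watchdog" match.
    """
    seen = set()
    for line in ps_output.splitlines():
        fields = line.split(None, 4)  # PID TT STAT TIME COMMAND
        if len(fields) < 5 or fields[0] == "PID":
            continue
        if fields[2].startswith("Z"):  # skip zombie (<defunct>) entries
            continue
        for word in fields[4].split():
            seen.add(word.strip("[]").rsplit("/", 1)[-1])
    return [name for name in required if name not in seen]
```

An empty list means both show up in the process table; anything returned is what to restart (or a reason to reboot boss).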