
Re: [Testbed-admins] Nodes Stuck in reloading



Mike,

Same results after I restarted mrouted on BOSS.



-----Original Message-----
From: Mike Hibler [mailto:mike@flux.utah.edu] 
Sent: Thursday, January 28, 2010 3:31 PM
To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
Subject: Re: [Testbed-admins] Nodes Stuck in reloading

Ah, maybe your mrouted died.  Run "/etc/rc.d/mrouted restart" and see if
things get better.
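
One way to run this check from a saved "ps axww" listing is sketched below in
Python. It is a minimal sketch, not an Emulab tool: the function names are
hypothetical, it assumes the listing has one process per line (unwrapped),
and it assumes the daemon's command name is exactly "mrouted". It reports
whether an mrouted process is present and which multicast address (-m) and
port (-p) each frisbeed is serving, so those can be compared with what the
client reports.

```python
def _argv(ps_line):
    """Return the COMMAND part of a `ps axww` line as a list of words.

    Assumes the column layout PID TT STAT TIME COMMAND...
    """
    fields = ps_line.split()
    return fields[4:] if len(fields) >= 5 else []


def mrouted_running(ps_lines):
    """True if any process's command basename is 'mrouted' (assumed name)."""
    for line in ps_lines:
        argv = _argv(line)
        if argv and argv[0].rsplit("/", 1)[-1] == "mrouted":
            return True
    return False


def frisbeed_endpoints(ps_lines):
    """Return the set of (multicast_addr, port) pairs served by frisbeed."""
    endpoints = set()
    for line in ps_lines:
        argv = _argv(line)
        if not argv or argv[0].rsplit("/", 1)[-1] != "frisbeed":
            continue
        addr = port = None
        # Walk adjacent (flag, value) pairs looking for -m and -p.
        for flag, value in zip(argv, argv[1:]):
            if flag == "-m":
                addr = value
            elif flag == "-p" and value.isdigit():
                port = int(value)
        if addr is not None and port is not None:
            endpoints.add((addr, port))
    return endpoints


# Sample lines taken from the listing quoted in this thread (rejoined).
sample = [
    "  PID  TT  STAT      TIME COMMAND",
    " 1313  ??  S      3:11.29 [mysqld]",
    "28433  ??  S      0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14 "
    "-W 72000000 -K 15 -m 234.5.15.107 -p 7511 "
    "/usr/testbed/images/FBSD63+FC8-STD.ndz",
]

print("mrouted running:", mrouted_running(sample))
for addr, port in sorted(frisbeed_endpoints(sample)):
    print("frisbeed serving", addr, "port", port)
```

The three frisbeed entries in the listing are threads of one server, so they
collapse to a single address/port pair here.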

On Thu, Jan 28, 2010 at 03:06:32PM -0500, Korrie, Donna M CTR USAF AFMC
AFRL/RYRD wrote:
> I can get into mysql.
> I have a mrouted.conf in /etc:
> #
> # Taken from Utah:
> #
> # this is the "other" interface
> # Do everything we can to stop traffic on it
> # We cannot just disable it or mrouted won't run
> # (since there would only be a single active interface)
> #
> phyint 10.10.200.14 force_leaf passive deny 0/0
> ~
> ~
> 
> 
> ________________________________
> 
> From: Mike Hibler [mailto:mike@flux.utah.edu]
> Sent: Thu 1/28/2010 3:02 PM
> To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
> Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
> Subject: Re: [Testbed-admins] Nodes Stuck in reloading
> 
> 
> 
> The frisbeelauncher messages may be a false alarm (i.e., they may be
> old); it looks like everything is running okay.  Try connecting to mysql
> interactively (on boss):
> 
>         mysql tbdb
> 
> and see if you get a mysql> command prompt.  You'll see in the ps
> listing below that there are actually frisbeed's running (really just
> one with multiple threads).  Compare the command line -m and -p info
> with what the client thinks.
> 
> Maybe you did need to run "mrouted".  Do you have a mrouted.conf in
> either /etc or /usr/local/etc on boss?
> 
> On Thu, Jan 28, 2010 at 02:47:07PM -0500, Korrie, Donna M CTR USAF
> AFMC AFRL/RYRD wrote:
> > I rebooted boss yesterday and ops today...that did not seem to help.
> > Should I reboot boss again?
> >
> > [root@boss:/usr/testbed/log](2:43pm)#ps axww
> >   PID  TT  STAT      TIME COMMAND
> >     0  ??  WLs    0:00.00 [swapper]
> >     1  ??  ILs    0:00.01 /sbin/init --
> >     2  ??  DL     0:03.16 [g_event]
> >     3  ??  DL     0:04.92 [g_up]
> >     4  ??  DL     0:05.53 [g_down]
> >     5  ??  DL     0:00.00 [thread taskq]
> >     6  ??  DL     0:00.00 [kqueue taskq]
> >     7  ??  DL     0:00.00 [acpi_task_0]
> >     8  ??  DL     0:00.00 [acpi_task_1]
> >     9  ??  DL     0:00.00 [acpi_task_2]
> >    10  ??  RL   2838:03.75 [idle]
> >    11  ??  WL     2:06.87 [swi4: clock sio]
> >    12  ??  WL     0:00.00 [swi3: vm]
> >    13  ??  WL     0:04.19 [swi1: net]
> >    14  ??  DL     0:03.60 [yarrow]
> >    15  ??  WL     0:00.00 [swi6: Giant taskq]
> >    16  ??  WL     0:00.00 [swi5: +]
> >    17  ??  DL     0:00.00 [xpt_thrd]
> >    18  ??  WL     0:00.00 [swi2: cambio]
> >    19  ??  WL     0:00.02 [swi6: task queue]
> >    20  ??  WL     0:00.00 [irq9: acpi0]
> >    21  ??  WL     0:13.63 [irq16: bce0 em1++]
> >    22  ??  WL     0:00.00 [irq19: em0]
> >    23  ??  WL     0:00.00 [irq17: em2]
> >    24  ??  WL     0:00.00 [irq18: em3]
> >    25  ??  WL     0:00.00 [irq21: uhci0 uhci+]
> >    26  ??  DL     0:00.01 [usb0]
> >    27  ??  DL     0:00.00 [usbtask]
> >    28  ??  WL     0:03.44 [irq20: uhci1]
> >    29  ??  DL     0:00.01 [usb1]
> >    30  ??  DL     0:00.01 [usb2]
> >    31  ??  DL     0:00.01 [usb3]
> >    32  ??  WL     0:00.00 [irq23: atapci0]
> >    33  ??  WL     0:00.01 [swi0: sio]
> >    34  ??  WL     0:00.00 [irq14: ata0]
> >    35  ??  WL     0:00.00 [irq15: ata1]
> >    36  ??  WL     0:00.00 [irq1: atkbd0]
> >    37  ??  DL     0:00.14 [pagedaemon]
> >    38  ??  DL     0:00.00 [vmdaemon]
> >    39  ??  DL     0:11.51 [pagezero]
> >    40  ??  DL     0:00.37 [bufdaemon]
> >    41  ??  DL     0:00.41 [vnlru]
> >    42  ??  DL     1:39.88 [syncer]
> >    43  ??  DL     0:00.78 [softdepflush]
> >    44  ??  DL     0:04.03 [schedcpu]
> >   135  ??  Is     0:00.00 adjkerntz -i
> >   764  ??  Is     0:00.00 /usr/sbin/moused -p /dev/ums0 -t auto -I
> > /var/run/moused.ums0.pid
> >   821  ??  Is     0:00.00 /sbin/devd
> >   918  ??  Ss     0:02.06 /usr/sbin/syslogd
> >   929  ??  Ss     0:00.83 /usr/sbin/named -u root
> >  1080  ??  Ss     0:00.08 /usr/sbin/rpcbind
> >  1208  ??  Is     0:00.01 nfsd: master (nfsd)
> >  1210  ??  I      0:00.00 nfsd: server (nfsd)
> >  1211  ??  I      0:00.00 nfsd: server (nfsd)
> >  1212  ??  I      0:00.00 nfsd: server (nfsd)
> >  1213  ??  I      0:00.00 nfsd: server (nfsd)
> >  1214  ??  I      0:00.00 nfsd: server (nfsd)
> >  1215  ??  I      0:00.00 nfsd: server (nfsd)
> >  1216  ??  I      0:00.00 nfsd: server (nfsd)
> >  1217  ??  I      0:00.00 nfsd: server (nfsd)
> >  1218  ??  I      0:00.00 nfsd: server (nfsd)
> >  1219  ??  I      0:00.00 nfsd: server (nfsd)
> >  1220  ??  I      0:00.00 nfsd: server (nfsd)
> >  1221  ??  I      0:00.00 nfsd: server (nfsd)
> >  1222  ??  I      0:00.00 nfsd: server (nfsd)
> >  1223  ??  I      0:00.00 nfsd: server (nfsd)
> >  1224  ??  I      0:00.00 nfsd: server (nfsd)
> >  1225  ??  I      0:00.00 nfsd: server (nfsd)
> >  1242  ??  Is     0:00.00 [sh]
> >  1313  ??  S      3:11.29 [mysqld]
> >  1344  ??  Ss     0:01.83 /usr/sbin/ntpd -c /etc/ntp.conf -p
> > /var/run/ntpd.pid -f /var/db/ntpd.drift
> >  1364  ??  Ss     0:00.11 /usr/sbin/usbd
> >  1371  ??  Ss     0:02.01 /usr/local/sbin/httpd -DSSL
> >  1379  ??  Ss     0:00.80 /usr/local/libexec/pubsubd
> >  1392  ??  Is     0:00.00 /usr/sbin/sshd
> >  1398  ??  Ss     0:01.53 sendmail: accepting connections (sendmail)
> >  1402  ??  Is     0:00.03 sendmail: Queue runner@00:30:00 for
> > /var/spool/clientmqueue (sendmail)
> >  1408  ??  Is     0:00.32 /usr/sbin/cron -s
> >  1423  ??  Is     0:00.00 /usr/bin/perl -w
> > /usr/testbed/sbin/daemon_wrapper /usr/local/sbin/dhcpd -f bce0
> > (perl5.8.8)
> >  1425  ??  S      0:01.16 /usr/local/sbin/dhcpd -f bce0
> >  1426  ??  I      0:01.84 /usr/local/sbin/httpd -DSSL
> >  1427  ??  I      0:01.61 /usr/local/sbin/httpd -DSSL
> >  1428  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
> >  1429  ??  I      0:01.79 /usr/local/sbin/httpd -DSSL
> >  1430  ??  I      0:01.57 /usr/local/sbin/httpd -DSSL
> >  1431  ??  I      0:01.94 /usr/local/sbin/httpd -DSSL
> >  1432  ??  I      0:01.68 /usr/local/sbin/httpd -DSSL
> >  1433  ??  I      0:01.60 /usr/local/sbin/httpd -DSSL
> >  1434  ??  I      0:02.22 /usr/local/sbin/httpd -DSSL
> >  1435  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
> >  1436  ??  I      0:01.52 /usr/local/sbin/httpd -DSSL
> >  1437  ??  I      0:01.90 /usr/local/sbin/httpd -DSSL
> >  1438  ??  I      0:01.38 /usr/local/sbin/httpd -DSSL
> >  1439  ??  I      0:02.04 /usr/local/sbin/httpd -DSSL
> >  1440  ??  I      0:02.17 /usr/local/sbin/httpd -DSSL
> >  1441  ??  I      0:01.44 /usr/local/sbin/httpd -DSSL
> >  1442  ??  I      0:02.39 /usr/local/sbin/httpd -DSSL
> >  1443  ??  I      0:01.59 /usr/local/sbin/httpd -DSSL
> >  1444  ??  I      0:02.05 /usr/local/sbin/httpd -DSSL
> >  1445  ??  S      0:01.68 /usr/local/sbin/httpd -DSSL
> >  1446  ??  I      0:01.46 /usr/local/sbin/httpd -DSSL
> >  1447  ??  I      0:02.48 /usr/local/sbin/httpd -DSSL
> >  1448  ??  I      0:01.64 /usr/local/sbin/httpd -DSSL
> >  1449  ??  I      0:01.48 /usr/local/sbin/httpd -DSSL
> >  1450  ??  I      0:01.74 /usr/local/sbin/httpd -DSSL
> >  1451  ??  I      0:01.66 /usr/local/sbin/httpd -DSSL
> >  1452  ??  I      0:01.49 /usr/local/sbin/httpd -DSSL
> >  1453  ??  I      0:01.26 /usr/local/sbin/httpd -DSSL
> >  1454  ??  I      0:01.72 /usr/local/sbin/httpd -DSSL
> >  1455  ??  I      0:01.79 /usr/local/sbin/httpd -DSSL
> >  1463  ??  Is     0:00.01 /usr/testbed/sbin/bootinfo
> >  1467  ??  Is     0:00.00 /usr/testbed/sbin/tmcd -i 192.168.0.14
> >  1469  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> >  1470  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> >  1471  ??  I      0:00.01 tmcd: UDP 7777: 23 done (tmcd)
> >  1472  ??  I      0:00.00 tmcd: UDP 14447: 0 done (tmcd)
> >  1473  ??  I      0:00.00 tmcd: TCP 14447: 0 done (tmcd)
> >  1474  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> >  1475  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> >  1476  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> >  1477  ??  I      0:00.01 tmcd: TCP 7777: 18 done (tmcd)
> >  1478  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> >  1479  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> >  1480  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> >  1481  ??  I      0:00.01 tmcd: TCP 7777: 17 done (tmcd)
> >  1482  ??  Ss     0:01.12 /usr/testbed/sbin/capserver
> >  1484  ??  Is     0:00.62 /usr/bin/perl -wT
> > /usr/testbed/sbin/lastlog_daemon (perl5.8.8)
> >  1490  ??  Is     0:00.00 /usr/testbed/sbin/sdcollectd
> >  1492  ??  Is     0:00.65 /usr/testbed/sbin/stated (perl5.8.8)
> >  1499  ??  Is     0:00.04 /usr/local/bin/python
> > /usr/testbed/sbin/sslxmlrpc_server.py
> >  1515  ??  Ss     0:10.17 /usr/bin/perl -w
> > /usr/testbed/sbin/mysqld_watchdog (perl5.8.8)
> >  1524  ??  Is     0:00.00 /usr/bin/perl -w
> > /usr/testbed/sbin/daemon_wrapper -i 30 -l /usr/testbed/log/batchlog
> > /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
> >  1531  ??  S      0:07.73 /usr/bin/perl -wT
> > /usr/testbed/sbin/batch_daemon -d (perl5.8.8)
> >  1540  ??  Is     0:00.02 /usr/local/libexec/tftpd -m
> > /usr/local/etc/tftpd.rules -lvvvv -C 40 -s /tftpboot
> >  1560  ??  Is     0:00.00 /usr/sbin/inetd -wW -R 0
> >  1643  ??  I      0:02.66 /usr/local/sbin/httpd -DSSL
> >  1644  ??  I      0:01.66 /usr/local/sbin/httpd -DSSL
> >  1645  ??  I      0:01.60 /usr/local/sbin/httpd -DSSL
> >  1646  ??  I      0:01.32 /usr/local/sbin/httpd -DSSL
> >  1647  ??  I      0:01.70 /usr/local/sbin/httpd -DSSL
> >  1648  ??  I      0:01.44 /usr/local/sbin/httpd -DSSL
> >  1649  ??  I      0:01.61 /usr/local/sbin/httpd -DSSL
> > 14122  ??  Z      0:00.06 <defunct>
> > 27297  ??  Is     0:00.07 sshd: root@ttyp0 (sshd)
> > 27472  ??  Is     0:00.01 /usr/bin/perl -wT
> > /usr/testbed/sbin/frisbeelauncher 10035 (perl5.8.8)
> > 28285  ??  Ss     0:00.11 sshd: root@ttyp1 (sshd)
> > 28433  ??  S      0:05.91 /usr/testbed/sbin/frisbeed -i 192.168.0.14
> > -W 72000000 -K 15 -m 234.5.15.107 -p 7511
> > /usr/testbed/images/FBSD63+FC8-STD.ndz
> > 28434  ??  S      0:00.01 /usr/testbed/sbin/frisbeed -i 192.168.0.14
> > -W 72000000 -K 15 -m 234.5.15.107 -p 7511
> > /usr/testbed/images/FBSD63+FC8-STD.ndz
> > 28435  ??  S      0:00.29 /usr/testbed/sbin/frisbeed -i 192.168.0.14
> > -W 72000000 -K 15 -m 234.5.15.107 -p 7511
> > /usr/testbed/images/FBSD63+FC8-STD.ndz
> > 29124  ??  SL     0:00.00 [nfsiod 0]
> >  1502  d0- S      0:54.98 /usr/bin/perl -wT
> > /usr/testbed/sbin/reload_daemon (perl5.8.8)
> >  1509  d0- S      0:11.61 /usr/bin/perl -wT
> > /usr/testbed/sbin/checkup_daemon (perl5.8.8)
> >  1578  d0  Is+    0:00.00 /usr/libexec/getty std.115200 console
> >  1579  v0  Is+    0:00.00 /usr/libexec/getty Pc ttyv0
> >  1580  v1  Is+    0:00.00 /usr/libexec/getty Pc ttyv1
> >  1581  v2  Is+    0:00.00 /usr/libexec/getty Pc ttyv2
> >  1582  v3  Is+    0:00.00 /usr/libexec/getty Pc ttyv3
> >  1583  v4  Is+    0:00.00 /usr/libexec/getty Pc ttyv4
> >  1584  v5  Is+    0:00.00 /usr/libexec/getty Pc ttyv5
> >  1585  v6  Is+    0:00.00 /usr/libexec/getty Pc ttyv6
> >  1586  v7  Is+    0:00.00 /usr/libexec/getty Pc ttyv7
> > 27302  p0  Is     0:00.01 -csh (csh)
> > 28133  p0  I+     0:00.06 ssh tips
> > 28290  p1  Ss     0:00.03 -csh (csh)
> > 29126  p1  R+     0:00.00 ps axww
> >
> > -----Original Message-----
> > From: Mike Hibler [mailto:mike@flux.utah.edu]
> > Sent: Thursday, January 28, 2010 2:44 PM
> > To: Korrie, Donna M CTR USAF AFMC AFRL/RYRD
> > Cc: Mike Hibler; testbed-admins@flux.utah.edu; Leigh Stoller
> > Subject: Re: [Testbed-admins] Nodes Stuck in reloading
> >
> > On Thu, Jan 28, 2010 at 02:22:22PM -0500, Korrie, Donna M CTR USAF
> > AFMC AFRL/RYRD wrote:
> > > ...
> > > Do I need to restart anything?
> > >
> > >
> >
> > What does "ps axww" show?  Just "ps" won't show all the processes.
> > At the very least it seems like mysqld isn't running.  There is
> > supposed to be a watchdog running to make sure that mysqld is running
> > and responding, but maybe it isn't running either.
> >
> > You may be best off just rebooting your boss, but let me see the ps
> > info first.
> 
>