
Re: [Testbed-admins] could not connect to event server



On Tue, Jul 21, 2009 at 05:19:11PM -0400, Dongwoon Hahn wrote:
> I am trying to run a simple experiment with a single node. I couldn't
> make it work.
> 
> You can see the ns file from the log at the end of this e-mail.
> Some unusual details you may need to know:
> - I didn't configure the power controller so I manually power cycle my machine.
> - We entered information about control and experimental switch into
> the database, but the switch is not actually used.
> (We will eventually use the wireless for this experiment.)
> - The node type is set with the following non-default values.
> (bios_waittime :300, rebootable : 0, power delay :300).
> 
> The experiment terminated while the node was in the reloading status
> playing frisbee.
> The major errors seen from the log file are as follows (It couldn't
> connect to event-server) :
> 
> pubsub_client_connect: Could not connect to event-server:16505
> event_register_withkeydata_withretry: could not connect to event server
> could not register with event system
> *** ERROR: tbswap: Failed to start event time.
> Cleaning up after errors.
> 
> *** ERROR: batchexp: tbswap in failed!
> Cleaning up and exiting with status 1 ...
> 
> **** Experimental information, please ignore ****
> Session ID = 4456
> Likely Cause of the Problem:
>   Failed to start event time.
>   ...
> Cause: unknown
> Confidence: 0.7
> Script: tbswap
> **** End experimental information ****
> 
> My questions and requests are:
> 
> 1. Any comments or advice on the errors observed above?
> 

Make sure that the event server ("pubsubd") is running on your "ops" machine:
	ps axw | grep pubsub
should show a process running.

Make sure that the DNS alias "event-server" resolves to your ops machine:
	ping event-server
run from your boss machine, should show the name resolving to the address of ops.
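
The two checks above, plus a direct test of the port named in the error
message, could be wrapped up roughly as follows.  This is only a sketch:
the function names are made up for illustration, and it assumes that
getent and an nc supporting -z and -w are installed.  The host alias and
port are the ones from the log (event-server:16505).

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical; the host alias and port
# are the ones in the error message (event-server:16505).
EVENT_HOST="${EVENT_HOST:-event-server}"
EVENT_PORT="${EVENT_PORT:-16505}"

# report LABEL STATUS: print one result line for a check's exit status.
report() {
    if [ "$2" -eq 0 ]; then
        echo "ok:   $1"
    else
        echo "FAIL: $1"
    fi
}

# On ops: is a pubsubd process present?  The [p] keeps grep from
# matching its own command line.
check_pubsubd() {
    ps axw | grep '[p]ubsubd' >/dev/null
}

# On boss: print what the alias resolves to, so it can be compared
# with the address of ops.  Assumes getent is available.
show_alias() {
    getent hosts "$EVENT_HOST"
}

# On boss: can a TCP connection be opened to the event server port?
# Assumes an nc that supports -z (connect only) and -w (timeout).
check_port() {
    nc -z -w 5 "$EVENT_HOST" "$EVENT_PORT" >/dev/null 2>&1
}
```

After sourcing the file, something like
	check_pubsubd; report "pubsubd running" $?
on ops, and
	show_alias
	check_port; report "event server port reachable" $?
on boss, would cover both points.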

> 2. What are the meanings of power_delay, rebootable and bios waittime
> in the node type table?
> 

power_delay is the number of seconds that must pass between attempts to power
cycle a machine.  This just prevents a runaway process from repeatedly
cycling power on a machine.
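
As an illustration of the idea only (this is not the testbed's actual
code, and the function name is made up), a power_delay style guard boils
down to comparing the time since the last cycle against the delay:

```shell
# may_power_cycle NOW LAST DELAY
# Hypothetical helper: succeeds (exit status 0) only if at least DELAY
# seconds have passed between LAST and NOW (all given in seconds).
may_power_cycle() {
    [ $(( $1 - $2 )) -ge "$3" ]
}
```

With your power_delay of 300, a call such as
	may_power_cycle "$(date +%s)" "$last_cycle" 300 && echo "cycle allowed"
would refuse a second cycle within five minutes of the first, where
last_cycle is assumed to hold the time of the previous cycle.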

rebootable indicates whether a node type can or should be rebooted.
This is usually 1, except for types such as sensor nodes or maybe real routers
where power cycling is not possible or may otherwise cause problems.

bios_waittime is how long it should take a machine type to get through its
BIOS.  This value is taken into consideration when deciding whether a machine
is hung and should be power cycled or declared dead.

> 3. Do the control and/or experimental switches have to be set up for this
> simple experiment? We won't need the experimental switch eventually
> since we are going to use only wireless for the experiment.
> 

The control switch has to be set up, since every node has to contact boss and
ops for various services.  That includes the event system, but I'm not sure
that is the problem above.  Your failure seems to be with a connection from
boss to ops and not from ops to a node.

The experimental switch is not required for a single node experiment or for
an experiment with no links/lans.