PogamutUT2004


Graceful stopping

A while ago I asked a question about what the best way to terminate running bots/servers was. All bots have kill and stop methods, as do servers. I was wondering if things had been changed or made more elegant in Pogamut 3.1. Basically, I'm having some weird error which I think is related to this exception:

PogamutExceptioncz.cuni.amis.pogamut.ut2004.utils.MultipleUT2004BotRunner@340423: Could not execute all agents due to an exception, see logs of respective agents.
FatalErrorEvent[
at cz.cuni.amis.pogamut.base.agent.utils.runner.impl.MultipleAgentRunner.startAgentsMain(MultipleAgentRunner.java:339)
...

However, the error only occurs intermittently, and it is hard to track down in the midst of multiple bots and now multiple servers. It's a bit hard for me to know how to troubleshoot my code when these various kill/stop commands sometimes result in errors being reported that I don't really care about.

So, how do I stop the bots/servers elegantly? Can I do it without throwing an Exception? Or is there an easy way to catch and handle Exceptions of the types I know I don't care about? The reason I ask this second question is that most of these Exceptions seem to be occurring within their own Threads, which are themselves deep within the Pogamut source code. I would rather not modify the Pogamut source too much to handle these Exceptions, but because these Exceptions occur in their own Threads, they never get propagated back into my code for me to deal with.
Hi!

Graceful stopping is not as easy as one would like. In Pogamut 2 there was even a possibility that the agent would not stop (or would fail in a horrible way).
The current way is much better, as it guarantees that the agent will actually stop ... always.

The biggest problem is that anybody may call stop/kill at any time. The components then have to be stopped in some order, so there is a possibility that
one component will still be working while another one is stopping. Thus, if the Mediator passes a message to a WorldView while the WorldView is stopping,
it can't result in anything other than an exception. I might try to improve this mechanism, but currently that does not make much sense.

Perhaps it would be beneficial if you described several scenarios in which bot.stop() results in an exception. I mean, it should not happen, and
I would consider such behavior a bug.

Cheers!
Jimmy

P.S.: The worst case that can happen when a bot is stopping is that it internally raises a "ComponentNotRunningException", but this exception should not bubble up to the bot.stop() method,
because it is just an inner signal that the bot is stopping; when a component receives it, it should just stop itself.
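
On the earlier question of exceptions that surface on Pogamut's own threads: one option that needs no change to the Pogamut source is a JVM-wide uncaught-exception handler from the standard library. The sketch below is only an illustration. The class name ShutdownNoiseFilter is hypothetical, matching by simple class name is a simplification, and the handler only sees exceptions that actually terminate a thread uncaught. Anything Pogamut catches and logs internally will never reach it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical helper (not part of Pogamut): swallows exceptions whose class
// name is on an "expected during shutdown" list and records everything else.
final class ShutdownNoiseFilter implements Thread.UncaughtExceptionHandler {
    private final Set<String> ignoredSimpleNames;
    private final List<Throwable> unexpected = new CopyOnWriteArrayList<>();

    ShutdownNoiseFilter(Set<String> ignoredSimpleNames) {
        this.ignoredSimpleNames = new HashSet<>(ignoredSimpleNames);
    }

    // True if the throwable or one of its causes has an ignored simple name.
    boolean isIgnorable(Throwable error) {
        int depth = 0;
        for (Throwable t = error; t != null && depth < 32; t = t.getCause(), depth++) {
            if (ignoredSimpleNames.contains(t.getClass().getSimpleName())) {
                return true;
            }
        }
        return false;
    }

    @Override
    public void uncaughtException(Thread thread, Throwable error) {
        if (isIgnorable(error)) {
            return; // expected shutdown noise, stay quiet
        }
        unexpected.add(error);
        System.err.println("Unexpected exception in thread " + thread.getName());
        error.printStackTrace();
    }

    // Exceptions that were not on the ignore list, in arrival order.
    List<Throwable> unexpected() {
        return unexpected;
    }
}
```

It would be installed once at startup with Thread.setDefaultUncaughtExceptionHandler(new ShutdownNoiseFilter(names)), where names contains, for example, "ComponentNotRunningException".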
I think my problem was that I was using kill() instead of stop() and that I was using the IUT2004Server kill methods when bots were still running. Using stop instead does seem to result in fewer exceptions being propagated. I think that most of the time I see these exceptions they are harmless, since they happen when I'm intending to kill the bot anyway. I was just hoping I could avoid these exceptions to help with troubleshooting. You see, when you leave bots to evolve for a few hours and then come back to find that something has gone wrong, it can be hard to track down the real bug amidst lots of unimportant exceptions. Furthermore, it is also hard to know if the exception was caused by the bot stopping, or if the bot actually had an error which caused it to stop.

I still don't fully understand what was causing the PogamutExceptions I was getting, but I seem to have fixed my problem for the moment. If it acts up again, I'll be more specific in my followup.
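
For anyone hitting the same thing, the ordering that helped (stop the bots first, touch the server only afterwards, and prefer stop() over kill()) can be sketched roughly as below. The Stoppable interface and the OrderedShutdown class are hypothetical stand-ins, not Pogamut types. In real code the bot and server objects would be adapted to it. The sketch also assumes that stop() returns only once the component has stopped, which I have not verified for Pogamut.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for anything with stop() and kill() methods.
interface Stoppable {
    void stop();
    void kill();
}

final class OrderedShutdown {
    // Stops every bot first and the server last.
    // Returns a description of each step that did not go cleanly.
    static List<String> shutDown(List<? extends Stoppable> bots, Stoppable server) {
        List<String> problems = new ArrayList<>();
        for (Stoppable bot : bots) {
            stopOrKill("bot", bot, problems);
        }
        stopOrKill("server", server, problems);
        return problems;
    }

    // Tries the graceful stop() and falls back to kill() only if it throws.
    private static void stopOrKill(String label, Stoppable target, List<String> problems) {
        try {
            target.stop();
        } catch (RuntimeException stopFailure) {
            problems.add(label + ": stop() failed (" + stopFailure + "), falling back to kill()");
            try {
                target.kill();
            } catch (RuntimeException killFailure) {
                problems.add(label + ": kill() failed too (" + killFailure + ")");
            }
        }
    }
}
```

Collecting the failures in one list, instead of letting each one print its own stack trace, keeps the shutdown noise in a single place that can be logged or ignored as a whole.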
I have just a small note. When there is some exception in my code that crashes the bot, I also get a bunch of AgentModules stopping messages, with or without exceptions. To determine what actually crashed the agent, I need to find the first exception in the log that actually occurred in my own code. So it is usually like this in the log:

1) exception in my code that caused agent to crash
2) a bunch of agent modules stopping, FatalErrors stuff

This is how I am finding what caused my agents to stop - not sure if it is applicable to your problem, but I guess it should be - if you have a complete log for each of the agents.
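
The search described above can be automated with a small scan over the log. The following is only a sketch: it assumes the log contains standard Java stack traces (frame lines beginning with "at "), and both the class name FirstOwnException and the package prefix passed in are hypothetical. It returns the nearest non-frame line above the first stack frame that belongs to your own package.

```java
import java.util.List;
import java.util.Optional;

// Hypothetical helper: finds the header line of the first stack trace
// in a log that contains a frame from the caller's own package.
final class FirstOwnException {
    static Optional<String> find(List<String> logLines, String ownPackagePrefix) {
        String header = null; // most recent line that is not a stack frame
        for (String raw : logLines) {
            String line = raw.trim();
            if (line.startsWith("at ")) {
                // A stack frame: check whether it points into our own code.
                if (header != null && line.startsWith("at " + ownPackagePrefix)) {
                    return Optional.of(header);
                }
            } else if (!line.isEmpty() && !line.startsWith("...")) {
                // Exception message, "Caused by:" line, or ordinary log line.
                header = line;
            }
        }
        return Optional.empty();
    }
}
```

With one log file per agent, it can be called as FirstOwnException.find(java.nio.file.Files.readAllLines(path), "my.bot.package"), where the prefix is the root package of your own bot code.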

Best,
Michal
 


Acknowledgement

This work is supported by GA UK 1053/2007/A-INF/MFF (2007-8), GA UK 351/2006/A-INF/MFF (2006-8), the Ministry of Education of the Czech Republic (grant MSM0021620838) (2008-9), by the Program "Information Society" under project 1ET100300517 (2006-9), and the project Integration of IT Tools into Education of Humanities (2006-8) and by the project CZ.2.17/3.1.00/31162, which are financed by the European Social Fund, the state budget of the Czech Republic, and by the budget of Municipal House Prague.