I have a process running on SuSE 7.3 that connects to
another machine using sockets. It is all written in C.
Everything works fine, except that every 50-60 hours the thing hangs
on the production machine, and I cannot reproduce the problem.
I have checked open sockets, fd's etc.
The only indication I get is hundreds of entries in the log/message
file like the following:
May 4 17:04:19 router -- MARK --
May 4 17:24:19 router -- MARK --
May 4 17:44:19 router -- MARK --
May 4 18:04:19 router -- MARK --
May 4 18:24:19 router -- MARK --
..............
Jun 1 09:32:26 router -- MARK --
Jun 1 09:52:26 router -- MARK --
Jun 1 10:12:26 router -- MARK --
Jun 1 10:32:26 router -- MARK --
Jun 1 10:52:26 router -- MARK --
As you'll see, they always appear to be 20 minutes apart!
As soon as these entries appear, the client program hangs.
If I stop the program and restart it, it runs for 2-3 days
and then the same thing happens again.
As the production site is in the UK and I'm in South Africa,
it is hard to debug, especially without knowing when or
how it will fail.
My development setup is the same and I don't get anything like it.
I've been running it on and off with all sorts of setups for up to
a week without any hassles.
Any ideas, anybody?
Where can I start to look? Where can I get some indication
of what the messages mean?
I know this is not much detail. If you need any more info, I'll
try and get it.
Thanks in advance.
PS. The system runs on a flash drive, so there isn't space
for big debug files.
256 MB flash drive and 256 MB memory!
(Yep, there were enough reasons for this setup:
they ask, we give!)
Edit:
OK, I also found this when running netstat -a.
I'm not sure how long it has been around, but I shall follow up.
tcp 1 0 router.tprod:33075 orcldataserver:4422 CLOSE_WAIT
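
For anyone reading along: as far as I understand it, CLOSE_WAIT means the remote end (orcldataserver here) has already closed the connection and the local process has not yet called close() on its socket. The 1 in the second column is the Recv-Q, i.e. one byte still unread. Below is a minimal sketch of how a read path could handle that case, assuming a blocking TCP socket. The helper names set_recv_timeout and read_or_close are made up for illustration and are not from my actual code, and this may well not be the cause of the hang.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: limit how long recv() may block on fd.
 * After the timeout, recv() fails with EAGAIN/EWOULDBLOCK instead of
 * waiting forever. Returns 0 on success, -1 on error. */
int set_recv_timeout(int fd, int seconds)
{
    struct timeval tv;

    tv.tv_sec = seconds;
    tv.tv_usec = 0;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* Hypothetical helper: read up to len bytes (len must be > 0).
 * Returns  > 0  number of bytes read,
 *            0  peer closed the connection; *fd is closed and set to -1
 *               so the socket does not linger in CLOSE_WAIT,
 *           -1  timeout or error, with errno set by recv(). */
ssize_t read_or_close(int *fd, char *buf, size_t len)
{
    ssize_t n;

    assert(fd != NULL && buf != NULL);

    /* Retry if a signal interrupted the call. */
    do {
        n = recv(*fd, buf, len, 0);
    } while (n < 0 && errno == EINTR);

    if (n == 0) {
        /* Orderly shutdown by the peer: release our end too. */
        close(*fd);
        *fd = -1;
    }
    return n;
}
```

The caller would then reconnect whenever read_or_close() returns 0, and treat a -1 with errno set to EAGAIN or EWOULDBLOCK as "nothing arrived within the timeout" rather than blocking indefinitely.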

