tcp hiccup

caveman
programmer
Posts: 130
Joined: Sun Feb 09, 2003 1:08 pm
Location: Midrand Gauteng, South Africa

Post by caveman »

Hi guys

I have a process running on SuSE 7.3 that connects to
another machine using sockets. It's all written in C.

All works fine - except every 50-60 hours the thing hangs
on the production machine and I cannot reproduce the problem.

Checked open sockets, fd's etc.

The only indication I get is hundreds of entries in the log/message
file like the following.

May 4 17:04:19 router -- MARK --
May 4 17:24:19 router -- MARK --
May 4 17:44:19 router -- MARK --
May 4 18:04:19 router -- MARK --
May 4 18:24:19 router -- MARK --
..............
Jun 1 09:32:26 router -- MARK --
Jun 1 09:52:26 router -- MARK --
Jun 1 10:12:26 router -- MARK --
Jun 1 10:32:26 router -- MARK --
Jun 1 10:52:26 router -- MARK --

As you'll see they always appear to be 20 minutes apart!
As soon as these entries appear the client program hangs.
Stopping the program and restart - it'll run for 2-3 days
and the same thing happens again.

As the production site is in the UK and I'm in South Africa
it is hard to debug, not knowing when or how it will fail.

My development setup is the same and I don't get anything like it.
Been running it on and off with all sorts of setups for up to
a week without any hassles.

Any ideas? anybody?
Where can I start to look? Where can I get some indication
of what the messages mean?

I know this is little detail - any more info you need I'll
try and get it. :oops:

TX in advance.

PS. The system runs on a flash drive so there ain't space
for big debug files.

256MB flash drive and 256 MB memory!
(Yep - there were enough reasons for this setup and
they ask - we give!)

Edit.

Ok! also found this doing a netstat -a.
Am not sure how long it's been around - but shall follow-up.

tcp 1 0 router.tprod:33075 orcldataserver:4422 CLOSE_WAIT

Post by caveman »

Ok guys - very sorry!!! :oops: :oops:

That was a wild goose chase. :roll:

The "-- MARK --" entry is put there by syslogd itself (a
timestamp mark, every 20 minutes by default), and the production
machine - for some reason - wasn't started with "syslogd -m 0",
which turns the marks off (major oversight).

It just happened that the problem that I got and the log
entries happened about the same time.

I think my problem is a socket that doesn't close properly.
Am busy investigating..

Tx anyways to whoever read this!

PS. Had to pull the source code apart to find the "MARK"
entry - and all the while it is available with "man syslogd".

Edit....

Heh heh. Tried an Internet search with "-- MARK --"
as part of the search criteria.....
You'll never believe how many people in this world
are named "Mark" or variants thereof.

Post by caveman »

And again. :D

Just for interest's sake.

It seems the culprit was a shutdown(2) without a
close(2) on the client system. :shock:

- It actually happened when a *nix queue got full and
was a wee bit difficult to emulate -.

Will now have to run the production machine a while to
confirm.

Regards.