Karl Kleinpaste <karl,AT,charcoal,DOT,com>
Sat, 19 Jan 2002 02:10:52 +0100
RH7.1 + RH's 2.4.9-12 + CIPE 1.5.2.
About a year ago, I had some very peculiar problems with CIPE relative
to questions of MTU size. The default (1500?) had caused me troubles
when I began using my CIPE link as my default route, such that hosts
in my masqueraded network couldn't successfully make TCP connections
outside the home network. (Connections from the gateway host itself
were fine.) By sheer luck and happenstance, I stumbled on the fact
that shorter MTUs didn't help, but knocking MTU up to 4096 made
everything wonderful. Presumably, allowing fragmentation at lower
levels took care of whatever problem was keeping me from having
success otherwise at that time.
I'd been operating in this mode for, as I said, about a year.
This past Tuesday, around 8am local time, TCP connections stopped
working, whether from masqueraded hosts behind my gateway machine or
from the gateway machine itself.
I know this is when it occurred because that's when /var/log/maillog
began showing a lot of:
    Jan 15 08:16:49 cinnamon sendmail: g0FD8Ig08680: collect: premature EOM: Connection reset by n22.groups.yahoo.com
That's as perceived on the gateway host, on which CIPE runs. I
watched the sendmail processes stack up, with "netstat -t" showing
ESTABLISHED yet with no data really flowing.
I fiddled with things for a couple hours before tripping over the idea
of shortening the MTU. Oddly, it didn't help to reduce it back to the
default of 1500. Rather, I cut it back to 1400, at which point things
started working on the gateway machine, but now things don't work from
masqueraded hosts behind the gateway. Having gotten past the
immediate disaster, I let it go like that for the rest of the week.
Just a little while ago, by what amounts to a binary search using
connection to an IRC server as a test case and watching which
connections worked and which didn't, I've determined that the maximum
MTU that I can now use is 1459.
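The search described above can be sketched roughly as follows. This is only an illustration, not the procedure actually used: the function name largest_working_mtu and the works() callback are hypothetical, and the sketch assumes that once an MTU fails, every larger MTU fails too (which was evidently not true of this link a year earlier).

```python
from typing import Callable, Optional


def largest_working_mtu(works: Callable[[int], bool],
                        lo: int, hi: int) -> Optional[int]:
    """Return the largest MTU in [lo, hi] for which works(mtu) is true.

    Assumption: behaviour is monotonic, i.e. if some MTU fails then
    every larger MTU fails as well. Returns None if nothing works.
    """
    best = None
    while lo <= hi:
        mid = (lo + hi) // 2
        if works(mid):
            best = mid      # mid works, so look for something larger
            lo = mid + 1
        else:
            hi = mid - 1    # mid fails, so the answer is below it
    return best


def simulated_link(mtu: int) -> bool:
    """Stand-in for a real test (set the MTU, then try a TCP connection).

    Hypothetical: it simply encodes the behaviour reported in this post.
    """
    return mtu <= 1459


if __name__ == "__main__":
    print(largest_working_mtu(simulated_link, 576, 1500))
```

In real use, works() would have to set the interface MTU (for example with ifconfig) and then attempt a test connection; with roughly 900 candidate values, about ten probes are enough.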
No kidding, 1459. When I "ifconfig cipcb0 mtu 1460" or higher, TCP
connections start failing. Not all TCP connections fail, though. Some
work just fine, and I don't mean merely because some connections use a
lot of short packets. For example, I can connect to different IRC
servers, such as one or two in DalNet, and they always work, no matter
what MTU is in use, at least from here on the CIPE-running gateway
machine itself.
These configurations hadn't changed in a year. I don't even have root
access to the remote endpoint any more -- if that machine ever
reboots, I will have a bit of trouble as I hunt down the right
sysadmin types to get that end started again. So I *can't* have
changed the configuration, yet 8am Tuesday was when this change to a
failure mode occurred.
Is it possible that I'm up against some drastic router change in the
path between the two endpoints, such that the path as a whole no
longer passes large packets?
The carrier interface is ppp0 at this end, with an ordinary MTU of
1500. Might it be useful to toy with ppp0's MTU, to see how it
interacts with varying cipcb0's as well?
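As a starting point for thinking about how the two MTUs relate, here is a small arithmetic sketch. It assumes the failures come from encapsulated packets no longer fitting the carrier path unfragmented, that the CIPE carrier is IPv4 plus UDP with no IP options, and that TCP segments carry no options; none of that is confirmed here, and the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
UDP_HEADER = 8    # bytes
TCP_HEADER = 20   # bytes, assuming no TCP options


def implied_overhead(carrier_mtu: int, tunnel_mtu: int) -> int:
    """Bytes per packet left for encapsulation between the two MTUs."""
    return carrier_mtu - tunnel_mtu


def tcp_mss(mtu: int) -> int:
    """Largest TCP payload per segment for a given interface MTU."""
    return mtu - IPV4_HEADER - TCP_HEADER


if __name__ == "__main__":
    ppp0_mtu = 1500      # carrier MTU reported in this post
    cipcb0_mtu = 1459    # largest tunnel MTU found to work

    total = implied_overhead(ppp0_mtu, cipcb0_mtu)
    outer = IPV4_HEADER + UDP_HEADER
    print("total headroom:", total)                  # 41
    print("outer IP+UDP headers:", outer)            # 28
    print("left for the tunnel itself:", total - outer)  # 13
    print("TCP MSS at tunnel MTU:", tcp_mss(cipcb0_mtu))  # 1419
```

If that model is right, lowering ppp0's MTU by some amount should lower the largest usable cipcb0 MTU by about the same amount, which would be one way to test it.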
Any other diagnostic thoughts would be welcome.