
Subject: Re: cipe-win32-pre9 fix
From: Erik Wallin <erikw,AT,sec,DOT,se>
Date: Thu, 10 May 2001 19:53:21 +0200
In-reply-to: <31276.988416041@www32.gmx.net>

Damion Wilson wrote:

> No. Wait. Stop. The spinlock stuff causes problems in the list management.
> I just finished yanking it all out because it was causing crashes due to
> invalid irql. The documentation implies (Microsoft docs, go figure) that
> the spinlocks should only REALLY be needed on SMP hardware. Please read the
> TODO.TXT to find out what I think are the issues.

Ok, I'll wait. What I don't understand is why this change made such a huge
improvement. Before the spinlocks were added, I couldn't even start the service
on some machines; now it seems really stable, and I haven't been able to get it
to blue screen at all since. Or is that only because of the slowdown introduced
by the spinlock code?

BTW, what does the STATE field in TODO.TXT mean? Does a date mean you consider
the issue closed and we don't have to worry about it?

> Where, exactly, is NdisFreeMemory freeing memory with different flags than
> it is being allocated with ? Right now, the memory being allocated for
> PacketBuffers is different than that for list nodes.

Packet buffers are allocated in AdapterTransmit with a direct call to
NdisAllocateMemory(...,0,...).

They are deallocated in DestroyTapDevice with a call to MemFree. MemFree uses
NdisFreeMemory(...,NDIS_MEMORY_NONCACHED | NDIS_MEMORY_CONTIGUOUS).

They are also deallocated in CompleteIRP and AdapterTransmit using
NdisFreeMemory(...,0).

Unless the tap device is destroyed, they are allocated and deallocated using
the same flags. This might explain why the machine used to blow up in my face
if I tried to disable the device, both with and without the service running.
Come to think of it, this only happened if the service had been started or when
any packets had been sent down the interface.

I don't understand why we have to use NDIS_MEMORY_NONCACHED or
NDIS_MEMORY_CONTIGUOUS at all. The driver is not sharing buffers with a
separate chip doing DMA, just with different threads (possibly on different
processors), and that should be handled by the cache coherence logic.

> There are two queues: a packet queue and an IRP queue. The handling of all
> of it should be fairly synchronous. There, of course, are no guarantees on
> whether the code will be called from one thread or another, but in
> IRQL_DISPATCH, there should be no preemption anyway. Read up on it in the
> DDK documentation (as crappy as it is).

Ok, I'll read the documentation and see if I understand this in detail.

Let me just first make sure I have grasped the design.

The tap device is emulating an ethernet device, right? At what level? The DDK
defines physical adapters, filters etc. What would you call this?

Is the ethernet device the TAP device? Or is "TAP device" the name for the
whole idea of "tapping" packets from a virtual adapter to a user mode process?

What about the communication with the service? Is that simply opening up a pipe
of some sort? Where do I find information about the I/O calls involved? I'll
try to understand this on my own, but if you have any pointers I would
appreciate them.

Once I understand it, I don't mind writing some kind of documentation if
necessary, so that others can understand the code more easily too.

> I originally had an array for this stuff and moved to a linked list for
> flexibility. A friend has suggested using vectors, and considering the way
> the driver structure has unfolded, this would now be the best option.
> However, you would not notice any performance gain unless you were using
> gigabit ethernet and even then, the only way to get "real" performance gain
> would be to fold the usermode tunnel handling into kernel mode thus
> preventing the state changes (all the stuff that windows does to get
> to/from usermode) and the general slowness of a usermode (preemptable)
> process running at normal priority.

Ok. I'm convinced. Let's drop the array idea.

I guess a state change for every <queue length> packets is the minimum required
on a saturated connection. So would a longer queue solve the usermode/kernel
mode problem? Or is there a state change required for every buffer read? The
more I think about it, the more likely that seems: the user mode code makes an
I/O read/write, and that translates into a system call, especially with
unbuffered I/O. I understand why you started out with shared memory. Too bad it
didn't work.

Sorry if I'm not grasping all of this at once. I'm getting there.

BTW, I found out that I've been using the wrong symbol file when debugging the
core dumps. My DDK differs from the one you are using, so we get different
binaries when building. I'll be using binaries built on my machine when
debugging my changes. Please include a symbol file in the next release if
possible. This could mean that some of my bug reports are completely wrong
about where in cipdrvr.c the stack trace passes.

/Erik
