
Subject: Re: cipe-win32-pre9 fix
From: Damion Wilson <dwilson,AT,ibl,DOT,bm>
Date: Thu, 10 May 2001 22:48:02 +0200
In-reply-to: <31276.988416041@www32.gmx.net>


That doesn't make sense to me either. I tell you what: I'll try 
judiciously adding the spinlocks to just the linked list handling. It 
shouldn't speed up the code, though. What is your speedup? Furthermore, my 
setup has been stable for months (I'm running three instances) with round 
trip ping times about .8 ms slower than "on the wire" speed.

The date does mean that the item is closed.


On Thursday, May 10, 2001 at 14:36:29 [ADT] Erik Wallin wrote:
> Damion Wilson wrote:
> > No. Wait. Stop. The spinlock stuff causes problems in the list
> > management. I just finished yanking it all out because it was causing
> > crashes due to invalid IRQL. The documentation implies (Microsoft
> > docs, go figure) that the spinlocks should only REALLY be needed on
> > SMP hardware. Please read the TODO.TXT to find out what I think are
> > the issues.
> Ok, I'll wait. What I don't understand is why this change makes such a
> huge improvement. Before the spinlocks were added, I couldn't even
> start the service on some machines. Now it seems really stable. I
> haven't been able to get it to blue screen at all since. Maybe it's
> only because of the slowdown due to the spinlock code?
> BTW. What does the STATE mean in Todo.txt? A date means you consider
> the issue closed, and we don't have to worry about it?
> > Where, exactly, is NdisFreeMemory freeing memory with different
> > flags than it is being allocated with? Right now, the memory being
> > allocated for PacketBuffers is different than that for list nodes.
> Packet buffers are allocated in AdapterTransmit with a direct call to
> NdisAllocateMemory(...,0,...). They are deallocated in DestroyTapDevice
> with a call to MemFree. MemFree uses
> They are also deallocated in CompleteIRP and AdapterTransmit using
> NdisFreeMemory(...,0). Unless the tap device is destroyed, they are
> allocated and deallocated using the same flags. This might explain why
> the machine used to blow up in my face if I tried to disable the
> device, both with and without the service running. Come to think of
> it, this only happened if the service had been started or when any
> packets had been sent down the interface.
> I don't understand why we have to use NDIS_MEMORY_NONCACHED or
> NDIS_MEMORY_CONTIGUOUS at all. The driver is not reading or writing
> stuff that is going to be read by a separate chip, just different
> threads (possibly on different processors). That should be handled by
> the cache coherence logic.
> > There are two queues: a packet queue and an IRP queue. The handling
> > of all of it should be fairly synchronous. There are, of course, no
> > guarantees on whether the code will be called from one thread or
> > another, but at DISPATCH_LEVEL there should be no preemption anyway.
> > Read up on it in the DDK documentation (as crappy as it is).
> Ok, I'll read the documentation and see if I understand this in
> detail. Let me just first make sure I have grasped the design.
> The tap device is emulating an ethernet device, right? At what level?
> The defines physical adapters, filters etc. What would you call this?
> Is the ethernet device the TAP device? Or, is the TAP device the whole
> idea of "tapping" packets from a virtual adapter to a user mode
> process?
> What about the communication with the service? Is that simply opening
> up a pipe of some sort? Where do I find information about the I/O
> calls involved? I'll try to understand this on my own, but if you have
> any pointers I would appreciate it.
> Once I understand it, I don't mind writing some kind of documentation
> if necessary so others can understand the code more easily too.
> > I originally had an array for this stuff and moved to a linked list
> > for flexibility. A friend has suggested using vectors, and
> > considering the way the driver structure has unfolded, this would
> > now be the best option. However, you would not notice any
> > performance gain unless you were using gigabit ethernet, and even
> > then, the only way to get "real" performance gain would be to fold
> > the usermode tunnel handling into kernel mode, thus preventing the
> > state changes (all the stuff that Windows does to get to/from
> > usermode) and the general slowness of a usermode (preemptable)
> > process running at normal priority.
> Ok. I'm convinced. Let's drop the array idea.
> I guess a state change for every <queue length> packets is the minimum
> required at a saturated connection. So a longer queue would solve the
> usermode/kernel mode problem? Or is there a state change required for
> every buffer read? The more I think about it, the more likely it
> seems. The user mode code makes an I/O read/write and that translates
> into a system call, especially with unbuffered I/O. I understand why
> you started out with shared memory. Too bad it didn't work.
> Sorry if I'm not grasping all of this at once. I'm getting there.
> BTW. I found out that I've been using the wrong symbol file when
> debugging the core dumps. My DDK differs from the one you are using,
> so we get different binaries when building. I'll be using binaries
> built on my machine when debugging my changes. Please include a symbol
> file in the next release if possible. This could mean that some of my
> bug reports are completely wrong as to where in cipdrvr.c the stack
> trace passes.
> /Erik
