[luau] locking server
R. Scott Belford
sctinc at flex.com
Tue Nov 26 12:25:00 PST 2002
Thanks for the quick suggestions.
On Tuesday, November 26, 2002, at 12:11 PM, MonMotha wrote:
> A great thing to do would be to run memtest86 on the system, especially if
> you have things randomly segfaulting. Bad memory can be a tricky thing
> to spot and diagnose.
I ran memtest86 for about 4 days before bringing the server online.
Should I have run it longer, maybe a month, or do you think the test
would uncover bad memory within a few days?
>
> Another thing that happened to someone recently was the motherboard not
> setting the voltage correctly with AUTO. Forcing the voltage to the
> value in the spec sheets fixed his problems.
I'll investigate this. I doubt I could have gotten 30 days out of it
with bad voltage, but perhaps this ultimately leads to your third
suggestion, thermal shutdown.
>
> This definitely sounds like a hardware issue (possibly thermal
> shutdown?). Normally the kernel manages to at least throw up an Oops
> on hardware failure, but occasionally hard locks are the result. If
> you can find something that reliably triggers the problem, you can go a
> great way to diagnosing the cause. Another possibility if it is
> software is a problem in an interrupt handler or some other situation
> where the kernel can't be interrupted but control is never returned to
> the kernel by a driver.
I have theorized that my Realtek ethernet chipset may be substandard for
this application. A FreeBSD friend pointed out that the author of the
Realtek driver for FreeBSD made a few very negative comments about the
quality of the chipset in his man page. He makes these two comments:
"Since outbound packets must be longword aligned, the transmit routine
has to copy an unaligned packet into an mbuf cluster buffer before
transmission. The driver abuses the fact that the cluster buffer pool
is allocated at system startup time in a contiguous region starting at
a page boundary. Since cluster buffers are 2048 bytes, they are
longword aligned by definition. The driver probably should not be
depending on this characteristic.

The RealTek data sheets are of especially poor quality: the grammar and
spelling are awful and there is a lot of information missing,
particularly concerning the receiver operation. One particularly
important fact that the data sheets fail to mention relates to the way
in which the chip fills in the receive buffer. When an interrupt is
posted to signal that a frame has been received, it is possible that
another frame might be in the process of being copied into the receive
buffer while the driver is busy handling the first one. If the driver
manages to finish processing the first frame before the chip is done
DMAing the rest of the next frame, the driver may attempt to process
the next frame in the buffer before the chip has had a chance to finish
DMAing all of it."
The rl driver was written by Bill Paul <wpaul at ctr.columbia.edu>.
In your opinion, could this lead to a hard lock, and does Realtek have
that bad a reputation in the Linux community? It sounds pretty bad
to me.
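For what it's worth, here is how I read the alignment assumption in the first comment, as a minimal Python sketch. This is not code from the rl driver: the page size and the helper names are my own assumptions, the 2048-byte cluster size comes from the quoted text, and I am taking a longword to be 4 bytes.

```python
# Illustrative sketch only, not code from the rl driver.
# PAGE_SIZE and the helper names are assumptions; the 2048-byte cluster
# size comes from the quoted man page, and a longword is taken as 4 bytes.
PAGE_SIZE = 4096
CLUSTER_SIZE = 2048
LONGWORD = 4


def cluster_addresses(pool_base, count):
    """Addresses of `count` back-to-back cluster buffers starting at pool_base."""
    return [pool_base + i * CLUSTER_SIZE for i in range(count)]


def is_longword_aligned(addr):
    """True when addr is a multiple of the longword size."""
    return addr % LONGWORD == 0


def all_aligned(pool_base, count=8):
    """True when every cluster buffer in the pool is longword aligned."""
    return all(is_longword_aligned(a) for a in cluster_addresses(pool_base, count))


if __name__ == "__main__":
    page_aligned_base = 37 * PAGE_SIZE    # pool starts on a page boundary
    shifted_base = page_aligned_base + 2  # hypothetical pool that does not
    print(all_aligned(page_aligned_base))
    print(all_aligned(shifted_base))
```

If the pool ever started somewhere other than a page (or at least longword) boundary, every buffer in it would be misaligned at once, which I take to be why the author says the driver should not depend on this.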
Thanks again for your thoughts.
scott