[luau] locking server

R. Scott Belford sctinc at flex.com
Tue Nov 26 12:25:00 PST 2002


Thanks for the quick suggestions


On Tuesday, November 26, 2002, at 12:11 PM, MonMotha wrote:
> A great thing to do would be to run memtest86 on the system, especially if
> you have things randomly segfaulting.  Bad memory can be a tricky thing
> to spot and diagnose.

I gave it about 4 days of this before coming online.  Should I have 
perhaps run it longer, maybe a month, or do you think that the test 
would uncover bad memory within a few days?

>
> Another thing that happened to someone recently was the motherboard not
> setting the voltage correctly with AUTO.  Forcing the voltage to that
> in the spec sheets fixed his problems.

I will investigate this.  It seems unlikely that I could have run for 30
days with bad voltage, but perhaps this ultimately leads to suggestion 3,
thermal shutdown.  I'll check.

>
> This definitely sounds like a hardware issue (possibly thermal
> shutdown?).  Normally the kernel manages to at least throw up an Oops
> on hardware failure, but occasionally hard locks are the result.  If
> you can find something that reliably triggers the problem, you can go a
> great way to diagnosing the cause.  Another possibility if it is
> software is a problem in an interrupt handler or some other situation
> where the kernel can't be interrupted but control is never returned to
> the kernel by a driver.

I have theorized that my Realtek ethernet chipset may be substandard for
this application.  A FreeBSD friend pointed out that the author of the
Realtek driver for FreeBSD made a few very negative comments about the
quality of the chipset in his man pages.  He makes these two comments:

      "Since outbound packets must be longword aligned, the transmit
      routine has to copy an unaligned packet into an mbuf cluster
      buffer before transmission.  The driver abuses the fact that the
      cluster buffer pool is allocated at system startup time in a
      contiguous region starting at a page boundary.  Since cluster
      buffers are 2048 bytes, they are longword aligned by definition.
      The driver probably should not be depending on this
      characteristic.

      The RealTek data sheets are of especially poor quality: the
      grammar and spelling are awful and there is a lot of information
      missing, particularly concerning the receiver operation.  One
      particularly important fact that the data sheets fail to mention
      relates to the way in which the chip fills in the receive buffer.
      When an interrupt is posted to signal that a frame has been
      received, it is possible that another frame might be in the
      process of being copied into the receive buffer while the driver
      is busy handling the first one.  If the driver manages to finish
      processing the first frame before the chip is done DMAing the rest
      of the next frame, the driver may attempt to process the next
      frame in the buffer before the chip has had a chance to finish
      DMAing all of it."

The rl driver was written by Bill Paul <wpaul at ctr.columbia.edu>.
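
The alignment claim in the first quoted paragraph can be checked with a
little arithmetic.  What follows is only a minimal sketch, not code from
the rl driver: the 4096-byte page size, the pool base address, and all
function names are assumptions made for illustration.  Only the 2048-byte
cluster size and the longword requirement come from the quoted text.

```python
PAGE_SIZE = 4096      # assumed page size; not stated in the man page
CLUSTER_SIZE = 2048   # cluster buffer size quoted from the man page
LONGWORD = 4          # bytes in a 32-bit longword


def cluster_addresses(pool_base, count):
    """Addresses of `count` cluster buffers laid out back to back."""
    return [pool_base + i * CLUSTER_SIZE for i in range(count)]


def is_longword_aligned(addr):
    """True if the address falls on a 4-byte boundary."""
    return addr % LONGWORD == 0


def needs_copy(packet_addr):
    """True if a packet starting here would have to be copied into an
    aligned buffer before transmission."""
    return not is_longword_aligned(packet_addr)


# Hypothetical pool base, chosen only because it is page aligned.
pool_base = 0x100000
assert pool_base % PAGE_SIZE == 0

pool = cluster_addresses(pool_base, 8)
all_aligned = all(is_longword_aligned(a) for a in pool)
```

As long as the pool really does start on a page boundary, every buffer
address is a multiple of 2048 and therefore of 4.  That is the
characteristic the driver author says it probably should not depend on.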

In your opinion, could this lead to a lock-up, and does Realtek have
that bad a reputation in the Linux community?  It sounds pretty bad
to me.

thanks again for your thoughts

scott



