[luau] locking server

MonMotha monmotha at indy.rr.com
Tue Nov 26 12:05:00 PST 2002


R. Scott Belford wrote:
> Warren mentioned an issue familiar to me when he brought up some 
> troubles that he was having with a server locking down.  He and Ray have 
> figured it to be a hardware problem.  I have experienced something 
> familiar, and I am wondering what conditions can lead to a server lock 
> up with no hints in the logs and why it definitely is a hardware problem.
> 
> In my scenario, I have had a server lock up after about 5 days of hard 
> use.  It has happened twice.  Both times I was using Redhat's 7.2 
> Enterprise kernel (2.4.9-34enterprise).  I blamed it on a default kernel 
> setting that I did not understand.  I changed to the stock 2.4.9-34smp 
> kernel with Rhat 7.2.  After about 30 days, the same lockup.  By lockup 
> I mean that both remote and local terminal sessions are frozen.  
> Pressing ctrl + alt + del will not reboot.  My only hint is a series of 
> "failed to set personality on (some pid #)" on the screen.  An ugly 
> power down is the only "fix."
> 
> Upon reboot, there are no hints in the logs.  This is to say, there are 
> no hints in the /var/log directory.  Perhaps I could look somewhere 
> else.  As far as the logs and server are concerned, everything is just 
> hunky-dory.  Here is what I wonder:
> 
> What can cause this?  Is the machine that is locking up on Warren and 
> Ray staying up for as many days as mine?  Can hardware problems take 30 
> days to manifest themselves?
> 
> I have been told that /proc/sys/fs/file-max must be set high enough to 
> handle one's active files.  If this number is reached, does a server 
> lock?  Is there a way to check how many files are open?
> 
> Is there another software or kernel setting that can lead to a lock 
> down, say, max inodes or something?
> 
> If you have any suggestions or insights or experiences that you can 
> share, I would be most gracious.
> 
> scott

A great first step would be to run memtest86 on the system, especially if 
you have things randomly segfaulting.  Bad memory can be tricky to spot 
and diagnose.

Another thing that happened to someone recently was the motherboard not 
setting the voltage correctly with AUTO.  Forcing the voltage to the 
value in the spec sheets fixed his problems.

This definitely sounds like a hardware issue (possibly thermal 
shutdown?).  Normally the kernel manages to at least throw up an Oops on 
hardware failure, but occasionally a hard lock is the result.  If you 
can find something that reliably triggers the problem, that goes a 
long way toward diagnosing the cause.  If it is software, another 
possibility is a bug in an interrupt handler, or some other situation 
where the kernel can't be interrupted and a driver never returns 
control to it.
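
On the file-max question: as far as I know, hitting the limit makes 
further open() calls fail with ENFILE rather than freezing the box, so 
it seems an unlikely cause of a hard lock.  To see how close you are, 
/proc/sys/fs/file-nr holds three numbers: allocated file handles, a 
middle field whose meaning depends on the kernel version, and the 
maximum (the same value as file-max).  A minimal Python sketch, assuming 
that three-field layout; the function names are made up for 
illustration:

```python
from pathlib import Path


def parse_file_nr(text):
    """Parse the three whitespace-separated integers from file-nr.

    Assumed layout: allocated handles, a version-dependent middle
    field, and the system-wide maximum (file-max).
    """
    allocated, middle, maximum = (int(field) for field in text.split())
    return {"allocated": allocated, "middle": middle, "max": maximum}


def file_handle_usage(path="/proc/sys/fs/file-nr"):
    """Read and parse the kernel's file handle counters (Linux only)."""
    return parse_file_nr(Path(path).read_text())


if __name__ == "__main__":
    usage = file_handle_usage()
    percent = 100.0 * usage["allocated"] / usage["max"]
    print("allocated %d of %d file handles (%.1f%%)"
          % (usage["allocated"], usage["max"], percent))
```

Running that from cron every few minutes and logging the output would 
show whether the count creeps toward the limit in the days before a 
lockup.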

--MonMotha
