Ticket #232 (new defect)

Opened 2 years ago

Last modified 7 months ago

httpd threads stuck, all spinning in EACCELERATOR_LOCK_RW forever

Reported by: Tarkbark Assigned to: somebody
Priority: critical Milestone:
Component: eAccelerator Version: 0.9.4
Keywords: Cc:

Description (Last modified by bart)

Hi,

We are running Apache 2.0.59, PHP 4.4.4 and eAccelerator 0.9.4. A few times a month, Apache ends up with all threads running the same portion of code over and over again. If more info is needed, send an email to hakan@esportnetwork.com.

Here is the loop that keeps running, as seen by stepping through it in gdb:

(gdb)
117                     EACCELERATOR_LOCK_RW ();
(gdb)
118                     p = &eaccelerator_mm_instance->locks;
(gdb)
119                     while ((*p) != NULL) {
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
131                             p = &(*p)->next;
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
124                                     if (x->pid == (*p)->pid) {
(gdb)
133                     if ((*p) == NULL) {
(gdb)
137                     EACCELERATOR_UNLOCK_RW ();
(gdb)
138                     if (ok) {
(gdb)
149                             t.tv_sec = 0;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
150                             t.tv_usec = 100;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
117                     EACCELERATOR_LOCK_RW ();
(gdb)
118                     p = &eaccelerator_mm_instance->locks;
(gdb)
119                     while ((*p) != NULL) {
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
131                             p = &(*p)->next;
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
124                                     if (x->pid == (*p)->pid) {
(gdb)
133                     if ((*p) == NULL) {
(gdb)
137                     EACCELERATOR_UNLOCK_RW ();
(gdb)
138                     if (ok) {
(gdb)
149                             t.tv_sec = 0;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
150                             t.tv_usec = 100;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
117                     EACCELERATOR_LOCK_RW ();
(gdb)
118                     p = &eaccelerator_mm_instance->locks;
(gdb)
119                     while ((*p) != NULL) {
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
131                             p = &(*p)->next;
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {

Attachments

clip5.html (27.0 kB) - added by cisco150 on 03/15/07 21:07:00.
eaccelerator-lockbug.patch (1.1 kB) - added by terrysduncan on 12/11/07 22:26:11.
Shared memory locking patch

Change History

01/12/07 11:35:21 changed by bart

  • description changed.

01/12/07 15:00:24 changed by Tarkbark

Hi, I was asked to attach some info from config.h:

#define MM_SEM_SPINLOCK 1
#define MM_SHM_IPC 1

03/15/07 21:07:00 changed by cisco150

  • attachment clip5.html added.

06/23/07 18:52:18 changed by JonathanO

  • priority changed from major to critical.

I suspect this is a duplicate of #224. It's causing us serious problems, so I'm raising the priority.

08/13/07 15:29:07 changed by growler

Please try the debug patch from #224

12/07/07 03:41:59 changed by terrysduncan

I am occasionally seeing something very similar with lighttpd 1.4.11, EA 0.9.5 and PHP 5.2.2 on an ARM processor. I debugged it a bit and found that the free list in the shared memory segment is corrupt: it contains a single node whose next pointer points back to itself, which causes the PHP processes to spin.

I have also noticed that the semaphore value is corrupted. With an IPC semaphore, the value should always be 1 unless some process holds the lock, yet I see it increasing. If the locking mechanism is not working, more than one process can be inside the critical section at once, which leaves open the possibility of corrupting the free list.

I still have a lot of testing to do before I can declare victory, but I believe the problem is a call to mm_unlock() in eaccelerator_clean_request() that has no matching mm_lock(). Can anyone tell me why that call is there? Will removing it break anything? terry dot s dot duncan at intel dot com

12/11/07 22:18:20 changed by terrysduncan

Here is a patch that seems to address this problem. I have seen no side effects from removing the unlock call, and I have not seen the spinning PHP issue since applying it. The mm.c change below is not strictly necessary, but it would keep other unmatched lock/unlock calls from causing problems.

--- eaccelerator.c.orig 2007-05-16 12:07:31.000000000 -0700
+++ eaccelerator.c 2007-12-10 13:41:23.000000000 -0800
@@ -1752,7 +1752,6 @@
   mm_used_entry *p = (mm_used_entry*)EAG(used_entries);
   if (eaccelerator_mm_instance != NULL) {
     EACCELERATOR_UNPROTECT();
-    mm_unlock(eaccelerator_mm_instance->mm);
     if (p != NULL || eaccelerator_mm_instance->locks != NULL) {
       EACCELERATOR_LOCK_RW();
       while (p != NULL) {
--- mm.c.orig 2006-10-11 05:45:52.000000000 -0700
+++ mm.c 2007-12-07 16:14:44.000000000 -0800
@@ -357,10 +357,18 @@
   return 1;
 }
 
+static int locked = 0;
+
 static int mm_do_lock(mm_mutex* lock, int kind) {
   int rc;
   struct sembuf op;
 
+  if (locked)
+  {
+    ea_debug_log("eAccelerator: attempted double lock: %u\n", getpid());
+    return 1;
+  }
+  locked++;
   op.sem_num = 0;
   op.sem_op = -1;
   op.sem_flg = SEM_UNDO;
@@ -374,6 +382,12 @@
   int rc;
   struct sembuf op;
 
+  if (!locked)
+  {
+    ea_debug_log("eAccelerator: attempted double unlock: %u\n", getpid());
+    return 1;
+  }
+  locked--;
   op.sem_num = 0;
   op.sem_op = 1;
   op.sem_flg = SEM_UNDO;

12/11/07 22:26:11 changed by terrysduncan

  • attachment eaccelerator-lockbug.patch added.

Shared memory locking patch