Ticket #232 (closed defect: fixed)

Opened 4 years ago

Last modified 3 months ago

httpd threads stuck, all doing EACCELERATOR_LOCK_RW for ever

Reported by: Tarkbark
Owned by: somebody
Priority: critical
Milestone:
Component: eAccelerator
Version: 0.9.4
Keywords:
Cc:

Description (last modified by bart)

Hi,

We are running Apache 2.0.59, PHP 4.4.4 and eAccelerator 0.9.4. A few times a month, Apache ends up with all threads executing the same section of code over and over again. If more info is needed, send an email to hakan@….

Here is the loop that is running over and over again:

(gdb)
117                     EACCELERATOR_LOCK_RW ();
(gdb)
118                     p = &eaccelerator_mm_instance->locks;
(gdb)
119                     while ((*p) != NULL) {
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
131                             p = &(*p)->next;
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
124                                     if (x->pid == (*p)->pid) {
(gdb)
133                     if ((*p) == NULL) {
(gdb)
137                     EACCELERATOR_UNLOCK_RW ();
(gdb)
138                     if (ok) {
(gdb)
149                             t.tv_sec = 0;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
150                             t.tv_usec = 100;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
117                     EACCELERATOR_LOCK_RW ();
(gdb)
118                     p = &eaccelerator_mm_instance->locks;
(gdb)
119                     while ((*p) != NULL) {
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
131                             p = &(*p)->next;
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
124                                     if (x->pid == (*p)->pid) {
(gdb)
133                     if ((*p) == NULL) {
(gdb)
137                     EACCELERATOR_UNLOCK_RW ();
(gdb)
138                     if (ok) {
(gdb)
149                             t.tv_sec = 0;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
150                             t.tv_usec = 100;
(gdb)
151                             select (0, NULL, NULL, NULL, &t);
(gdb)
117                     EACCELERATOR_LOCK_RW ();
(gdb)
118                     p = &eaccelerator_mm_instance->locks;
(gdb)
119                     while ((*p) != NULL) {
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {
(gdb)
131                             p = &(*p)->next;
(gdb)
120                             if (strcmp ((*p)->key, x->key) == 0) {

Attachments

clip5.html (27.0 KB) - added by cisco150 3 years ago.
eaccelerator-lockbug.patch (1.1 KB) - added by terrysduncan 3 years ago.
Shared memory locking patch

Change History

comment:1 Changed 4 years ago by bart

  • Description modified (diff)

comment:2 Changed 4 years ago by Tarkbark

Hi, I was asked to attach some info from config.h:

#define MM_SEM_SPINLOCK 1
#define MM_SHM_IPC 1

Changed 3 years ago by cisco150

comment:3 Changed 3 years ago by JonathanO

  • Priority changed from major to critical

I suspect that this is probably a dup of #224. It's causing us serious problems, so I'm upping the priority.

comment:4 Changed 3 years ago by growler

Please try the debug patch from #224

comment:5 Changed 3 years ago by terrysduncan

I am seeing something very similar occasionally using lighttpd 1.4.11, EA 0.9.5 and PHP 5.2.2 on an ARM processor. The PHP processes go nuts. I have debugged it a bit and discovered that the free list in the shared memory segment is corrupt: it shows a single node on the free list, and that node points back to itself, which causes the PHP processes to spin.
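
To illustrate why a self-referencing node makes a list walk spin, here is a small sketch that detects such a loop with Floyd's tortoise-and-hare technique. The free_node type and the function name are hypothetical; the real allocator's node layout is not shown in this ticket.

```c
#include <stddef.h>

/* Hypothetical free-list node; not the allocator's actual structure. */
typedef struct free_node {
    size_t size;
    struct free_node *next;
} free_node;

/* Returns 1 if following next pointers from head ever revisits a node,
 * 0 if the list ends in NULL. A naive walk such as
 * "for (p = head; p != NULL; p = p->next)" would never finish on a
 * list for which this returns 1. */
static int free_list_has_cycle(const free_node *head)
{
    const free_node *slow = head;
    const free_node *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;        /* advance one step */
        fast = fast->next->next;  /* advance two steps */
        if (slow == fast)
            return 1;
    }
    return 0;
}
```

A check like this could serve as a debugging aid while hunting the corruption; it does not address the cause.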

I have also noticed that the semaphore value is hosed. With an IPC semaphore, the value should always be 1 unless some process holds the lock. I am seeing an increasing value there. So it follows that if the locking mechanism is not working, it leaves open the possibility of corrupting the free list.
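
The increasing value can be reproduced in isolation. The sketch below, assuming a System V semaphore used as a mutex (initial value 1), applies unmatched "+1" operations and reads the value back. The helper name sem_value_after_unlocks and the union name semun_arg are made up for this example.

```c
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* On Linux the caller has to define the semctl() argument union itself. */
union semun_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Creates a private semaphore initialised to 1 (unlocked), performs
 * extra_unlocks unmatched unlock operations, and returns the resulting
 * semaphore value, or -1 on error. */
static int sem_value_after_unlocks(int extra_unlocks)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);
    if (semid < 0)
        return -1;

    union semun_arg arg;
    arg.val = 1;
    int value = -1;
    if (semctl(semid, 0, SETVAL, arg) == 0) {
        struct sembuf op;
        op.sem_num = 0;
        op.sem_op = 1;          /* unlock without a preceding lock */
        op.sem_flg = SEM_UNDO;
        int ok = 1;
        for (int i = 0; i < extra_unlocks && ok; i++)
            ok = (semop(semid, &op, 1) == 0);
        if (ok)
            value = semctl(semid, 0, GETVAL);
    }
    semctl(semid, 0, IPC_RMID);  /* always remove the semaphore set */
    return value;
}
```

Once the value exceeds 1, more than one process can decrement it without blocking, so the semaphore no longer provides mutual exclusion.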

I have a lot of testing to do before I can declare victory, but I believe the problem is that there is a call to mm_unlock() in eaccelerator_clean_request() without a matching mm_lock(). Can anyone tell me why that call is there? Is it going to hose something up if I remove it? terry dot s dot duncan at intel dot com

comment:6 Changed 3 years ago by terrysduncan

Here is a patch that seems to address this problem. I have seen no side effects from removing the unlock call, and I have not seen the spinning PHP issue since implementing it. The mm.c change below is not necessary but would prevent other potential unmatched lock/unlock calls from causing problems.

--- eaccelerator.c.orig 2007-05-16 12:07:31.000000000 -0700
+++ eaccelerator.c 2007-12-10 13:41:23.000000000 -0800
@@ -1752,7 +1752,6 @@
   mm_used_entry *p = (mm_used_entry*)EAG(used_entries);
   if (eaccelerator_mm_instance != NULL) {
     EACCELERATOR_UNPROTECT();
-    mm_unlock(eaccelerator_mm_instance->mm);
     if (p != NULL || eaccelerator_mm_instance->locks != NULL) {
       EACCELERATOR_LOCK_RW();
       while (p != NULL) {
--- mm.c.orig 2006-10-11 05:45:52.000000000 -0700
+++ mm.c 2007-12-07 16:14:44.000000000 -0800
@@ -357,10 +357,18 @@
   return 1;
 }
 
+static int locked = 0;
+
 static int mm_do_lock(mm_mutex* lock, int kind) {
   int rc;
   struct sembuf op;
+  if (locked)
+  {
+    ea_debug_log("eAccelerator: attempted double lock: %u\n", getpid());
+    return 1;
+  }
+  locked++;
 
   op.sem_num = 0;
   op.sem_op = -1;
   op.sem_flg = SEM_UNDO;
@@ -374,6 +382,12 @@
   int rc;
   struct sembuf op;
+  if (!locked)
+  {
+    ea_debug_log("eAccelerator: attempted double unlock: %u\n", getpid());
+    return 1;
+  }
+  locked--;
 
   op.sem_num = 0;
   op.sem_op = 1;
   op.sem_flg = SEM_UNDO;
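
The guard in the mm.c change can be modelled without real IPC. In the sketch below, an integer stands in for the semaphore (1 = free, 0 = held) and a flag records whether this process already holds the lock. The type and function names are hypothetical, and returning 0 where real code would block in semop() is a simplification.

```c
#include <stdio.h>

/* Hypothetical in-process model of a guarded semaphore lock. */
typedef struct {
    int sem_value;  /* models the IPC semaphore: 1 = free, 0 = held */
    int held;       /* per-process guard, like the patch's static flag */
} guarded_lock;

static int guarded_acquire(guarded_lock *l)
{
    if (l->held) {
        fprintf(stderr, "attempted double lock\n");
        return 1;  /* as in the patch: report success, leave the semaphore alone */
    }
    if (l->sem_value <= 0)
        return 0;  /* real code would block here instead of failing */
    l->sem_value -= 1;
    l->held = 1;
    return 1;
}

static int guarded_release(guarded_lock *l)
{
    if (!l->held) {
        fprintf(stderr, "attempted double unlock\n");
        return 1;  /* unmatched unlock is ignored, so the value never exceeds 1 */
    }
    l->sem_value += 1;
    l->held = 0;
    return 1;
}
```

One caveat with this design: a single process-wide flag, as in the patch, assumes only one mutex per process and no threads sharing it. A multi-threaded server would presumably need the flag to live per mutex and per thread.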

Changed 3 years ago by terrysduncan

Shared memory locking patch

comment:7 Changed 10 months ago by hans

  • Status changed from new to closed
  • Resolution set to fixed

Bart already seems to have fixed this in rev @342

Thanks for your input on this!

