Limit on number of scripts running simultaneously in a page

pwhodges · February 19, 2018, 12:29pm

This page shows around 500 images, each of which is generated using the same script with different parameters; you will note that after a while many of the images are missing:

https://cassland.org/images/Avatars/

The page is generated by a PHP script: https://sye.dk/sfpg/

I was not aware of this happening when the page was hosted on Apache. The author of the script prompted me to check the Caddy logs, with the following explanation:

Some servers have a limit that prevents the same script from running more
than x instances at the same time. And some also have a limit on how
much memory a script is allowed to consume, all simultaneously running
instances combined.

The way this gallery works, the one script (index.php) serves both the HTML
and all images, including thumbs. So loading a page with 100 thumbs will
make 101 requests for the same file “index.php”, just with different
parameters.

If you are on a hosted server, try asking them about these limits. If it's
your own server, have a look in the web-server log; there will most
likely be an entry about this when it happens.

And indeed the Caddy log is full of errors like these:

19/Feb/2018:12:11:11 +0000 [ERROR 502 /images/Avatars/index.php] dial tcp 127.0.0.1:9123: connectex: No connection could be made because the target machine actively refused it.
19/Feb/2018:12:11:11 +0000 [ERROR 502 /images/Avatars/index.php] dial tcp 127.0.0.1:9123: connectex: No connection could be made because the target machine actively refused it.
19/Feb/2018:12:11:11 +0000 [ERROR 502 /images/Avatars/index.php] dial tcp 127.0.0.1:9123: connectex: No connection could be made because the target machine actively refused it.
19/Feb/2018:12:11:11 +0000 [ERROR 502 /images/Avatars/index.php] dial tcp 127.0.0.1:9123: connectex: No connection could be made because the target machine actively refused it.
19/Feb/2018:12:11:11 +0000 [ERROR 502 /images/Avatars/index.php] dial tcp 127.0.0.1:9123: connectex: No connection could be made because the target machine actively refused it.
19/Feb/2018:12:11:11 +0000 [ERROR 502 /images/Avatars/index.php] dial tcp 127.0.0.1:9123: connectex: No connection could be made because the target machine actively refused it.

And for completeness, here is the section of the Caddyfile relating to that server:

cassland.org, 
www.cassland.org {
	root ..\cassland.org\html
	browse /Album
	browse /images
	browse /sounds
	browse /scores2
	browse /Varicam
	browse /TascamMod

# "startup" seems to hang on Windows, so I made php-cgi a service instead.
	fastcgi / 127.0.0.1:9123 php

# proxy random image generators to Apache which knows what to do with them:
	proxy /images/QCavi.jpg localhost:81 {
		transparent
	}
	proxy /images/PJavi.jpg localhost:81 {
		transparent
	}
	proxy /images/FireDog.jpg localhost:81 {
		transparent
	}

# password for Cherwell Singers pages:
	basicauth /CherwellSingers xxxxx yyyyy
	
# password for private scores page:
	basicauth /scores2 xxxxx yyyyy
	
# password for proxies:
	basicauth xxxxx yyyyy {
		realm "proxies"
		/PP1
		/PP2
	}

	log .\Logs\CLaccess.log
	errors .\Logs\CLerror.log {
		404 404.html
#		404 New404.html
	}
}

Is there some configuration in Caddy (or, I suppose, in PHP perhaps) which I can set to ease this problem? The hosting machine is lightly loaded and has oodles of resources, though not a particularly fast processor. Changing the memory_limit for PHP made no difference (it is over ten times the size of all the data used in that page). Although the Caddy error seems to indicate that it is PHP refusing to cooperate, the PHP error log has no entries relating to this.

Paul

magikstm · February 19, 2018, 8:29pm

Could you try removing the proxies and see if that makes a difference?

Sometimes the page doesn’t appear at all and I get:
502 Bad Gateway

pwhodges · February 19, 2018, 11:41pm

I can’t see how that could be relevant, and indeed commenting out the proxies has made no difference to me (I’ve left it like that for now so you can try it).

I’ve never seen the page not appearing, even when accessing it from outside my local network (this server is on an ADSL line with about 20Mbps outward speed), so I’m not sure why that’s happening for you.

The slight variations I get in which images appear suggest to me a timing sensitivity in reusing some resource that’s run out but is continually being freed. There’s nothing difficult going on here, so I suppose there is simply a process or thread limit being hit somewhere - I just can’t find where, though.

Whitestrake · February 20, 2018, 8:06am

Your PHP-CGI service is where the connections are being refused - check the configuration for concurrency limits.

magikstm · February 20, 2018, 10:11am

Try to add this line in your caddyfile to start php-cgi alongside caddy (instead of using a service):
on startup C:\path\to\php\php-cgi.exe -b 9123 &

Ref:

https://caddyserver.com/docs/on

Which version of Windows are you using?

pwhodges · February 20, 2018, 11:45am

I’m using Windows 2012 R2 Datacentre, running on HyperV Server 2012. The VM has four processors and 4GB of memory.

Using “startup” instead of the service made no change in the behaviour. I had to use “startup” rather than “on startup”, because the latter generated the error message “Unexpected status SERVICE_PAUSED in response to START control.” I guess maybe I need to update my Caddy executable. As that made no difference, I’ve gone back to the service. The “on startup” command offends my sense of cleanness in the Caddyfile, because it is a setting which affects the behaviour of virtual servers other than the one it is placed with…

I’m trying to get fastcgi working in the Apache server to determine more clearly whether the issue is in the web server or PHP; I have no problem when using PHP as an Apache module, but that proves nothing.

Google has a few interesting hits on the matter of concurrency using PHP. Apart from the memory_limit setting in PHP (which has no effect on this), all the settings suggested are in the web server - I’ve found suggestions for IIS, nginx and Apache.

Whitestrake · February 20, 2018, 2:52pm

Means it’s likely FPM-specific, like a worker thread limitation or something. The Apache module runs PHP in its own process, I believe, so it would sidestep that entirely.

I’ve also seen references to VM networking flakiness (over localhost this seems unlikely), firewall issues (likewise), and service crashes/restarts, all of which seem like poor explanations for this behaviour. But with that error from Caddy and no related PHP error logs, it’s possible that localhost networking, for some reason, isn’t even letting the connection reach the FPM process.

pwhodges · February 21, 2018, 9:20am

I’ve still got nowhere with this; but I have learnt a few things. My attempt to get Apache to use PHP via fastcgi has failed so far - virtually no pages on Apache or PHP stuff even acknowledge that anyone runs them on Windows, and when they do, it becomes apparent that many writers translate Linux knowledge without any real checks that it’s appropriate.

So, all the stuff about php-fpm… What is it? FPM = Fastcgi Process Manager. But, in spite of the instructions you can find for installing it, php-fpm doesn’t even exist in Windows - its mechanisms are too hard to translate into the design of Windows, it seems. This implies that code written for Linux which presumes the management aspects of php-fpm are in place is inadequate in Windows - the process management must be done in the calling program, in this case the web server.

Somewhere I read that php-cgi itself does not multithread at all; does Caddy manage a queue of requests to fastcgi? How does Caddy itself respond internally to a soft error from it? I’m also wondering here whether a simple matter of retrying on the error would deal with this, and whether this should be in the server or the client - though obviously it would be better to find a way of controlling the processing so that the error doesn’t arise.
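The retry idea raised above can be sketched from the caller's side of the FastCGI connection. This is only a minimal Python illustration of "retry when the connection is refused"; it is not how Caddy behaves, and `connect_with_retry`, its parameters and the backoff values are hypothetical names chosen for the example.

```python
import socket
import time


def connect_with_retry(host, port, attempts=5, base_delay=0.05):
    """Open a TCP connection, retrying with exponential backoff if refused.

    Hypothetical helper for illustration only; not part of Caddy or PHP.
    """
    for attempt in range(attempts):
        try:
            return socket.create_connection((host, port), timeout=5)
        except ConnectionRefusedError:
            if attempt == attempts - 1:
                raise  # out of retries: surface the refusal to the caller
            # Wait a little longer after each refusal before trying again.
            time.sleep(base_delay * (2 ** attempt))
```

A retry like this only hides the symptom; it would still be better to stop the refusals from happening in the first place.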

I’m tied up most of today, so it’ll probably be tomorrow before I do any more digging.

pwhodges · February 21, 2018, 9:46am

OK, I found a few minutes. Apache (running in the same VM as Caddy) is now using mod_fcgid to call php-cgi, and everything works perfectly. However, it’s still not really comparable, because it is using its own instantiation of php-cgi, not the service which Caddy is accessing (I will still try to get Apache to use that, via mod_proxy_fcgi - but that’s what I’m having real trouble with, because the setup is extraordinarily arcane even for Apache!).

Whitestrake · February 21, 2018, 9:54am

Wow, this was a bit of an eye opener for me. I can see how these assumptions that Linux knowledge translates are a problem!

As for how Caddy handles that, nope, no master queue or anything. Caddy (and specifically, Golang’s net/http) handles every incoming request in a goroutine, and all those goroutines where fastcgi sends a request upstream just fire them off when they get to it.

I mean, this isn’t a soft error - the connection to the specified port is being answered not with ACK but with RESET; Caddy is very specifically and deliberately being told to drop the connection request, it’s not like PHP is getting the connection and just timing out or something. Caddy’s go-to behaviour for handling problems upstream is just to issue the 502 to the original client and note the error it encountered in the errors log for the sysadmin to deal with, there’s very little it can do in terms of recovery.
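To make the difference between a refused connection and a timeout concrete, here is a small Python probe, offered as a sketch rather than as anything Caddy does internally. The function name `probe` and the two-second deadline are assumptions for the example. The Windows message "actively refused" in the log above corresponds to the refused case.

```python
import socket


def probe(host, port, timeout=2.0):
    """Classify one TCP connection attempt as 'accepted', 'refused' or 'timeout'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "accepted"
    except ConnectionRefusedError:
        return "refused"   # the peer answered the connection attempt with a reset
    except socket.timeout:
        return "timeout"   # nothing answered before the deadline
```

Running `probe("127.0.0.1", 9123)` repeatedly while the gallery page loads would show whether the port refuses connections only under load or all the time.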

And the images don’t go missing any more?

You’re not kidding about arcane, though. mod_php is the way to do things, probably in the vast majority of web servers world wide, even if only because it’s what cPanel does. And it just works, so why ever bother trying to figure something else out, right? Makes things like this difficult to sort out.

pwhodges · February 21, 2018, 10:19am

Yes, there are no missing images any more in the current Apache setup using mod_fcgid.

But as for 502 being a hard error, clearly php-cgi is saying “Nope, can’t deal with that”. But is that a problem of php-cgi’s configuration (hint: it doesn’t have any - everything that gets mentioned in the articles is for php-fpm), or the behaviour of the caller issuing a request when it shouldn’t? That’s not an accusation - I genuinely don’t know. But at this moment it seems that Apache (in the form of mod_fcgid) is managing to control something that Caddy doesn’t - it’s the same php-cgi being called after all, even though it’s a different instantiation.

EDIT - see next message for update on php-cgi parameters

pwhodges · February 21, 2018, 4:43pm

I found some parameters for php-cgi which I could change: PHP_FCGI_MAX_REQUESTS and PHP_FCGI_CHILDREN. These have to be set in the environment, which I was easily able to do using nssm (the Non-Sucking Service Manager). Using the well-known PHP call to dump the configuration I can see that PHP is recognising these variables and seeing the values which I set.

The default values (at least the values discussed in the article where I discovered them) appear to be 500 and 8. So I tried extreme variations of them - 5000, 64; 1000, 1 and a couple in between, and observed no change in the behaviour of this page.

I am finding it hard to see that there is any way I can modify PHP’s behaviour to fix this, so I’m hoping that some bright spark will think of a way that Caddy could be made to help.

I note elsewhere that it is suggested that the CHILDREN parameter should be set to 0 when using mod_fcgid because that only ever passes one request at a time to the cgi process - in other words it is serialising all calls to fcgi processes such as php-cgi. Something like this is what I had in mind when I asked about Caddy’s queuing.
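The kind of serialising described here can be sketched as a gate that lets only a fixed number of callers through to the upstream at once, with the rest waiting their turn. This is a generic Python illustration using a semaphore; `UpstreamGate` is a hypothetical name, it is not mod_fcgid's or Caddy's actual mechanism, and whether such a gate would cure the refusals seen in this thread is untested.

```python
import threading


class UpstreamGate:
    """Allow at most `limit` callers through to the upstream at once."""

    def __init__(self, limit):
        self._slots = threading.BoundedSemaphore(limit)

    def call(self, fn, *args):
        # Block until a slot is free, so excess requests queue here
        # instead of all dialling the upstream at the same moment.
        with self._slots:
            return fn(*args)
```

With `limit=1` every call is fully serialised, which matches the one-request-at-a-time behaviour attributed to mod_fcgid above; a larger limit would suit an upstream that runs several workers.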

pwhodges · February 22, 2018, 12:33pm

Just to confirm - I have now updated from Caddy v0.10.6 to v0.10.11 with no change in this behaviour.

tobya · February 22, 2018, 1:11pm

I tried to reproduce this with a page that loads images via a php page rather than a static .jpg file with no problem. Calling the same file and same domain with 189 requests took a couple of seconds and no errors.

I’m not sure what’s going on. I have a pretty robust PHP setup now which does require regular restarts (every 5 hours) but other than that it is pretty rock solid.

What can cause issues is a PHP page calling file_get_contents on a URL that is served by the same php-cgi instance as the original call.

I run my PHP setup with 10 separate running instances of php-cgi, which works well.

tobya · February 22, 2018, 1:17pm

I think your issue may be running php as a service. Try this instead

mydomain.com {

  root c:\web\myroot

   # php 5.3
   on startup C:\pathto\php\php-cgi.exe -b 49975 &
   fastcgi / 127.0.0.1:49975 php

 }

You can do this for each domain and use a different port. The & is essential at the end of startup or Caddy will wait until the command finishes.

You may need to set up a script to restart PHP and Caddy. I posted one here.

pwhodges · February 22, 2018, 2:10pm

Thanks for trying to reproduce this. You tried 189 requests - I find that the problem starts with something a little over 200 images in the directory, so I would expect you not to see it in that test. In case of differences in processor speed (for instance) you probably need to try 300 to be sure of seeing it.
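For anyone wanting to reproduce this without a browser, a rough load test can be sketched in Python: fire a few hundred requests with many in flight and count the status codes that come back. The function names and the worker count are assumptions for the example, and the URL pattern in the usage note is a placeholder, not the gallery script's real query format.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url):
    """Return the HTTP status code for one GET, or 'error' if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            resp.read()
            return resp.status
    except urllib.error.HTTPError as exc:
        return exc.code        # e.g. 502 passed back by the web server
    except OSError:
        return "error"         # connection-level failure, no HTTP response


def tally(urls, workers=50):
    """Fetch all URLs with up to `workers` requests in flight; count results by status."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return Counter(pool.map(fetch_status, urls))
```

Something like `tally(["http://localhost/gallery/index.php?n=%d" % i for i in range(300)])` (placeholder URL) returns a counter of status codes, so a run that hits the problem should show a share of 502 results alongside the 200s.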

As I said in an earlier reply, I tried starting php-cgi from within Caddy, and it made no difference (though as I’ve since then updated Caddy, it now accepts the “on startup” format).

I tried again to check this, using a different port and instance for each site as you suggest, and it’s still the same…

Bear in mind that this is the only situation that PHP in Caddy (it’s fine in Apache, remember) has a problem with. I have other sites within the same Caddy instance running two PHP forums and a PHP CMS with no issues at all - up to now they have all been using the single service instance of php-cgi together happily.

Incidentally, using a service it was easy for me to change environment variables as I described above - this I cannot see how to do starting php-cgi within Caddy. However, as it didn’t help that is of little concern I suppose. However, a service offers a built-in mechanism for restarting the process each time it stops, so I still prefer it.
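One possible way to set the environment variables without a service would be a small wrapper that launches php-cgi with them already in place. The Python sketch below is an assumption about how that could be done, not something tested in this thread. The helper names are hypothetical, and the defaults of 8 and 500 are simply the apparent defaults mentioned earlier.

```python
import os
import subprocess


def php_cgi_launch(php_cgi_path, port, children, max_requests, base_env=None):
    """Build the command line and environment for one php-cgi FastCGI listener."""
    env = dict(os.environ if base_env is None else base_env)
    env["PHP_FCGI_CHILDREN"] = str(children)
    env["PHP_FCGI_MAX_REQUESTS"] = str(max_requests)
    # -b tells php-cgi which address and port to listen on for FastCGI.
    cmd = [php_cgi_path, "-b", "127.0.0.1:%d" % port]
    return cmd, env


def start_php_cgi(php_cgi_path, port, children=8, max_requests=500):
    """Start php-cgi with the variables set and return the process handle."""
    cmd, env = php_cgi_launch(php_cgi_path, port, children, max_requests)
    return subprocess.Popen(cmd, env=env)
```

A wrapper like this does not restart php-cgi when it exits, so it does not replace the restart behaviour that a service provides.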

pwhodges · February 23, 2018, 3:44pm

Next experiments.

I tried setting PHP_FCGI_CHILDREN to 1000 (well over the number of images) and to 0 (= do not manage threads). Neither of these helped, though the first test after restarting the php-cgi service with children=0 seemed good - but subsequent calls reverted to the bad behaviour.

For FastCGI on Windows the PHP site says to use the “threadsafe” binaries with Apache and the “non-threadsafe” binaries with IIS. So I tried the nts binary as an alternative to the ts one I’d been using up till now. Guess what - no improvement there either.

Finally, I tried installing PHP7 (64-bit), replacing PHP5 (32-bit). Same symptoms.

Oh, and I tried changing “127.0.0.1” in the fastcgi command to “localhost”, which caused Caddy to default to using ::1 (IPv6 localhost) instead; no change.

I see no alternative at present to viewing this as a problem in Caddy’s implementation of FastCGI under extremely high PHP traffic when running in Windows; the problem page (itself generated by PHP) contains several hundred calls to PHP, each of which returns a single image.

Alan_Bradshaw · February 23, 2018, 6:44pm

I have a similar problem but unfortunately I have no solution for you, just wanted to add a note that you’re not alone. My project is a low priority and I’d actually forgotten about it for 6 months until this post reminded me to look at it!

I have a very low-spec caddy server which simply serves images requested through various parameters processed by PHP-FPM (this is Linux though, not Windows). As a test, I created a script on another server to request a few hundred images on one page.
This seems very similar to your scenario — each image request is processed by another instance of the same script. Mine stops responding after about 150 requests. I spent a few hours trying to troubleshoot this at the time and now cannot remember exactly where I got to except I was fairly sure the problem was somewhere in PHP config.

pwhodges · February 26, 2018, 8:03pm

In a spirit of desperation I tried running several instances of php-cgi in parallel, listing them using the Caddyfile “upstream” parameter. It made it worse.

I have also gone through the php.ini file examining every single possible entry and trying adding or changing anything that seemed as if it could conceivably bear on the situation. Nothing helped.

During the past week’s concentration on Caddy, I have found a solution for every reason I thought I had for continuing to use Apache for some purposes. It would be a shame to discover that there is now another thing which Caddy can’t do well enough. Yes, I know that simply running it on Linux instead of Windows would deal with it; it’s not even as if I don’t have a Linux server running alongside my Windows one. But it would not suit how I want to keep things organised, and running away from a problem will never find the solution. After all, Caddy is supplied as working on Windows…

pwhodges · February 27, 2018, 10:21am

I realised I hadn’t enabled opcache in PHP. Adding this has resolved the problem in the case of the directory with very many small files which I was specifically concerned with. This is presumably because the many instances of the PHP script are now managing to complete as fast as they are generated.

However, it’s not an actual fix, as a directory with larger files still shows the problem, because of the longer processing times required: https://cassland.org/images/ImageTest.