
Request handling

Instead of jumping straight into the alphabet soup of HTTP, CGI (Common Gateway Interface), WSGI (Web Server Gateway Interface), and so on, we will examine the entire problem of request handling from where it all started. The basic design goal of the Web was to share information in the form of documents. In essence, it was a document-sharing system in which each document had a unique URL, much as each file on a file system has a unique path (ignoring links and shortcuts for the sake of discussion). Each document could link to other documents. This was the simple HTTP Web.

The initial Web was simple and consisted of a pair of programs. One program, called the client (nowadays usually a desktop or mobile browser), would request a document by opening a socket to a given server on a specific port and sending a request in a very specific textual format, like this:

GET /publications/quantum/computing/future.html HTTP/1.1
Host: www.mit.edu

The preceding text would be sent to www.mit.edu (actually, to whatever IP address this name resolves to through DNS, which is a detailed area of its own and outside the scope of our discussion), where a program called the server would be listening on port 80. The server would parse the request, see that /publications/quantum/computing/future.html is the document it needs to return from its directory of documents (which would be specified somewhere in its configuration), send the document's contents back, and close the connection. This is exactly what happens when you visit a URL such as http://www.mit.edu/publications/quantum/computing/future.html in your browser (the URL here is a dummy one).

The text that the client program sent follows the HTTP protocol, and the server program is called an HTTP server because it understands and responds to that protocol. The qualifier matters because there are many other types of servers, such as FTP servers, SMTP servers, DNS servers, and so on. An HTTP server traditionally listens on port 80. The idea of linked documents served over HTTP, also known as the World Wide Web, was proposed in 1989. As a wild guess, port 80 may have been picked because all of this started in the 1980s; whatever the reason, the tradition is still with us today.
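
The exchange described above can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions: the function names build_request and fetch are hypothetical, and the request is written in HTTP/1.1 form, where the request line comes first and the Host header follows.

```python
import socket

def build_request(host, path):
    """Return the raw text of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET %s HTTP/1.1" % path,   # request line: method, path, protocol version
        "Host: %s" % host,          # header naming the site being asked
        "Connection: close",        # ask the server to close the socket when done
        "",                         # blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines)

def fetch(host, path, port=80):
    """Send the request over a plain TCP socket and return the raw reply bytes."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_request(host, path).encode("ascii"))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:            # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)

print(build_request("www.mit.edu", "/publications/quantum/computing/future.html"))
```

Calling fetch needs network access and a document that really exists on the server; build_request alone shows exactly which bytes travel over the socket.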

The CGI program

The preceding system, although very simple, elegant, and yet powerful, has one drawback. If you request the same document, you will always get the same content, unless you modify the document yourself on the server using some text editor. To get different content, you have to request a different document altogether. You will never come across a case where you request the same document but get different results.

This kind of requirement might sound absurd, but it is at the heart of the modern Web. It is how the same HTML page lists a different set of emails depending on who logs in and what they opt for. This leads us to dynamic content generation. In other words, you can generate pages on the fly that don't yet exist on disk, except maybe in the form of a template whose actual content is yet to be filled in. From compilers and assemblers to almost any other imaginable program, it is always about taking input, processing it, and producing output, and that's exactly what we need here. So, as a part of the evolution, HTTP servers were configured to execute a program and return its output as the response to the requester instead of a static document. This is how the dynamic Web was born. The question is, how does one pass information about an HTTP request to a program and collect its output? A convention emerged for this, and it is called CGI.

Streams and environment variables

Before we discuss how CGI works, there are a few ingredients that are worth examining for ease of understanding. An important concept that emerged from the UNIX world, and is now supported by virtually all operating systems, is that of environment variables. The idea is that every process carries a set of named variables, maintained for it by the operating system. A program can set such variables before it starts another program, and the newly started program can read them (and change its own copy).

The other important concept is that of I/O streams. In POSIX-compliant operating systems (which borrowed this convention, yet again, from the UNIX world), a program has three special files available at all times to read from and write to. One file is used to read input. Input from the keyboard is made available via this file, which is called standard input or stdin. A smart trick that another program can play is to write its output to the input stream (stdin) of a program, which will then see that input as if it were coming from the keyboard. The main concept here is that the input stream is what a program reads, and it might be populated by another program or by an input device.
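
As a minimal sketch of how one program can hand a value to another through the environment, the following Python 3 snippet starts a child interpreter with one extra variable. The function name run_child_with_env and the variable name DEMO_GREETING are made up for this illustration.

```python
import os
import subprocess
import sys

def run_child_with_env(name, value):
    """Start a child Python process with one extra environment variable
    and return what the child printed for it."""
    child_env = dict(os.environ)   # copy the parent's environment
    child_env[name] = value        # add the variable to the child's copy only
    code = "import os; print(os.environ[%r])" % name
    output = subprocess.check_output([sys.executable, "-c", code], env=child_env)
    return output.decode().strip()

print(run_child_with_env("DEMO_GREETING", "hello"))
```

The child sees the variable because it was placed in its environment before it started, and the parent's own environment is left untouched.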

The other two files, which are as follows, are used to write the output of the program:

  • The first one is called standard output or stdout for short, and that's where all your print statements in Python and Ruby leave their mark, and that's where printf and cout will write their output in a C/C++ program
  • The second output file is called standard error or stderr and is used for writing error messages, that is, anything that is not the output of the program itself but information about that output or about whatever happens while it is being generated
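
The three streams can be seen in action with a small Python 3 sketch in which one program writes to the stdin of another and collects its stdout and stderr separately. The child program and the name pipe_through_child are invented for this example.

```python
import subprocess
import sys

# The child reads its stdin, upper-cases the text, and writes it to stdout.
# It also writes a diagnostic note to stderr.
CHILD_CODE = (
    "import sys\n"
    "text = sys.stdin.read()\n"
    "sys.stdout.write(text.upper())\n"
    "sys.stderr.write('processed %d characters' % len(text))\n"
)

def pipe_through_child(text):
    """Feed text to the child's stdin; return (stdout, stderr) as strings."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    out, err = proc.communicate(text.encode("utf-8"))
    return out.decode("utf-8"), err.decode("utf-8")

out, err = pipe_through_child("hello")
print(out)
print(err)
```

From the child's point of view, the text might as well have been typed on a keyboard; it cannot tell that another program supplied it.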

Now, back to the CGI program. The plain HTTP text, as shown in the preceding example, is received by the HTTP server. Then, the HTTP server executes a program. The headers (key-value pairs separated by colons) and the query string parameters (such as http://www.example.com?sample_query_param=1) are placed in environment variables before the program is invoked. A request body (usually sent with an HTTP POST) is made available via standard input (stdin). When the program is executed, it reads the environment variables (which contain the HTTP headers) and stdin (the request body, if any) and writes its output to stdout, which is collected by the server. This output is finally sent back to the client, which is usually a web browser.
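
To make the flow concrete, here is a minimal sketch of a CGI-style handler in Python 3. REQUEST_METHOD, QUERY_STRING, and CONTENT_LENGTH are standard CGI variable names; the function cgi_main and the fake environment are assumptions made for the demonstration, and in-memory streams stand in for the real stdin and stdout.

```python
import io

def cgi_main(environ, stdin, stdout):
    """Handle one request the CGI way: request details come from environment
    variables, the body from stdin, and the reply goes to stdout."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else ""

    # A CGI reply starts with headers, then a blank line, then the content.
    stdout.write("Content-Type: text/plain\n\n")
    stdout.write("method=%s\nquery=%s\nbody=%s\n" % (method, query, body))

# Simulate what the server does: populate the environment and stdin,
# run the program, and collect whatever it wrote to stdout.
fake_env = {
    "REQUEST_METHOD": "POST",
    "QUERY_STRING": "sample_query_param=1",
    "CONTENT_LENGTH": "4",
}
reply = io.StringIO()
cgi_main(fake_env, io.StringIO("data"), reply)
print(reply.getvalue())
```

In a real CGI script, the same function would be called with os.environ, sys.stdin, and sys.stdout.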

CGI and Google App Engine

The Python 2.5 runtime environment, which is now deprecated, uses CGI as its request handling mechanism. The request is received by the frontend Google servers, placed in a queue, as we learned in the previous chapter, and finally handed over to an instance, where a Python interpreter process is started with its environment variables and stdin populated from the incoming request. The output of the program on stdout is collected and sent to the client. In the currently mainstream Python 2.7 runtime environment, CGI is still available. When you indicate a script name with the .py extension (or a compiled Python bytecode file with the .pyc extension), the request is handed over to it in the CGI fashion.

The program that we wrote in the first chapter, though targeting the Python 2.7 runtime environment, was a CGI program. Now that you have a better understanding of CGI, it's time to implement it. To save you a lot of trouble, copy the directory from the first chapter, rename it to cwsgi, and make the following changes:

  • Rename main.py to cgimain.py
  • Edit app.yaml and replace main.py with cgimain.py
  • Change url: /.* to url: /cgi

Now, start the app with the following command:

python ~/sdks/google_appengine/dev_appserver.py ~/Projects/mgae/ch02/cwsgi/ 

After doing this, open your browser and visit http://localhost:8080/cgi. You'll see that all the environment variables are listed just as in the original version of the program from the previous chapter.

WSGI

Nothing makes sense until you really understand the rationale behind it and therefore, yet again, instead of jumping straight into what WSGI is and how it fits into the overall picture, we will examine the problems with CGI first.

Problems with CGI

The CGI approach is surely a big step forward, but it has a very serious drawback. It creates a process for every incoming request, and creating a process is expensive in terms of computing time, as it involves allocating memory and initializing many internal data structures. More often than not, the computational time actually required to render a page is a fraction (sometimes one-fifth to one-tenth) of the process creation time. This means that more than 80 percent of the request time, as perceived by the end user, is spent in spawning a new process. This overhead is repeated for every single incoming request.

For a concrete example, assume that creating a process takes 1.5 seconds, which equals 1500 milliseconds. However, responding to an actual request barely takes 0.3 seconds, or 300 milliseconds. This means that the time taken to create a process alone is five times the actual time that is required to serve a request. The mechanism of request handling actually has a name. It is called a process-per-request model because a new process is spawned for every incoming request.
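
The share of the total request time that goes into process creation follows directly from these figures. The helper name spawn_overhead_fraction is made up for this small worked example.

```python
def spawn_overhead_fraction(spawn_ms, work_ms):
    """Fraction of the total request time spent creating the process."""
    return spawn_ms / float(spawn_ms + work_ms)

# 1500 ms to create the process, 300 ms to do the actual work.
print(round(spawn_overhead_fraction(1500, 300), 3))  # 0.833
```

With these numbers, roughly five-sixths of what the user waits for is process creation.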

Besides the extra lag experienced by the end user, the main issue is that of sheer inefficiency in resource utilization. While framing the scalability problem in the previous chapter, we observed that the number of requests processed per second is an important metric, and the process-per-request model scores poorly on it because of the overheads that we just examined.

Solutions

Many solutions were devised to eliminate this process creation overhead, but all of them are based on one of two approaches. The first approach revolves around the simple observation that process creation is the bottleneck. Hence, instead of creating a process for every request, the process is created only once, and for every incoming request, a separate thread is created within it. This is called the thread-per-request model. As thread creation is lightweight in comparison to process creation, this yields much more throughput. On the downside, it has a side effect that is common to every multithreaded program, which we will examine in a while. WSGI from Python, Rack from Ruby, and the Servlet specification from the Java world are all examples of this thread-per-request model.
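
A minimal sketch of the thread-per-request idea in Python follows. There is no real network here: the request IDs stand in for incoming requests, and the names handle_request and serve_thread_per_request are hypothetical.

```python
import threading

def handle_request(request_id, results):
    """Stand-in for real request handling; records a response."""
    results[request_id] = "response for request %d" % request_id

def serve_thread_per_request(request_ids):
    """Start one new thread for each incoming request, then wait for all."""
    results = {}   # each thread writes only under its own key
    threads = []
    for request_id in request_ids:
        worker = threading.Thread(target=handle_request, args=(request_id, results))
        worker.start()
        threads.append(worker)
    for worker in threads:
        worker.join()
    return results

print(serve_thread_per_request(range(3)))
```

The process is created once, and only the cheap thread creation is repeated for each request.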

A slight improvement on the preceding solution that sprang up over time is that, instead of creating a thread for every incoming request, a fixed (configurable) number of threads is created when the process starts. Whenever a new request comes in, it is assigned a thread from this collection (called the thread pool). Once the request is served, the thread returns to the pool and sits idle until the next request arrives. This is called thread pooling.
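
Thread pooling can be sketched with ThreadPoolExecutor from the Python 3 standard library. As before, the request IDs stand in for real requests, and the names handle_request and serve_with_pool are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def handle_request(request_id):
    """Stand-in handler; reports which pooled thread served the request."""
    return request_id, threading.current_thread().name

def serve_with_pool(request_ids, pool_size=4):
    """Serve every request using a fixed number of reusable threads."""
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        # map hands each request to an idle pooled thread and keeps input order
        return list(pool.map(handle_request, request_ids))

for request_id, thread_name in serve_with_pool(range(8)):
    print(request_id, thread_name)
```

However many requests arrive, no more than pool_size threads are ever created, so the same thread names keep reappearing in the output.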

The other model is a more recent one and relies on event notification facilities in modern kernels (such as epoll on Linux). Instead of your program blocking while it waits on a port for a request, the kernel notifies your program whenever there is some activity on the sockets that you registered. Many such solutions are built on top of a C event library, such as libevent, libev, or libuv, which sits on top of the functionality provided by the kernel. Examples of this approach include Node.js and gevent in Python. Further discussion of this model is beyond the scope of this book.

Those of you who are interested in the request handling techniques mentioned above should check out the following links:
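
Although this model is not covered further, its core idea fits in a short sketch that uses the selectors module from the Python 3 standard library. A connected socket pair stands in for a real network connection, and the function name receive_when_ready is made up for this illustration.

```python
import selectors
import socket

def receive_when_ready(message):
    """Register a socket with the OS readiness API and read from it only
    after the kernel reports that data has arrived."""
    selector = selectors.DefaultSelector()   # picks epoll on Linux when available
    client, server = socket.socketpair()     # connected pair standing in for a network link
    server.setblocking(False)
    selector.register(server, selectors.EVENT_READ)

    client.sendall(message)                  # causes "activity" on the watched socket
    received = b""
    for key, _events in selector.select(timeout=5):
        received = key.fileobj.recv(4096)    # data is ready, so this returns at once

    selector.unregister(server)
    selector.close()
    client.close()
    server.close()
    return received

print(receive_when_ready(b"ping"))
```

A real event-driven server registers many sockets with one selector and loops over select, so a single thread can serve many connections.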

What WSGI looks like

WSGI is a Python standard (PEP 3333) that defines how Python programs are supposed to interact with web servers. The specifications are quite detailed, and they are as dry as the committee could make them. So, let's not look at them. Instead, let's try to dismantle the concept in the simplest possible terms. The idea of Web Server Gateway Interface (WSGI) is very simple. You define a function that accepts a fixed number of arguments of certain types and returns a specific type of data.

The function and its arguments can have any names you want, but the number and types of the arguments and the type of the return value must adhere to the standard. The arguments are passed to the function by the web server, and the return value of the function is received by the web server and passed back to the client. The web server is supposed to call this function whenever it receives a request. All request headers and other request data are passed to this function as arguments.

This is what a WSGI script with the function that we talked about looks like:

def application(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return "Hello from WSGI!"

Now, let's take a detailed look at the arguments that the web server supplies to the function. The first argument is a Python dictionary. All the HTTP request headers are placed in this dictionary as key-value pairs (most of them under keys prefixed with HTTP_, such as HTTP_USER_AGENT), along with the other CGI-style environment variables. The second argument is a function that the application must call to indicate that it has started generating a response. This function must be called with two arguments: a string and a list. The string is the HTTP status, a status code followed by a reason phrase (for instance, 404 Not Found, 301 Moved Permanently, or the famous 500 Internal Server Error). The list comprises tuples, where each tuple holds a response header name and its value. So, in the preceding example, we returned HTTP 200 (because of the first argument to start_response), and Content-Type is set to text/plain because of the list passed as the second argument. If more response headers were to be sent, they would be added to this list as further two-member tuples.

The return value of the function must be an iterable. It can be a list or a tuple where each element is a piece of the response body, or you can use the yield keyword, thus forming a Python generator, so that the response is produced piece by piece as it is fetched from a database, a large file, or some other network resource. You can also return just a string because a string can be iterated as well. However, in the preceding example, where we returned a string, the server software iterates over it character by character, which is inefficient. Therefore, we are better off yielding individual lines or returning a list of response lines, which in turn will be sent back to the client. So, this is how the last line is supposed to be:
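
To see both arguments at work without a web server, the following sketch plays the server's role by hand. The helper call_app and the fake environ dictionary are assumptions made for this demonstration. The body is encoded to bytes so that the sketch also runs on Python 3, where WSGI response bodies must be byte strings.

```python
def application(environ, start_response):
    # HTTP request headers arrive as HTTP_* keys in the environ dictionary.
    agent = environ.get("HTTP_USER_AGENT", "unknown")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [("Hello, %s" % agent).encode("utf-8")]

def call_app(app, environ):
    """Play the role of the web server: invoke the app and capture its output."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        # The server remembers the status and headers until the body arrives.
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body

fake_environ = {"REQUEST_METHOD": "GET", "HTTP_USER_AGENT": "demo-client"}
status, headers, body = call_app(application, fake_environ)
print(status, headers, body)
```

A real server builds environ from the incoming request and writes the status, headers, and body to the client's socket.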

return ["Hello from better and proper way of returning WSGI response"]
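
The generator form mentioned above might look like the following sketch. The three generated lines are placeholders for data that would normally come from a database or a file, and the chunks are encoded to bytes so that the sketch also runs on Python 3.

```python
def application(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/plain")])
    # Each yielded chunk can be sent to the client as soon as it is produced.
    for number in range(1, 4):
        yield ("line %d\n" % number).encode("utf-8")

# Iterating over a plain text string yields one character at a time,
# which is why returning a bare string is wasteful.
print(list("Hi!"))  # ['H', 'i', '!']
```

Because the function contains yield, calling it returns a generator, and the server pulls the chunks from it one by one.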

That's all that you need to know about WSGI. Now that the whole of your web application is merely a function, it can be executed on separate threads. Hence, new threads can be created when new requests come along, which is far more efficient than spawning a whole new process for every incoming request. Remember that thread pooling can largely eliminate the time required for thread creation as well, which is even more efficient.

WSGI – Multithreading considerations

Now, there's one caveat about the WSGI model that we should discuss. In the CGI paradigm, a new process is created for each incoming request, which means that its global variables are totally isolated from those of any other Python process created at the same time (or in an overlapping time slot) to serve another request. Hence, any code that reads from and writes to global variables affects only the process spawned for that one request.

However, in the WSGI model, which goes for a thread-per-request paradigm, local variables are of course confined to the scope of the function, but global variables are visible to all requests that are being executed in the same or an overlapping time frame. When one request is writing to a global variable while another is reading it, the variable might be in an inconsistent state (for instance, a value that is only partially written when the other request reads it). This can result in strange, unpredictable, and very hard-to-debug behavior that shows up only under very specific conditions.

In other words, CGI versus WSGI exhibits the same trade-offs as the processes versus threads approaches to concurrency. When you use processes as the units of concurrency, each process has its own separate state, which includes global variables visible only to that process. This reduces complexity, but it is inefficient because process creation is a heavyweight operation. On the other hand, concurrency with threads is much more lightweight, but multithreading brings up the problem of conflicting state due to shared global data structures.

To cater to this, all you can do is ensure that you don't read from and write to global variables without some thread synchronization mechanism, such as locks, or maybe don't write to them at all.
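
One way to guard shared global state is a lock around every read-modify-write, as in this minimal sketch. The counter hit_count and the helper names are hypothetical; the threads stand in for concurrent requests.

```python
import threading

hit_count = 0                       # global state shared by all request threads
hit_count_lock = threading.Lock()   # guards every read-modify-write of hit_count

def record_hit():
    """Increment the shared counter; the lock makes the update atomic."""
    global hit_count
    with hit_count_lock:
        hit_count += 1

def current_hits():
    """Read the shared counter under the same lock."""
    with hit_count_lock:
        return hit_count

def simulate_requests(num_threads, hits_per_thread):
    """Run several threads that all update the shared counter."""
    def worker():
        for _ in range(hits_per_thread):
            record_hit()

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return current_hits()
```

Without the lock, two threads could read the same old value and one increment would be lost; with it, simulate_requests(8, 1000) always raises the counter by exactly 8000.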

WSGI in Google App Engine

WSGI was not supported in the now deprecated Python 2.5 environment, but with Python 2.7, it is the preferred way, although CGI apps are possible as well. However, some of the Python 2.7 features might not work if your application is not a WSGI app. To indicate that you are using WSGI instead of CGI, you specify the module name and the name of the WSGI application function in app.yaml instead of the script name alone.

So, instead of this code:

- url: /about
  script: main.py

We will follow this code:

- url: /about
  script: main.application

This way, the Google App Engine runtime environment will treat your application as a WSGI app instead of a CGI app. We already discussed the consequences of the thread-per-request model in the preceding section. In light of that, you have to indicate to Google App Engine whether your application is thread-safe or not. That is, are you reading from and writing to global variables, data structures, or lists? If you are, you should set threadsafe to false in app.yaml. With the threadsafe flag turned off, only one request will be handed over to your application instance at a time. Only once you have returned a response to the current request will the next request be handed over to you. This way, your app will be stable, but fewer requests will be processed in a given time frame because only one request is processed at a time. If you set threadsafe to true, multiple requests will be handed over to your application concurrently, which will increase the throughput. However, it is then your responsibility to ensure that you are not reading from or writing to global variables without synchronization, as discussed in the previous section.

If you take away only two things from this whole discussion, the first is to always use WSGI. The second is to ensure that your applications are thread-safe and, if they are not, to set the threadsafe flag to false.
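
For reference, the flag sits at the top level of app.yaml, next to the runtime declaration. The following fragment is only a sketch: the handler is the one shown earlier, and the remaining values are typical for a Python 2.7 app rather than taken from this chapter's project.

```yaml
runtime: python27
api_version: 1
threadsafe: false   # switch to true only if the code is safe for concurrent requests

handlers:
- url: /about
  script: main.application
```

To the best of our knowledge, setting threadsafe to true also requires every handler to be a WSGI one (written as module.function), so an app that still contains CGI-style .py handlers has to keep the flag set to false.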

Now, it's time to put all the theory into practice. Let's write a WSGI program that runs on Google App Engine. We will extend the program from the previous section to add the WSGI handler into it. To accomplish this, perform the following steps:

  1. Copy the cgi application and create a new file named wsgimain.py. Enter the following code into it:
    def application(environ, start_response):
        response_body = ""
        for key in environ:
            response_body += "%s: %s\n" % (key, environ[key])
        status = "200 OK"
        response_headers = [
            ('Content-Type', 'text/plain')
        ]

        start_response(status, response_headers)

        return [response_body]

    Now, edit app.yaml and add the following handler to it:

    - url: /wsgi 
      script: wsgimain.application 
  2. Note that besides the name of the Python file without its extension, we have the name of the application function separated by a dot.
  3. Run this application and navigate to http://localhost:8080/cgi. Open another window and browse http://localhost:8080/wsgi. Then, compare the headers that are being printed.

You can see the differences between the CGI and WSGI environments on Google App Engine.
