Sunday, September 29, 2013

Iteration - C, Scheme and C#

An iterator enables element-by-element processing of a collection. Before we look at iterators which operate over collection objects, let us examine more fundamental iteration statements.

The C for loop - under the hood

The for loop is a basic building block in languages which derive their syntax from C.

char i;
for (i = 'a'; i < 'z'; i++) {
  printf("%c\n", i);
}
Looking at the assembly generated for the loop (using Visual Studio/Windows running on an Intel processor):

;...
; char i;
; for (i = 'a'; i < 'z'; i++) {
00A81A2E  mov         byte ptr [i],61h  
00A81A32  jmp         main+2Ch (0A81A3Ch)  
;A. Start of loop
00A81A34  mov         al,byte ptr [i]  
00A81A37  add         al,1  
00A81A39  mov         byte ptr [i],al  
00A81A3C  movsx       eax,byte ptr [i]
;B. Compare and jump to the next iteration or out of the loop
00A81A40  cmp         eax,7Ah  
00A81A43  jge         main+53h (0A81A63h)  
;... printf("%c\n", i); is implemented here
; }
;C. Jump back to start of loop
00A81A61  jmp         main+24h (0A81A34h)  
;}
0A81A63  xor         eax,eax  
;...
The compiled code is straightforward. Essentially a pair of jumps (cmp/jge and jmp marked as B. and C. in comments) are used to move to the next loop iteration and eventually terminate the loop.

An iterator in Scheme

An iterator is a mechanism which:
1. Operates on a collection of objects (usually all of the same type, but not necessarily).
2. Provides a way to apply an arbitrary procedure to the current object.
3. Provides a way to move to the next object.
4. Provides a uniform interface for the collection, independent of the type of the objects in the collection.

It is straightforward to implement a simple list iterator in Scheme:

(define (for-each function collection)
  (cond ((null? collection) #t)
  (else (let ((rest (cdr collection)))
          (function (car collection))
            (for-each function rest)))))
               
;Use for-each to print a list of symbols
(for-each print (list 'hello 'world '!))

;Use for-each to square a list of numbers
(for-each 
    (lambda (x)
        (print (* x x)))
    (list 1 2 3))
Our for-each takes an arbitrary function of 1 argument and a collection. It applies the function to each item in the collection till there are no elements left.

We used a recursive function to implement foreach. This looks like an inefficient way to do things, since we would be using up stack frames and if the collection was big enough cause a stack overflow.

However the for-each has been written using a special case of recursion. Since the recursion is in the very last statement, we have what is called tail recursion. With tail recursion, the Scheme compiler can avoid pushing the recursive calls onto the stack. Instead it can optimize the generated code into something similar to the jump mechanism we saw with the generated assembly statement in the C for loop. Whether this optimization leads to better performance in practice or not is another issue which we will not bother about in this post.*

Iterators in C#

That leads us to iterators in C#. One common mechanism consists of a pair of complementary constructs:
IEnumerable interface: Has one method GetEnumerator().
foreach statement: operates on an IEnumerable and allows us to iterate over its objects.


using (EnumeratedFile file = new EnumeratedFile( "c:\\temp\\test.txt" )
  foreach ( var line in file ) {
    Console.WriteLine( line );
  }
}
Here EnumeratedFile implements IEnumerable. foreach applies whatever procedure we provide on EnumeratedFile (here just a Console.WriteLine).

But for this to work, there has to be an enumerator object which IEnumerable.GetEnumerator() provides, which actually iterates over the file. C# provides another interface IEnumerator for this purpose. The enumerator which implements IEnumerator does the real work of giving access to the current object and moving to the next.

IEnumerator has 3 methods whose intentions are obvious from their names: Current (actually a property), MoveNext() and Reset().

Note that the same object can implement both the IEnumerable and IEnumerator interfaces. For example, here is an implementation of EnumeratedFile:

class EnumeratedFile : IEnumerable, IEnumerator, IDisposable
{
  string line;
  StreamReader stream;

  //Constructor
  public EnumeratedFile( string path ) {
    line = string.Empty;
    if ( File.Exists( path ) ) {
      stream = new StreamReader( path );
    }
    else {
      stream = null;
    }
  }

  IEnumerator IEnumerable.GetEnumerator() {
    return this;
  }

  public object Current {
    get { return line; }
  }

  public bool MoveNext() {
    if ( stream == null ) return false;
    return (line = stream.ReadLine()) != null ? true : false;
  }

  public void Reset() {
    stream.BaseStream.Position = 0;
    stream.DiscardBufferedData();
  }

  //We also implement IDisposable to clean up the StreamReader.
  public void Dispose() {
    if ( stream != null ) {
      stream.Dispose();
    }
  }
}
So foreach provides a shorthand for the following:

using ( EnumeratedFile file = new EnumeratedFile( "c:\\temp\\test.txt" ) ) {
  var line = string.Empty;
  while ( file.MoveNext() ) {
    line = (string)file.Current;
    Console.WriteLine( line );
  }
}
foreach can infer IEnumerable interface for array and dynamic types. The C# language spec has details on the behavior of foreach.

C# provides a simpler way to implement an iterator. Instead of implementing Current, MoveNext() and Reset(); we could use the yield keyword to simplify things.

Using yield, EnumeratedFile can be re-written as:

class EnumeratedFile : IEnumerable, IDisposable
{
  string line;
  StreamReader stream;

  //Constructor
  public EnumeratedFile( string path ) {
    line = string.Empty;
    if ( File.Exists( path ) ) {
      stream = new StreamReader( path );
    }
    else {
      stream = null;
    }
  }

  public IEnumerator GetEnumerator() {
    if (stream == null) yield break;
    while ( (line = stream.ReadLine()) != null ) {
      yield return line;
    }
    yield break;
  }

  //We also implement IDisposable to clean up the StreamReader.
  public void Dispose() {
    if ( stream != null ) {
      stream.Dispose();
    }
  }
}
yield does the heavy lifting of generating current/movenext and maintaining state between iterations. Raymond Chen and Jon Skeet have explained the code which is generated by the compiler. We could also look at the generated code directly using Ildasm disassembler or indirectly using a .NET decompiler. IEnumerable and IEnumerator also have generic counterparts IEnumerable< T > and IEnumerator< T >.

What is the big deal?

What is the use of iterator abstraction, if it boils down eventually to a pair of jump statements?

Two things:
1. Iteration is a very common operation while automating work with computers: When we model real world objects, create several instances of them and apply procedures repeatedly on all the instances**. Enough times that many languages provide the iterator as a built in construct or make it easy to write the construct ourselves.

2. By making iterators a part of the language, the language designers provider us users a 'conceptual framework'*** to build higher levels of abstraction. For example IEnumerable in C# lays the groundwork for transforming sequences to sequences, which is one of the underlying ideas behind the Linq framework.


* See this post for tail optimization with respect to C#

** Iteration is not the only way to apply a procedure on a group of objects. Set based operations are another way. In fact, it is a common programming error for application programmers new to databases to iterate over each object in a set, instead of operating over the sets of data, and encounter performance problems as the data set grows.

*** From SICP, on map and similar operations over lists: ...The difference between the two definitions is not that the computer is performing a different process (it isn't) but that we think about the process differently. In effect, map helps establish an abstraction barrier that isolates the implementation of procedures that transform lists from the details of how the elements of the list are extracted and combined. Like the barriers shown in figure 2.1, this abstraction gives us the flexibility to change the low-level details of how sequences are implemented, while preserving the conceptual framework of operations that transform sequences to sequences.

Tuesday, September 24, 2013

Anonymous Methods and Closures in C#

In the last two posts we looked at Action, Func, delegate and callbacks in C#. In this post we look at anonymous methods and closures.

Anonymous Methods

A method is anonymous if it is not bound to a name. A couple of ways to use anonymous methods in C#:

1. The 'new' way, using lambda expressions:

System.Func< int, int > doubler = x => x + x;
2. The 'old' way, using delegate explicitly:

delegate int DoublerDelegate( int d );
class Test
{
  public DoublerDelegate Foo()
  {
    return delegate( int x ) { return x + x; };
  }
}

//In the client, we can call the anonymous method:
Test t = new Test();
int i = (t.Foo())( 2 );
As expected, both ways give the same result.

Under the hood

Looking at the IL in the Ildasm disassembler for the 'old' way (explicitly using delegates), we see:

As expected, a DoublerDelegate which extends System.MulticastDelegate.
And inside the Test class, the compiler generated an anonymous static method corresponding to our anonymous method:

.field private static class anonymous.DoublerDelegate 
'CS$<>9__CachedAnonymousMethodDelegate1'
//...

.custom instance void 
[mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() 
= ( 01 00 00 00 )
 
.method private hidebysig static int32  
'
b__0'(int32 x) cil managed { //... Implements our anonymous doubler method } // end of method Test::'b__0'

In the DoublerDelegate instance, the static method is invoked.

IL_0009:  ldftn    int32 anonymous.Test::'b__0'(int32)

This static method gets called in main, in lieu of our anonymous method.

If we looked at the IL for the 'new' way with Func, we would see that the IL generated for the anonymous method is similar: A static method is generated and called from the Func.

Non-local variables and closure(?)

Our anonymous doubler methods only accessed the variable x. x is local to the anonymous method and not influenced by the world outside the method once it is bound to a value.
This is why the compiler can generate just a static method for the anonymous method. The method does not need external state.

Let us introduce a variable which is non-local to the anonymous method.

public DoublerDelegate Foo()
{
  int i = 10;

  return delegate( int x ) {
    i = i + x;
    return i * x; 
  };
}

i is non-local to the anonymous method, but used by it (in other words, i is a 'free variable'). Now the compiler would have to account for the variable i, whenever the anonymous method is called.

It does this by generating a new class.

.class auto ansi sealed nested private 
beforefieldinit '<>c__DisplayClass1'
       extends [mscorlib]System.Object
{
  .custom instance void 
  [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() 
  = ( 01 00 00 00 ) 
} // end of class '<>c__DisplayClass1'
This generated class encapsulates both the non-local variable i, and the generated anonymous method which uses it.

Raymond Chen's post from years ago elegantly describes how anonymous methods are generated.

BTW, there seems to be some confusion among people who know better than me, whether the example above is a true 'lexical closure'. For example in Scheme, the closure would consist of the anonymous function and the environment which references it. In case of C#, this is simulated by generating classes and methods.

For my point of view, of just another application programmer, the above code is an example of closure in C#.

Garbage Collection is important

In the example, the variable i has to live even after the enclosing method Foo() has returned. Why? Because the anonymous method references it, and it could live on based on the client code's intentions. We saw the compiler generated an internal class which maintains the non-local variable i. But who frees the generated class? The answer is obvious - the garbage collector. The internal object would be marked for deletion whenever the referencing anonymous method is freed. While this is a trivial question in C#, it would not be so simple if the closure would have to be freed in the absence of an omnipresent GC.

Gotchas

In C# anonymous methods, ref and out parameters cannot be captured automatically. The GC essentially works on the heap, not the stack. Reference params are not moved to the heap in C#, they live on the stack and are not captured in the closure. If we do need ref parameters, we would need to make local copies ourselves. In other words we have to handle the copy-in/copy-out.

This post from one of the implementers of anonymous methods in C# explains the copy-in/out semantics. There are other posts on his blog which explain deeper nuances of anonymous methods in C#. The matter of ref and out parameters, stack and heap are also explained in detail by Eric Lippert.

In Practice

Delegates which simulate functions-as-first-class-citizens are a useful feature in C#. They allow us to better express verbs (functions) in what is essentially a kingdom of nouns (classes). This helps an application programmer in terms of expressiveness, composability and terseness.

But how useful are anonymous methods and closures to just another application programmer like me?

Anyone who has used Linq, threading or task library in C# in some depth would be familiar with this topic. Here we use an anonymous method which forms a closure over the non-local variable 'extra'.

var extra = 10;
var result = myList.Select( x => x + extra );

Consider another example. We format a string based on some inputs. We need to insert a comma into the string based on the inputs.

public string SerializeToStr( string subject, string status )
{
  StringBuilder output = new StringBuilder();
  output.Append( "{'ticket':{" );
  bool aFieldExists = false;

  Action< string, string, string > Append = ( string item, string header, string trailer ) => {
    if ( item != null ) {
      if ( aFieldExists ) {
        output.Append( ", " );
      }
      aFieldExists = true;
      output.Append( header ).Append( item ).Append( trailer );
    }
  };

  Append( subject, "'subject':'" + DateTime.Today.ToString() + " ", "'" );
  Append( status, "'status':'", "'" );
  output.Append( "}}" );

  return output.ToString();
}

This needlessly long example could as well have been implemented with a helper method and a state variable for aFieldExists. However the anonymous method does provide for better locality of reference from an application code's point of view.

Anonymous methods and closures seem to be syntactic sugar in C# from a user's point of view. However they do improve expressiveness, and can be quite addictive to use.


Thursday, September 19, 2013

C# Delegate: A closer look

C# Delegate and Multicast

In the previous post, we looked at the C# callback mechanism using delegates, and the syntactic sugar provided by Func and Action.

Can we dig a little into the C# delegate implementation and learn more?

As we know, the delegate is really a function (method) pointer. But it does more - we can assign multiple methods to a delegate. When the delegate is invoked, it calls the methods assigned to it in order.

Consider 2 logging classes which implement their respective log methods:

public class ConsoleClass
{
//...
  public void LogToConsole( string mesg )
  {
    Console.WriteLine( mesg );
  }
}

public class FileClass
{
//...
  public void LogToFile( string mesg )
  {
    string file = "syslog.txt";
    using ( TextWriter f = File.CreateText( file ) ) {
      f.WriteLine( mesg );
    }
  }
}
The log methods have the same signature. So we can assign them to one delegate and invoke it.

//Test client code
public delegate void SyslogDelegate( string mesg );

private static void TestDelegates()
{
  ConsoleClass console = new ConsoleClass();
  SyslogDelegate d = console.LogToConsole;
  FileClass file = new FileClass();
  //Add second pointer to the delegate (multicast).
  d += file.LogToFile;

  d.Invoke( LogMesg ); //Shorthand: d( mesg );
}
As expected, invoking the delegate calls both the console and file log methods in order.

What is the C# delegate?

Looking at some methods/properties the delegate class itself exposes:

int i = 1;
Console.WriteLine( "Methods which were referred to by " +
                    d.GetType().Name + ": " );
foreach ( var delegated in d.GetInvocationList() ) {
    Console.Write( i.ToString() + ". " );
    Console.WriteLine( delegated.Target.GetType().Name + 
                       "->" + delegated.Method.Name );
    i++;
}
This prints:

Methods which were referred to by SyslogDelegate:
1. ConsoleClass->LogToConsole
2. FileClass->LogToFile
We can infer from the public method GetInvocationList() of the delegate, that the delegate simply holds a list of classes which enclose the referred methods. If we look at the IL for our SyslogDelegate in the Ildasm disassembler tool:

.class public auto ansi sealed Delegates.SyslogDelegate
       extends [mscorlib]System.MulticastDelegate
{
} // end of class Delegates.SyslogDelegate
we can see the delegate keyword generates to a class which derives from System.MulticastDelegate.

Looking at the System.MulticastDelegate class in a disassembler and reading MSDN, we can gather a couple of things:
1. The System.MulticastDelegate is something the compiler uses, we cannot normally inherit from it.
2. Since the delegate keeps a reference to the method, at least a part of the class which contains the method is alive as long as the delegate is alive.

The second point implies that as long as our delegate lives, the GC will keep the referred class around. At least if the method the delegate refers to, uses other variables inside the scope the class**. We can test this easily with a test program and a memory profiler.

This has implications wherever delegates are used, especially when dealing with higher level abstractions. For example, we can forget removing long lived events which could cause the referred objects to live for a long time, leading to a memory leak.

Simulating delegate

To understand the C# delegate mechanism better, let us try to write a naive and limited mechanism using reflection.

public class MyDelegate
{
  public MyDelegate(object target, string method) {
    this.Target = target;
    this.Method = method;
  }
  public object Target { get; set; }
  public string Method { get; set; }
  public void Invoke(Object[] paramArray) {
    MethodInfo m = Target.GetType().GetMethod( Method );
    m.Invoke( Target, paramArray );
  }
}

public class MyMulticastDelegate
{
  public List< MyDelegate > InvocationList { get; set; }
  public MyMulticastDelegate()
  {
    InvocationList = new List();
  }
  public void Add( object target, string method )
  {
    InvocationList.Add( new MyDelegate( target, method ) );
  }
  public void Remove()
  {
    if ( InvocationList.Count > 0 ) {
      InvocationList.RemoveAt( InvocationList.Count - 1 );
    }
  }
  public void Invoke( object[] paramArray )
  {
    for ( int i = 0; i < InvocationList.Count(); i++ ) {
      InvocationList[i].Invoke( paramArray );
    }
  }
In the test client code:

MyMulticastDelegate m = new MyMulticastDelegate();
//Add method 'references' to our delegate
m.Add( new ConsoleClass(), "LogToConsole" );
m.Add( new FileClass(), "LogToFile" );   

//Call delegate
m.Invoke( new object[] { LogMesg } );
This would call both the methods like the implementation with real delegates.

Of course, this is a naive implementation. What the C# delegate mechanism provides is:
- The ability to invoke a method at runtime based on its signature. Note that our implementation does not handle return types in any way.
- A type safe way of invoking multicast function pointers. So we get compile time checks (and with Visual Studio as-you-type checking). This is useful, otherwise we can easily cause runtime errors like we could in a reflection based invoke mechanism.

Let us explore closures and delegates in the next post.
Source Code

Side note: Could unsafe pointers and pinning be used to implement a naive delegate like mechanism 'from scratch'?


** If it were a method with no side-effects it could in theory be generated as a static which does not really require the class instance. BTW if the method actually uses variables outside its local scope but in the enclosing class's scope, we have something similar to a closure. More on this in the next post.

Sunday, September 15, 2013

Callbacks, Delegates, Action and Func in C#

C# delegates and generics

We can consider C# delegates as type safe pointers to methods.

Take a simple delegate SomeAction which takes a generic type and returns nothing.

delegate void SomeAction< T >( List< T > input);
We can use this delegate to refer to any method which has the same method signature.
In the code below an instance of SomeAction refers to PrintList

static void PrintList< T >( List< T > inList )
{
  inList.ForEach( ( i ) => { Console.Write( i + " " ); } );
  Console.WriteLine();
}

List< int > test = new List< int >(){1, 2, 3, 4, 5};
SomeAction< int > PrintIntList = PrintList;
PrintIntList( test );
//Output: 1 2 3 4 5

A map method in C# using delegate

Consider a map function, which takes as arguments a list and another function. It applies the function to all members of the list, to produce an output list. A naive implementation in Scheme could be:

(define (map f inlist)
  (if (null? inlist)
    '()
    (cons (f (car inlist)) (map f (cdr inlist)))))
So, if we want to apply a function which doubles a list:

(map (lambda (i) (* 2 i)) '(1 2 3 4 5))
;output => '(2 4 6 8 10)
(A note on implementing map in Scheme is here)

To implement a similar map method in C#, which operates on a List:

delegate T GenericDelegate< T >( T args );

static List< T > Map< T >( List< T > inList, GenericDelegate f )
{   
  List< T > outList = new List< T >();
  for ( int i = 0; i < inList.Count(); i++ ) {
    outList.Add( f( inList[i] ) );
  }
  return outList;
}

List< int > test = new List< int >(){1, 2, 3, 4, 5};
List< int > doubleTest = Map( test, ( i ) => { return 2 * i; } );
doubleTest.ForEach( ( i ) => { Console.Write( i + " " ); } );
//2 4 6 8 10 

Action and Func

These are just syntactic sugar in C# for generic delegates.
We could use this in place of the generic delegate we used in the previous section:

delegate TResult Func< in T, out TResult >(T arg);
So we could implement the map method using a Func overload as:

static List< T > MapWithFunc< T >( List< T > inList, Func< T, T > f )
{
  List< T > outList = new List< T >();
  for ( int i = 0; i < inList.Count(); i++ ) {
    outList.Add( f( inList[i] ) );
  }
  return outList;
}

List< int > test = new List< int >(){1, 2, 3, 4, 5};
List< int > doubleTest = MapWithFunc< int >( test, ( i ) => { return 2 * i; } );
doubleTest.ForEach( ( i ) => { Console.Write( i + " " ); } );
//2 4 6 8 10 
Action is a limited form of Func, it only represents a delegate which takes argument(s) and does not return anything. For example:

delegate void Action< in T > (T arg);
C# provides Func and Action for 1 to 16 arguments, which the language designers think should be enough for most cases.

C# Action, practical use

Consider a solution with a hierarchy of projects

Solution
|
|
---Common
|
|
---Web (References Common)

At times, we would like classes in Common to call methods in the Web project. However, by design, we do not want Common to reference the Web project. We can solve this problem by using callbacks, which in C# are Action or Func (or delegates).

Let us say there is a class in the Common, CommonUtil with a method UsefulMethod

namespace Common {
  public class CommonUtil {
    //...
    public void UsefulMethod(string someArg, Action< string > syslogOnError) {
      //...
      syslogOnError( "Some common error in the context of the caller" );
Now in the Web project, we call the UsefulMethod from a Controller class:

using Common;
namespace Web {
  public class TestController : BaseController {
    CommonUtil util = new CommonUtil();
    util.UsefulMethod("abc", new SyslogControllerDelegate( this ).Syslog());
    //...
Where the Syslog method would use the controller to add extra context to a syslog message:

namespace Web {
  public BaseController controller;
  //...
  public Action< string > Syslog() {
    Action< string > syslogAction = delegate( string mesg ) {
      //Writes controller context in addition to mesg to the syslog
      new SyslogBuilder( controller ).Info( mesg );
    }
  }
}
So when UsefulMethod calls its syslogOnError, in addition to the Common error we would also get whatever extra context SyslogBuilder wants to print about the calling TestController. Note that Common did not have to reference the Web project to do this, it just used a callback provided by the calling Web class.

Conclusion

We can consider Action and Function as syntactic sugar for generic delegates in C#. They are a simple and type safe way to implement callbacks (function pointer like functionality) in C#.

Wednesday, September 4, 2013

Back to basics: A demo http server and the TCP/IP stack on Windows

Let us build a web server

At work, I spend a lot of time on trivial things like 'how to fix html/css to make this page work for IE7'. I realized it has been a long time since I had to look at how the web server which powers our product really works under the hood. The best way to refresh my memory was - implement a web server and learn how it works bottom up. I used the language and platform I am currently using at work: C# and Windows.

Webserver implementation

Source code in GitHub
I updated code someone left on pastebin. The link is gone now, so I cant reference it. This is a toy implementation, error handling or performance are not considered. It uses the .NET TCP library and implements a small subset of the http server side protocol. Features:
- Simple threaded handling of requests
- Logging
- URL re-write
- Implement GET and POST
- CGI meta-variables
- Powershell as a CGI extension
- Process Cgi and File GET
- Implement some error codes - 400, 403 and 404


TCP/IP

If I had to implement any of the protocol layers, the RFCs and standards corresponding to that layer would be a good starting point.
Diagram modified from: http://www.codeproject.com/KB/IP/serversocket.aspx

Windows TCP/IP stack

OSI Layers 1 and 2

The driver interface is generally implemented by a Network Interface Card (NIC) vendor. To program a driver, you need to get the Windows Dev Kit (WDK), ndis.lib (Network Driver Interface Specification library). Good places to start would be the WDK 'passthru' sample and source code for the WinPcap (Windows Packet Capture) library. C or C++ are the languages to program with the NDIS.

OSI Layers 3 and 4

TCP/IP is implemented in the protocol driver TCPIP.sys, as part of the Windows kernel. At this layer AFAIK there is limited programmability. Windows does provide several configuration settings in the registry: TcpNumConnections, TcpWindowsSize and others. There are command line programs which operate at this level: arp, route, netstat, tracert, netsh, nsloopup, ipconfig. I have read about people who actually patch the TCPIP.sys, but I dont know how this is done.

OSI Layers 5-7

At this layer we are above the kernel in the user space. Various dlls implement different services:
TCP and UDP sockets - ws2_32.dll, winsock.dll
WinInet (FTP, HTTP 1.0, HTTP 1.1) - wininet.dll
RPC - rpc*.dll
NetBIOS - netapi.dll
WNet - mpr.dll
Diagram modified from: http://i.msdn.microsoft.com/dynimg/IC72693.gif

Protocol

Notes on protocol specs for the web server implementation.

HTTP 1.1 – RFC 2616

- The HTTP protocol is a request/response protocol.
- A client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a connection with a server.
- The server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible entity-body content.

MIME – RFC 2045 etc.

- Multipurpose Internet Mail Extension
- Defines how to send data over the internet
- Content-types: text, multipart, message, application, image, audio, video

Diagram from: http://docs.oracle.com/cd/E19957-01/816-6028-10/asd3j.htm

HTTP Communication

request chain ------------------------>
<---------------------- response chain
UserAgent------------- Origin server
example
Browser---------------- Webserver


request chain -------------------------------------->
<------------------------------------- response chain
UserAgent --------A-------B------- Origin server
example
Browser----Firewall----Router----- Webserver

Http Request:

HTTP-message = Request | Response
Request = Request-Line
*(( general-header
| request-header
| entity-header ) CRLF)
CRLF
[ message-body ]
Request-Line = Method SP Request-URI SP HTTP-Version CRLF
Method = "OPTIONS" | "GET" | "HEAD" | "POST" | "PUT" | "DELETE" | "TRACE" | "CONNECT" | extension-method
...

Http Response:

Response = Status-Line
*(( general-header
| response-header
| entity-header ) CRLF)
CRLF
[ message-body ]
Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Status-Code = "100"| "101"| "200"| ...
Reason-Phrase = *
...

CGI – RFC 3875

Common Gateway Interface
How a HTTP server and script/program share responsibility to respond to client request.
Request meta-variables.
Contain data about the request passed from the server to the script.
Can be accessed by the script in a system defined manner.
CGI Meta variables: "AUTH_TYPE" | "CONTENT_LENGTH" | "CONTENT_TYPE" | "GATEWAY_INTERFACE" | "PATH_INFO" | "PATH_TRANSLATED" | "QUERY_STRING" | "REMOTE_ADDR" | "REMOTE_HOST" | "REMOTE_IDENT" | "REMOTE_USER" | "REQUEST_METHOD" | "SCRIPT_NAME" | "SERVER_NAME" | "SERVER_PORT" | "SERVER_PROTOCOL" | "SERVER_SOFTWARE" | scheme | protocol-var-name | extension-var-name


Boston, MA, United States