Wednesday, February 14, 2018

Using windbg to improve page load time

Page load time

I work in the software department of a company which manufactures various types of marketing materials*.
Our e-commerce website sells these products in more than 20 countries and supports a variety of languages. The products we sell can be customized by users. The code base is made up of systems built over more than 2 decades.

Like at many other e-commerce companies, the holiday season in the US and Europe is a big time of the year in terms of traffic and sales. The effect of any production issue on the website is magnified during this time.


I was working on a relatively new system to which several of our user-facing page types were being migrated. The original system, 'the monolith', was for the most part written in-house. It has been debugged and tuned to a high degree over time. Parts of the monolith were being moved to a service oriented architecture, and the Windows/ASP.NET based CMS I was working on merged many of the new micro-services onto user-facing web pages. For several months before holiday, the focus was on migration and getting new features onto the new CMS.

For various reasons, page load performance on the CMS became slow over time. Consider the car door magnets page, which was a relatively lightweight product page with a combination of static and dynamic content:


That was not great, considering the html from the CMS was actually presented in between the header and footer from the monolith. We had unofficially set the server-side page load time budget for the CMS at 200ms, but over time, as new features were added, we went above this limit.

Holiday gave some of us an opportunity to take a break from chasing feature requests from product owners and to tackle reliability and performance issues. We knew that upcoming in-house gameday load tests would push requests from the monolith to the point where we would start timing out on our current infrastructure, presenting a real risk to production traffic and revenue.

By this time our internal code freeze was in effect, which did not allow for roto-tilling the design and code underlying the pages. So we had to look for improvements by tweaking the existing code. Fun!

The first problem was that we had stopped running regular load tests. My colleague Joe updated an old Locust based load test which we could run locally and in our lower environments. The performance issue was obvious: we were constrained to ~10 requests per second across all pages before CPU started reaching its upper limits on individual VMs.


Investigation

Looking at our NewRelic, Sumologic and application logs in the load test environment did not give us clues to possible bottlenecks. Joe generated a few dump files of the IIS worker process and sent them to me for analysis.

I used windbg to look for clues. Oftentimes half the work in windbg analysis is getting dlls loaded correctly. I have annotated my setup with comments:


.echo "Prep work"
.echo "From image where load test was run get -> sos, mscordacwks and clr dlls"
.echo "Only sos should be needed most times"
.echo "Using sos.dll from localhost may not get correct results"
.echo "Ensure we have dlls/pdbs compiled for relevant release/compiler flags"
.echo "End Prep work"

.echo "set noisy symbol loading"
!sym noisy
.echo "set symbol path to: microsoft symbol server, 
.echo "and the path to our dlls"
.sympath+ srv*e:\symbols*http://msdl.microsoft.com/downloads/symbols;E:\windbg\dev_10_4\bin
.echo "load sos.dll"
.load E:\windbg\dev_10_4\sos.dll
.echo "Load sosex and psscor4 extensions"
.load E:\work\windbg\Psscor4\amd64\amd64\psscor4
.load E:\work\windbg\sosex\sosex.dll
Using the !runaway command and looking at thread stacks did not reveal anything obvious.

We did notice several GC related threads on the stacks. We had not noticed any obvious memory leaks, but then we release to prod every day, which makes slow leaks hard to notice.

Joe set up the .NET CLR memory perf counters and noticed that the Gen 0/1/2 collection counts were not in the healthy ratio of roughly 100/10/1 mentioned here. Something to investigate.
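
As a quick sanity check from inside the process (a sketch, not the perf counters Joe set up), the same counts can be read with GC.CollectionCount:

using System;

static class GcCheck
{
  //In a healthy app the counts tend toward roughly 100 Gen 0 : 10 Gen 1 : 1 Gen 2.
  public static void PrintCollectionCounts()
  {
    Console.WriteLine( "Gen0={0} Gen1={1} Gen2={2}",
      GC.CollectionCount( 0 ), GC.CollectionCount( 1 ), GC.CollectionCount( 2 ) );
  }
}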

Going back to the dump files:
Sosex's dumpgen command on the large object heap did reveal something interesting: there were several large string objects holding page html. A sample:

Looking into the code, this led to a method:

where value was being passed in as a string.

So, a simple but oft-repeated bug: operating on strings instead of streams, which shows up as a problem when the strings grow larger and are used in different ways.
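
In sketch form (hypothetical names, not the actual code from our CMS), the difference looks something like this:

using System.IO;
using System.Text;

public static class HtmlResponseWriter
{
  //Before (sketch): the whole html is materialized as one large string per request,
  //so big pages end up as large object heap allocations.
  public static void Write( Stream output, string html )
  {
    byte[] bytes = Encoding.UTF8.GetBytes( html );
    output.Write( bytes, 0, bytes.Length );
  }

  //After (sketch): the html is copied through in small chunks, so no request needs
  //the full document in memory as a single string.
  public static void Write( Stream output, Stream html )
  {
    html.CopyTo( output );
  }
}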

Streams over strings

The solution already existed a few lines below, but the method had a bug, which was probably why consumers of the library were not using it.

Fixed this code based on advice from my co-worker Chris:

Load test showed noticeable improvement. We could go from ~10rps to ~32rps:

After production release the next day, we looked at the car door magnets page:


A nice boost! But still a little off from our 200ms goal and holiday was approaching.
More in the next blog post.

*Views are my own, not of my employer Cimpress.

Monday, January 1, 2018

Monolith to Microservices - a debugging story


Search

I work for an e-commerce company which manufactures customizable products*.

Over the past couple of years we have been migrating from a monolithic system towards a service oriented one. The migration work is still in progress and has been mostly successful so far. Product search was a cross cutting concern which had to work across both the old monolith world and the new service based one. The monolith based search implementation obviously did not consider products introduced on the newer platform.


The opportunity

A couple of months back, I got the opportunity to move search to the new microservice based platform. We had decided to use a third party search engine vendor and the off-monolith microservices had reached a point where a search MVP was conceivable.

A couple of aims we had in mind:
1. Drop some of the bells and whistles provided by monolith search. Mainly, drop some search result types and focus on product search.
2. Try to match the monolith performance in terms of page load times. This seemed hard since we were introducing extra hops, including a new vendor based service which was hosted in a different data center.

The design was simple, and the microservices provided on the new platform lent themselves neatly to some of the aims:

Data feed/index:


Search:


Problems with response time

Things went well as we integrated the systems, but there was a problem. We had set a limit of less than 200ms response time for the https calls from the search results api. This was essential for us to keep within the performance budget for the service call while serving up search result pages.

The average response time was in the 500ms range. Traceroute showed the problem:

 1    <1 ms    <1 ms    <1 ms  ip-.eu-west-1.compute.internal
...
22    13 ms    13 ms    13 ms  REMOVED.lon2.datapipe.net [REMOVED]

Our data center is located in AWS Ireland, while the search engine's data center was in Rackspace New Jersey. This was easy enough to fix: the search engine provider relocated to the Rackspace London data center and the response times came within the acceptable limit. The miss seemed obvious in retrospect. For some reason I had assumed this had been considered when the contract was signed - guess engineers were not very involved at that point :)

We decided to A/B test the implementation at 50-50 traffic across our top two locales. The revenue metrics in the test were slightly down and we noticed bursts of timeout errors from the search engine in our logs. Occasionally the timeouts tripped the circuit breaker policy we had in place.

The timeouts were affecting ~2-3% of the search traffic.

Time to debug

There were three possibilities that came to mind:
1. The http client implementation on our side
Did not turn up much. The same http client is used across various service client calls and there was no indication of port exhaustion or some such problem on our servers/load balancers.

2. Something in the connection or network
Going over ping and MTR data did not show much either. I emailed one of our systems engineers with the traceroute data and got no reply. My guess was he did not think there was much to go on.

3. The vendor's service runtime
This was something I did not have visibility into. How to debug and convey useful information to the (external) service provider? What I had was the http client logs with timeout and circuit breaker events.

At this point, I will switch to extracts from emails.

-------
Me:
We don’t see a drop in the number of timed out requests. Around 12:48pm EST in the afternoon, we had our software circuit breaker trip a few times due to a series of requests timing out (the policy set in code is trip the circuit breaker for 30 seconds for 10 consecutive failures – the circuit breaker tripped 4 times).
We also realized that while we have a circuit breaker policy, we did not set a retry policy for calls to the search engine. I have this in code now and should go to prod tomorrow, will let you know if this improves things.
Another option we are considering is fronting search responses with CDN...

Vendor:
As we cannot locate httpd logs for the sample requests that client said were timing out, we suspect that the error happened in the first leg of the request, from client to us. And with the loss at only at 2~3%, traceroute is likely not going to show where the problem is. We'd need to run multiple traceroutes to possibly detect the problem.
Here is what we can offer in terms of improving performance:
1. This timing out might be caused by port exhaustion on the servers, would you please check netstat to ensure that this is not the issue?
2. We have noticed that there is no compression requested in the request, we would recommend sending an
Accept-Encoding: gzip header to improve performance...
-------
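
The emails do not show the actual code; as an illustration only, here is roughly what the circuit breaker plus the newly added retry could look like using the Polly library (pre-v7 API) around HttpClient calls. The post does not name the library, and the class names and values here are illustrative:

using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;

public class SearchEngineClient
{
  private readonly HttpClient http = new HttpClient();

  //Shared across requests so that "10 consecutive failures" is counted globally:
  //trip the breaker for 30 seconds after 10 consecutive failures.
  private readonly Policy breaker = Policy
    .Handle<HttpRequestException>()
    .Or<TaskCanceledException>()   //HttpClient timeouts surface as TaskCanceledException
    .CircuitBreakerAsync( 10, TimeSpan.FromSeconds( 30 ) );

  //The retry policy that was being added: retry a failed call once before giving up.
  private readonly Policy retry = Policy
    .Handle<HttpRequestException>()
    .Or<TaskCanceledException>()
    .RetryAsync( 1 );

  public Task<HttpResponseMessage> SearchAsync( Uri searchUri )
  {
    return retry.ExecuteAsync( () => breaker.ExecuteAsync( () => http.GetAsync( searchUri ) ) );
  }
}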


The search responses were paged and hence small; gzip, while it would improve things, did not seem to be a reason for the timeouts. We needed more data to help us all narrow down the possible cause.


-------
Me:
The timeouts are occurring at a similar rate today as yesterday.

I had a thought about one possible cause and way to reduce it, could you consider it:
- The timeouts seem to occur in bursts; usually the burst lasts 1-2 minutes and then there are several minutes of search queries with no timeouts.
- The timeouts occur for various locales, though the occurrence is higher in en-us; that might be a reflection of the traffic.
- The timeouts occur across all our servers, and there is no correspondence to increased load or network activity. They occur at similar rates on all days of the week.
- I have attached an Excel sheet with timeouts and periods of no timeouts from 1:00am to 4:30pm EST on 4/17.
Is there any process which runs at these time intervals on the server(s) which serve up our company queries? Perhaps a re-index or some such?

-------


The timing data did the trick! The vendor was able to isolate the problem to a process on their side. It turned out they ran auto-deploys which were affecting the runtime behavior of the response system.


-------
Vendor:
Thanks again for the below information, we have a short term resolution in place:
We have disabled autodeploys for the time being, which are causing these full loads to happen with each deployment. This should put a stop to the timeouts in the short term while we address the source of the issue.
Will keep you updated on the permanent solution.

Me:
Nice work, looks like that was it. No timeouts since ~6:49pm EST yesterday (4/17). Please convey my thanks to the dev team too, have attached a picture from our logging system showing the stop in timeouts.


Vendor:
We have made the loads asynchronous and they should no longer block search. This has been deployed to your cluster and there should no longer be timeouts caused by this particular issue.
-------


Server-side time was within limits now; the brown section in the graph, corresponding to the search engine response time, was ~100ms, well within the 200ms limit we had set:

Epilog


The MVP went well.

Beyond this: we don't have to depend on the legacy search system, the search engine/vendor can be replaced easily in the face of risks, and the MVP has opened up the door for other industry standard search features.

In the service world where we have limited visibility into the service provider implementations, timing data can be a valuable source of debugging information.

*Views are my own, not of my employer Cimpress.

Sunday, October 1, 2017

Learnings from writing a Chrome extension

A couple of months back, I thought of writing a Chrome browser extension to:
- Make a small application on my own. I rarely get to do this in my day job, where I build small pieces in a huge and sometimes unnecessarily complicated code base.
- Understand better the http(s) protocol as it relates to the browser.
- Brush up on my rusty JavaScript in the context of the browser, with minimal use of libraries.

I thought I would build a simple http observer which sniffs http traffic and displays details of each request for one Chrome tab. This is one of the many features of the Chrome devtools network tab.

The extension can be installed on the Chrome browser here: http observer Chrome extension. While it does use several permissions on the browser, it contains no intentional malice and the source code is on github.

There is nothing the extension does that is not provided by Chrome devtools, and the extension is not as unobtrusive with respect to page load as Chrome devtools is. Really, you should be using the devtools network tab for working with a web page's network requests. This sample extension just provides a way to visualize the requests and drill down into the details. I thought I would share what I learned while building the extension.


Building a Chrome extension

A good starting point is the sample Chrome extensions page. From the comment headers these samples seem to have been written a while back, but there is a relatively newer github repo with all the samples. In the repo, I found a sample which gave me a great start on what I wanted to make.

The Chrome extensions getting started page gives a tutorial on building extensions. The sample and links in that article should put you well on the way to writing useful extensions.
Some terms:
manifest.json: Has important information about your application. Some of the things it declares are:
Background page/script: A long running html page or script to manage state or tasks.
Content scripts: Scripts which execute in the context of a page.
Permissions: The list of resources your extension wants to use in the browser. Some of these permissions are displayed to the user during installation so they can decide whether or not to install the extension.
The Chrome debugger protocol: As the protocol viewer documentation says, this protocol allows for tools to instrument, inspect, debug and profile Chromium, Chrome and other Blink-based browsers. The protocol is used by Chrome devtools itself. For extensions, there is a higher level Javascript API which provides a convenient way to call the API from javascript.

Http observer design

I decided to make a table of the http requests and display details like headers, cookies and responses for each request. For that I made a simple 2 x 2 popup display. I found out some time later that the maximum popup dimensions on startup are 800px wide and 600px high. This is painful since the extension window has to be resized or maximized after each startup to display content optimally. I don't have much of a clue when it comes to UI design.

Most of the application deals with some http request and response intercept events provided by the Chrome debugger API. It uses the data from these events to display information in tables.

The source code for the extension is on github. I admit it is bad code, which I might refactor if anyone but me uses it. I avoided using third party Javascript libraries, but I did use Milligram, a minimal and excellent css framework, to help me with the display layout.

Some features

- Http requests are displayed as a table.
- For each request chosen, the details are displayed in other tables.
- All the http related tables are sortable.
- The full details for a request are displayed as json at the bottom of the popup.
- The browser cache can be disabled/enabled. This is a good way to see the effect of local browser caching on page load.
- Once the http requests have 'settled' for a page (a minute or so, depending on the busyness of the page), a summary is displayed which tries to show the overall way in which the page made http requests through its lifecycle. It is surprising to see the number of non-essential requests some pages make to analytics and other 'partners' these days. Below is a screenshot of a better behaved page.


Some caveats

- The extension will not work when Chrome dev tools is open.
- The extension only works against the tab that it was opened from.
- For more details and better performance, you should really be using Chrome devtools.

User options and configuration

While I have not added user options to the extension, options provide a way for users to configure the extension based on their needs. For example, in the http observer it would be good to limit the number of http requests displayed per page. Here is a good example of how to implement options.

Publish an extension

Once you have written an extension, you can release it on the Chrome developer store. You need a google account and can follow the publish instructions. There is a one time $5 signup fee to publish apps.

Wednesday, June 10, 2015

Git by diff

Git is a Distributed Version Control System. It stores file history and the current state of repository on the local machine. We can use this to understand the internals of git better. The idea is:

1. Store a copy of the repo folder, including the .git directory where git stores all state information it needs.
2. Do a git operation.
3. Compare the folder from step 1 with the current folder. Learn.
4. Go to 1.

Initial Setup

We create a script cp_gitbydiff.sh which makes a copy of the directory containing a test repo.

#!/usr/bin/env bash
GITDIFFDIR=c:/tmp/gitbydiff
GITDIFFDIR_PREVIOUS=c:/tmp/gitbydiff_prev
rm -rf $GITDIFFDIR_PREVIOUS
cp -r $GITDIFFDIR $GITDIFFDIR_PREVIOUS
I kept the script in a directory which is in my bin path, so I could call it from anywhere.
I use beyond compare as the diff tool, but any good diff tool will do.


Git by diff


Git init

We use git init to create a repository.

$ mkdir gitbydiff
$ cd gitbydiff
$ cp_gitbydiff.sh
$ git init
Initialized empty Git repository in c:/tmp/gitbydiff
Now let us diff the repo directory gitbydiff and its initial (empty) contents gitbydiff_prev.

So git creates a bunch of files in the .git directory. They are mostly directories, and we will not dig into them for now. Let us take a quick look at two of the files:

The config file contains the default configuration for the repository. We can update the configuration using the git config command or by editing this file.
The HEAD file contains a reference to the current branch, which is the master. The text in the file is ref: refs/heads/master.

Git add

Let us add a file, and a directory with a file in it, and add them to git.

File - afile
Directory - adir
Another file - adir/bfile


$ cp_gitbydiff.sh
$ echo "afile contents" > afile
$ mkdir adir
$ echo "adir/bfile contents" > adir/bfile
$ git add .

Diff reveals a few new files in the .git folder. An index file, and a few directories/files in the objects directory.

How do we read these files? It turns out these are compressed files, which can be decompressed to reveal their contents. I borrowed a python snippet which can be used to decompress a file.
$ python -c "import zlib,sys;print repr(zlib.decompress(sys.stdin.read()))" < c:/tmp/gitbydiff/.git/objects/04/4d81646daa6ef55b55630a66cce5c8e098d5b7
'blob 15\x00afile contents\n'
So this was a compressed file with reference to a 'blob' with the contents of afile! The other object contains the other file bfile which we created.
$ python -c "import zlib,sys;print repr(zlib.decompress(sys.stdin.read()))" < c:/tmp/gitbydiff/.git/objects/b7/404f599ac68db583e53ce9a903ea1c7479f86c
'blob 20\x00adir/bfile contents\n'
Git stores data as objects, which are key value pairs. The key is a checksum of the contents. Git uses SHA-1 hash to generate the checksum.

In the objects directory, git stores the object under:
.git/objects/directory_first_two_chars_of_hash/file_remaining_chars_of_hash

The index is a binary file, whose contents we can see with the git ls-files command:

$ git ls-files --stage
100644 b7404f599ac68db583e53ce9a903ea1c7479f86c 0       adir/bfile
100644 044d81646daa6ef55b55630a66cce5c8e098d5b7 0       afile
In essence the index is a list of SHA-1 hashes, their paths and permissions.

Git commit

Let us commit our change to git.

$ cp_gitbydiff.sh
$ git commit -m "First commit from branch master"
 2 files changed, 2 insertions(+)
 create mode 100644 adir/bfile
 create mode 100644 afile

In the diff we see three new objects, a log directory and a refs/heads/master file.

The refs/heads/master contains the entry of an object: 20c9dc1abcbb5e7e60d5a15da9fc11f3bf83e4c7

What is in this object? Instead of using our python snippet to view it, we can use the git cat-file command to concisely reveal the contents of a git object. Note also that the cat-file command only needs the first few characters of the object's SHA-1 hash.

$ git cat-file -p 20c9dc
tree 631d18013981d64940cef00c422fcff00044e432
author xyz  1433905500 -0400
committer xyz  1433905500 -0400

First commit from branch master
So refs/heads/master points to a commit object, which in turn contains a pointer to a 'tree' object and the commit message.

What is in the 'tree' object 631d1...?

$ git cat-file -p 631d1
040000 tree 309fb8dd6b7c0a9bb2234cd6dc0e5d65fe9fccde    adir
100644 blob 044d81646daa6ef55b55630a66cce5c8e098d5b7    afile
A 'tree' adir and a 'blob' afile. But adir and afile are the directory and file we created!
We can guess now that the contents of the 'tree' adir is a 'blob' bfile which is the file we created under adir. Let us check.

$ git cat-file -p 309fb
100644 blob b7404f599ac68db583e53ce9a903ea1c7479f86c    bfile
Right. bfile and afile objects themselves point to the contents of the file. For example, bfile:

$ git cat-file -p b7404
adir/bfile contents
From this we can infer that git stores a directory structure as a set of pointers with trees representing directories (and the root), and blobs representing files.


The repository we created is represented as:
branch master -> commit (with commit message) -> root tree
                                                   -> blob afile -> contents of afile
                                                   -> tree adir -> blob bfile -> contents of bfile

The data structure used here is a Directed Acyclic Graph (DAG).

We ignore the logs directory here. It is related to the reflog, which we will not look into.

Git branch

Let us now create a branch.

$ cp_gitbydiff.sh
$ git checkout -b abranch
Switched to a new branch 'abranch'

The HEAD file now contains ref: refs/heads/abranch.
The refs/heads directory has a new file - abranch - which contains 20c9dc..., the same as what refs/heads/master contains. That is the object which points to our directory structure.

This is logical: a new branch was created, git points its HEAD to it since we switched to it and the branch points to the root of our repo.

Let us do a commit now.

$ cp_gitbydiff.sh
$ echo "change to a file from branch abranch" > afile
$ git add .
$ git commit -m "Second commit from branch abranch"
 1 file changed, 1 insertion(+), 1 deletion(-)

We see three new objects, and that the refs/heads/abranch and COMMIT_EDITMSG files have changed.

refs/heads/abranch points to one of the new objects, 89008. Following this object we see:

$ git cat-file -p 89008
tree 4bbff3bb19598db5ab1cae521f652a018651221d
parent 20c9dc1abcbb5e7e60d5a15da9fc11f3bf83e4c7
author xyz  1433907011 -0400
committer xyz  1433907011 -0400

Second commit from branch abranch
$ git cat-file -p 4bbff3
040000 tree 309fb8dd6b7c0a9bb2234cd6dc0e5d65fe9fccde    adir
100644 blob 99037eec622590773d235c6abfa44d437fae0c73    afile
$ git cat-file -p 99037
change to a file from branch abranch
What is git doing here?
Since the new branch, abranch, has a modified afile, git updates the state of abranch to reflect this.
It creates a new 'root' tree object which points to adir and afile. The new afile object that is pointed to is the updated file, which exists only in this branch. The master still points to the old afile.

For adir/bfile, both master and abranch use the same object, since that directory has not really changed across the branches. Git does not duplicate data for common objects across branches.

Git merge

As a final step in our experiment let us merge the branch into the master.

$ cp_gitbydiff.sh
$ git checkout master
Switched to branch 'master'
$ git merge abranch
Updating 20c9dc1..89008dd
Fast-forward
 afile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

The HEAD is now ref: refs/heads/master. Nothing else changed. Though abranch is merged into master, its objects are left intact.

I will stop at this point. This method can be used to explore other git commands like rebase, or concepts like remote repositories. Another good way to learn more about git is to peek at the git source code itself.

Sunday, March 16, 2014

Scaling a coffee shop (web application)

I own a small coffee shop.

My expense is proportional to resources
100 square feet of built up area with utilities, 1 barista, 1 cup coffee maker.

My shop's capacity
Serves 1 customer at a time, takes 3 minutes to brew a cup of coffee, a total of 5 minutes to serve a customer.

Since my barista works without breaks and the German made coffee maker never breaks down,
my shop's maximum throughput = 12 customers per hour.


Web server

Customers walk away during peak hours. We only serve one customer at a time. There is no space to wait.

I upgrade shop. My new setup is better!

Expenses
Same area and utilities, 3 baristas, 2 cup coffee maker, 2 chairs

Capacity
3 minutes to brew 2 cups of coffee, ~7 minutes to serve 3 concurrent customers, 2 additional customers can wait in queue on chairs.

Concurrent customers = 3, Customer capacity = 5


Scaling vertically

Business is booming. Time to upgrade again. Bigger is better!

Expenses
200 square feet and utilities, 5 baristas, 4 cup coffee maker, 3 chairs

Capacity goes up proportionally. Things are good.

In summer, business drops. This is normal for coffee shops. I want to go back to a smaller setup for some time. But my landlord won't let me scale that way.

Scaling vertically is expensive for my up and coming coffee shop. Bigger is not always better.


Scaling horizontally with a Load Balancer

Landlord is willing to scale up and down in terms of smaller 3 barista setups. He can add a setup or take one away if I give advance notice.

If only I could manage multiple setups with the same storefront...

There is a special counter in the market which does just that!
It allows several customers to talk to a barista at once. Actually the customer facing employee need not be a barista, just someone who takes and delivers orders. And baristas making coffee do not have to deal directly with pesky customers.

Now I have an advantage. If I need to scale, I can lease another 3 barista setup (which happens to be a standard with coffee landlords), and hook it up to my special counter. And do the reverse to scale down.

While expenses go up, I can handle capacity better too. I can ramp up and down horizontally.


Resource intensive processing

My coffee maker is really a general purpose food maker. Many customers tell me they would love to buy freshly baked bread. I add it to the menu.

I have a problem. The 2 cup coffee maker I use, can make only 1 pound of bread at a time. Moreover it takes twice as long to make bread.

In terms of time,
1 pound of bread = 4 cups of coffee

Sometimes bread orders clog my system! Coffee ordering customers are not happy with the wait. Word spreads about my inefficiency.

I need a way of segmenting my customer orders by load, while using my resources optimally.


Asynchronous Queue based processing

I introduce a token based queue system.

Customers come in, place their order, get a token number and wait.
The order taker places the order on 2 input queues - bread and coffee.
Baristas look at the queues and all other units and decide if they should pick up the next coffee or the next bread order.
Once a coffee or bread is ready, it is placed on an output tray and the order taker shouts out the token number. The customer picks up the order.

- The inputs queues and output tray are new. Otherwise the resources are the same, only working in different ways.

- From a customers point of view, the interaction with the system is now quite different.

- The expense and capacity calculations are complicated. The system's complexity went up too. If any problem crops up, debugging and fixing it could be problematic.

- If the customers accept this asynchronous system, and we can manage the complexity, it provides a way to scale capacity and product variety. I can intimidate my next door competitor.


Scaling beyond

We are reaching the limits of our web server driven, load balanced, queue based asynchronous system. What next?

My already weak coffee shop analogy has broken down completely at this point.

Search for DNS round robin and other techniques of scaling if you are interested in knowing more.

If you are new to web application scaling, do the same for each of the topics mentioned on this page.

My coffee shop analogy was an over-simplification, to spark interest on the topic of scaling web applications.

If you really want to learn, tinker with these systems and discuss them with someone who has practical knowledge.

Sunday, December 29, 2013

Interview Question: Design an Object Oriented System

I was asked this question in a couple of software engineer job interviews recently. After the interviews, I wondered if there is a general way to approach this question for any system X.

What is the question?

The system to design can be of two types:

1. The system manifests as software processes. Examples would be a rule engine, an app store and so on.
An app store selling software runs as processes in a network of computers. Its interfaces with the outside world can be defined by a handful of software interfaces.

2. The system manifests in the world outside computers. Examples would be an elevator, a parking lot, a DVD lending library and so on.
These systems can be viewed in a couple of ways:

2a. Real time software drives pieces of the physical system. Other software coordinates the pieces. As an example, in a parking lot, the system to design could be the software controlling and monitoring the parts of the parking lot. The software interfaces with the actual hardware using some well defined interfaces.

2b. A simulation of the entire physical system as software processes. Consider 2 examples:
i. A card game, where we model decks of cards, timers and game rules as software processes with a graphical user interface to make the game useful to users.
ii. A parking lot, where we simulate the entire system within software processes. The user interface allows the user to get a view into the system, and perhaps vary system parameters. Reasons to do this may be to study the system's characteristics in a cost effective way or create a virtual reality game.

In a type 2b system, we are essentially converting a type 2 system into a type 1 system.

Before attempting design, I think it is better to agree on a point of view with the interviewer.

If it is a physical system, it may be easier to deal with a model which simulates the system in software in its entirety, rather than one which attempts to design pieces of real time systems - unless designing an embedded software system is relevant to the job you are interviewing for.

Data structures and algorithms

It may appear that OO design questions can be answered while avoiding data structures and algorithms. This is not true. An interface or class may need to use an abstract data type or a well known algorithm in its implementation.
As an example assume the OO system to design is a syntax checker. You may define an interface which checks for balanced parentheses. The interviewer may want to know how you would implement the interface.
Even if you don’t know the answer off the bat, you could ask the interviewer for a hint or think of a naive answer which might lead to a good solution. The key is to know about the stack abstract data type and its operations. If you don’t know anything about stacks, this could become a difficult problem and raise a red flag with the interviewer.
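
For instance, a minimal balanced-parentheses check built on a stack might look like this (a sketch, not tied to any particular interview answer):

using System.Collections.Generic;

public static class SyntaxChecker
{
  //Push every opener; each closer must match the most recent unmatched opener.
  public static bool HasBalancedParentheses( string input )
  {
    var openers = new Stack<char>();
    var pairs = new Dictionary<char, char> { { ')', '(' }, { ']', '[' }, { '}', '{' } };

    foreach ( char c in input ) {
      if ( c == '(' || c == '[' || c == '{' ) {
        openers.Push( c );
      }
      else if ( pairs.ContainsKey( c ) ) {
        if ( openers.Count == 0 || openers.Pop() != pairs[c] ) return false;
      }
    }
    return openers.Count == 0;
  }
}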

It is good to know how to estimate time and space complexity of algorithms since interviewers may want to know if you can use this knowledge for design trade-offs.

Some ways to improve knowledge of data structures and algorithms:
Books
Internet articles
Exercises
Perusing the standard library of your language
Work on projects.

Heuristic

Some approaches to the object oriented job interview question:

1. In my interviews, I wrote down interfaces and base classes which occurred as I described use cases to the interviewers. Then I began writing out the interactions between the objects. I asked questions when I faced ambiguity in my assumptions. This approach allowed me to engage the interviewer in the design process. However I thought it was a vague method when we got into the details and the interviewers started adding and modifying requirements.

2. A more precise way is to write down use cases first. Based on the use cases, come up with the objects. Then use a method like CRC cards to flesh out the interactions between the objects. Use a message sequence diagram to refine the interactions. Finally use a deployment diagram to show the various processes which host the objects as they run on computers and network elements. While this is a methodical way of doing object oriented design, it may not be the best approach within the constraints of a job interview.

Message Queue

Here is another way. Assume:
- A central priority message queue where any object can send a request for something to be done. The message queue automatically routes the message to the destination object.
- Unless there is a specific need, objects are created automatically and their lifetime is infinite (at least for the time of the interview).

So we assume an event loop, to which objects can send messages. The event loop routes the message to a suitable destination object, creating it if necessary.

Consider part of the design for an elevator bank.
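
The diagram is not reproduced here, but a rough C# sketch of the idea (the names are hypothetical) could look like:

using System.Collections.Generic;

//Every request is a message dropped on a central priority queue and routed to a destination object.
public class Message
{
  public int Priority { get; set; }        //maintenance messages get a higher priority than passenger calls
  public string Destination { get; set; }  //e.g. "ElevatorBankController"
  public object Payload { get; set; }      //e.g. the floor a passenger called from
}

public interface IMessageHandler
{
  void Handle( Message message );
}

//Serializes asynchronous passenger requests and chooses the optimal elevator to respond.
public class ElevatorBankController : IMessageHandler
{
  private readonly List<Elevator> elevators = new List<Elevator>();
  public void Handle( Message message ) { /* pick an elevator, queue a 'go to floor' message for it */ }
}

public class Elevator : IMessageHandler
{
  public void Handle( Message message ) { /* move, open/close doors, report state back via the queue */ }
}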

This is not very different from the other methods given in the previous section; it just provides a compressed approach for the interview.

Events, Async Calls and Collections

In case of an asynchronous call or a call to a collection of objects, it helps to introduce a controller object which serializes the async events or helps choose an item from the collection. We did this in the call to the elevator bank controller, which chooses the optimal elevator to send in response to a passenger's request message.

Source Code

The point of the design is to clarify the domain and make good choices while solving the problem. At this point, we should have enough to switch over to code. This helps make the discussion more concrete.

Obviously we want to use the language which we are most comfortable with, unless the system to design is best expressed in a specific language. For example, if the system to design is a widget which is hosted within a web browser, JavaScript would be a good choice, even if Scheme is your favorite language. In this case it is likely the job involves some client side browser programming, and you should expect to use JavaScript at work.

Using higher level constructs like interfaces in the initial design allows for more flexibility while dealing with modified requirements. But remember such constructs usually imply indirection and you have to deal with the added complexity in the discussion with the interviewer.

Object Oriented Concepts

The main objective is to design a system. A well prepared interviewer will introduce additional requirements into the design, and this is a cue for you to leverage OO design concepts. Objected oriented design concepts like inheritance, polymorphism, encapsulation and dynamic dispatch should be used judiciously (and perhaps sparingly) in the design process. In the elevator design example, the interviewer might ask for a service elevator which is somewhat different from normal elevators to be added to the elevator bank. In case of a rule engine, the interviewer might add additional rule types which you may not have considered in the design.

A Builder

If the interviewer asks how the processes are set up, we can introduce a builder which translates a configuration into the actual objects at startup time.
For example consider an xml configuration for elevators:

<elevatorbank>
    <elevator id="1">
        <door id="1">
        </door>
        <allowedfloors>
            <floor id="1">
            </floor>
            <floor id="2">
            </floor>
        </allowedfloors>
    </elevator>
    <elevator id="2">
        <!-- etc. -->
    </elevator>
<!-- etc. -->
</elevatorbank>

The builder would just read the configuration file at startup and instantiate the corresponding objects. It may be good to think about a mechanism where the system behavior can be re-configured at runtime too.
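
A minimal sketch of such a builder (hypothetical class names), using Linq to XML to read the configuration above:

using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Elevator
{
  public int Id { get; set; }
  public List<int> AllowedFloors { get; set; }
}

public class ElevatorBankBuilder
{
  //Translate the configuration above into elevator objects at startup.
  public List<Elevator> Build( string configPath )
  {
    XDocument config = XDocument.Load( configPath );
    return config.Root                          //the <elevatorbank> element
      .Elements( "elevator" )
      .Select( e => new Elevator
      {
        Id = (int)e.Attribute( "id" ),
        AllowedFloors = e.Elements( "allowedfloors" )
                         .Elements( "floor" )
                         .Select( f => (int)f.Attribute( "id" ) )
                         .ToList()
      } )
      .ToList();
  }
}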

Message Queue Again

With the interfaces and objects in place, we now need a way to put things in motion. We can introduce a priority message queue again. So we start an event loop which serializes messages and routes them to their destination. We can draw parallels with the way load balancers, web servers or operating systems deal with events which typically start or end on external actors.

Using a priority queue helps deal with messages which have to be executed ahead of the order in which they arrive. For example, a maintenance message from a technician to the elevator bank should take precedence over a normal call originating from a passenger.

Design Patterns

Making design patterns more important than the design might signal to the interviewer that we want to complicate the design rather than simplify it.

If I chose a design pattern, I would check that it is okay with the interviewer. Where possible I would prefer a well known data type or algorithm, which may be available as a language library, over a design pattern. And when I choose a design pattern, it would be one I have used before in practice, not just read about in a book.

Generally avoid buzzwords and concepts you are not clear about.

Functional Programming Design? Procedural Programming Design?

I think it is valid to use alternate design methodologies to answer the question. Couple of things to consider:

- Does the group or company you are interviewing with have an environment where solving problems takes precedence over the language and technology being used? There may be good reasons for this question to be answered either way, and you may want to know the answer beforehand using the internet and your contacts.
- Confirm with the interviewer if it is okay to approach the problem using an alternate design method.

Some Tips

In the design process, it helps to keep in mind:
1. The system to be designed is most important; any concepts used in the process should be to help in this design.
2. Articulate any design trade-offs you recognize to the interviewer.
3. Avoid buzzwords or terminology which you don’t understand well.
4. If you are faced with ambiguity, discuss a resolution with the interviewer.
5. At some point, reason with code, it can help focus the design process. Choose a language you are comfortable with.
6. Assume that data structures and algorithms cannot be avoided.
7. Assume that the interviewer will modify the constraints and requirements you have assumed in the initial design process.
8. With an open ended question like object oriented design, you have as much of an opportunity to gauge your future colleagues and work environment as they have to judge you. Use it as an input when deciding whether you want to take the job.

Sunday, September 29, 2013

Iteration - C, Scheme and C#

An iterator enables element-by-element processing of a collection. Before we look at iterators which operate over collection objects, let us examine more fundamental iteration statements.

The C for loop - under the hood

The for loop is a basic building block in languages which derive their syntax from C.

char i;
for (i = 'a'; i < 'z'; i++) {
  printf("%c\n", i);
}
Looking at the assembly generated for the loop (using Visual Studio/Windows running on an Intel processor):

;...
; char i;
; for (i = 'a'; i < 'z'; i++) {
00A81A2E  mov         byte ptr [i],61h  
00A81A32  jmp         main+2Ch (0A81A3Ch)  
;A. Start of loop
00A81A34  mov         al,byte ptr [i]  
00A81A37  add         al,1  
00A81A39  mov         byte ptr [i],al  
00A81A3C  movsx       eax,byte ptr [i]
;B. Compare and jump to the next iteration or out of the loop
00A81A40  cmp         eax,7Ah  
00A81A43  jge         main+53h (0A81A63h)  
;... printf("%c\n", i); is implemented here
; }
;C. Jump back to start of loop
00A81A61  jmp         main+24h (0A81A34h)  
;}
00A81A63  xor         eax,eax  
;...
The compiled code is straightforward. Essentially a pair of jumps (cmp/jge and jmp marked as B. and C. in comments) are used to move to the next loop iteration and eventually terminate the loop.

An iterator in Scheme

An iterator is a mechanism which:
1. Operates on a collection of objects (usually all of the same type, but not necessarily).
2. Provides a way to apply an arbitrary procedure to the current object.
3. Provides a way to move to the next object.
4. Provides a uniform interface for the collection, independent of the type of the objects in the collection.

It is straightforward to implement a simple list iterator in Scheme:

(define (for-each function collection)
  (cond ((null? collection) #t)
  (else (let ((rest (cdr collection)))
          (function (car collection))
            (for-each function rest)))))
               
;Use for-each to print a list of symbols
(for-each print (list 'hello 'world '!))

;Use for-each to square a list of numbers
(for-each 
    (lambda (x)
        (print (* x x)))
    (list 1 2 3))
Our for-each takes an arbitrary function of 1 argument and a collection. It applies the function to each item in the collection till there are no elements left.

We used a recursive function to implement for-each. This looks like an inefficient way to do things, since we would be using up stack frames and, if the collection were big enough, cause a stack overflow.

However the for-each has been written using a special case of recursion. Since the recursion is in the very last statement, we have what is called tail recursion. With tail recursion, the Scheme compiler can avoid pushing the recursive calls onto the stack. Instead it can optimize the generated code into something similar to the jump mechanism we saw with the generated assembly statement in the C for loop. Whether this optimization leads to better performance in practice or not is another issue which we will not bother about in this post.*

Iterators in C#

That leads us to iterators in C#. One common mechanism consists of a pair of complementary constructs:
IEnumerable interface: Has one method GetEnumerator().
foreach statement: operates on an IEnumerable and allows us to iterate over its objects.


using ( EnumeratedFile file = new EnumeratedFile( "c:\\temp\\test.txt" ) ) {
  foreach ( var line in file ) {
    Console.WriteLine( line );
  }
}
Here EnumeratedFile implements IEnumerable. foreach applies whatever procedure we provide on EnumeratedFile (here just a Console.WriteLine).

But for this to work, there has to be an enumerator object which IEnumerable.GetEnumerator() provides, which actually iterates over the file. C# provides another interface IEnumerator for this purpose. The enumerator which implements IEnumerator does the real work of giving access to the current object and moving to the next.

IEnumerator has 3 methods whose intentions are obvious from their names: Current (actually a property), MoveNext() and Reset().

Note that the same object can implement both the IEnumerable and IEnumerator interfaces. For example, here is an implementation of EnumeratedFile:

class EnumeratedFile : IEnumerable, IEnumerator, IDisposable
{
  string line;
  StreamReader stream;

  //Constructor
  public EnumeratedFile( string path ) {
    line = string.Empty;
    if ( File.Exists( path ) ) {
      stream = new StreamReader( path );
    }
    else {
      stream = null;
    }
  }

  IEnumerator IEnumerable.GetEnumerator() {
    return this;
  }

  public object Current {
    get { return line; }
  }

  public bool MoveNext() {
    if ( stream == null ) return false;
    return (line = stream.ReadLine()) != null ? true : false;
  }

  public void Reset() {
    stream.BaseStream.Position = 0;
    stream.DiscardBufferedData();
  }

  //We also implement IDisposable to clean up the StreamReader.
  public void Dispose() {
    if ( stream != null ) {
      stream.Dispose();
    }
  }
}
So foreach provides a shorthand for the following:

using ( EnumeratedFile file = new EnumeratedFile( "c:\\temp\\test.txt" ) ) {
  var line = string.Empty;
  while ( file.MoveNext() ) {
    line = (string)file.Current;
    Console.WriteLine( line );
  }
}
foreach can infer IEnumerable interface for array and dynamic types. The C# language spec has details on the behavior of foreach.

C# provides a simpler way to implement an iterator. Instead of implementing Current, MoveNext() and Reset(); we could use the yield keyword to simplify things.

Using yield, EnumeratedFile can be re-written as:

class EnumeratedFile : IEnumerable, IDisposable
{
  string line;
  StreamReader stream;

  //Constructor
  public EnumeratedFile( string path ) {
    line = string.Empty;
    if ( File.Exists( path ) ) {
      stream = new StreamReader( path );
    }
    else {
      stream = null;
    }
  }

  public IEnumerator GetEnumerator() {
    if (stream == null) yield break;
    while ( (line = stream.ReadLine()) != null ) {
      yield return line;
    }
    yield break;
  }

  //We also implement IDisposable to clean up the StreamReader.
  public void Dispose() {
    if ( stream != null ) {
      stream.Dispose();
    }
  }
}
yield does the heavy lifting of generating Current/MoveNext and maintaining state between iterations. Raymond Chen and Jon Skeet have explained the code which is generated by the compiler. We could also look at the generated code directly using the Ildasm disassembler, or indirectly using a .NET decompiler. IEnumerable and IEnumerator also have generic counterparts, IEnumerable<T> and IEnumerator<T>.

What is the big deal?

What is the use of iterator abstraction, if it boils down eventually to a pair of jump statements?

Two things:
1. Iteration is a very common operation when automating work with computers: we model real world objects, create several instances of them and apply procedures repeatedly to all the instances**. Common enough that many languages provide the iterator as a built in construct or make it easy to write the construct ourselves.

2. By making iterators a part of the language, the language designers provide us users with a 'conceptual framework'*** to build higher levels of abstraction. For example, IEnumerable in C# lays the groundwork for transforming sequences to sequences, which is one of the underlying ideas behind the Linq framework.
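
As a small illustration (a fragment, not taken from the posts above), each Linq operator takes an IEnumerable and lazily returns a new one, so queries compose without the caller ever writing the loop:

using System.Collections.Generic;
using System.Linq;
//...
IEnumerable<string> words = new[] { "hello", "iterator", "world" };
//Where and Select each consume one sequence and produce another, evaluated only when iterated.
IEnumerable<int> lengths = words.Where( w => w.Length > 5 ).Select( w => w.Length );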


* See this post for tail optimization with respect to C#

** Iteration is not the only way to apply a procedure on a group of objects. Set based operations are another way. In fact, it is a common programming error for application programmers new to databases to iterate over each object in a set, instead of operating over the sets of data, and encounter performance problems as the data set grows.

*** From SICP, on map and similar operations over lists: ...The difference between the two definitions is not that the computer is performing a different process (it isn't) but that we think about the process differently. In effect, map helps establish an abstraction barrier that isolates the implementation of procedures that transform lists from the details of how the elements of the list are extracted and combined. Like the barriers shown in figure 2.1, this abstraction gives us the flexibility to change the low-level details of how sequences are implemented, while preserving the conceptual framework of operations that transform sequences to sequences.

Tuesday, September 24, 2013

Anonymous Methods and Closures in C#

In the last two posts we looked at Action, Func, delegate and callbacks in C#. In this post we look at anonymous methods and closures.

Anonymous Methods

A method is anonymous if it is not bound to a name. A couple of ways to use anonymous methods in C#:

1. The 'new' way, using lambda expressions:

System.Func<int, int> doubler = x => x + x;
2. The 'old' way, using delegate explicitly:

delegate int DoublerDelegate( int d );
class Test
{
  public DoublerDelegate Foo()
  {
    return delegate( int x ) { return x + x; };
  }
}

//In the client, we can call the anonymous method:
Test t = new Test();
int i = (t.Foo())( 2 );
As expected, both ways give the same result.

Under the hood

Looking at the IL in the Ildasm disassembler for the 'old' way (explicitly using delegates), we see:

As expected, a DoublerDelegate which extends System.MulticastDelegate.
And inside the Test class, the compiler generated an anonymous static method corresponding to our anonymous method:

.field private static class anonymous.DoublerDelegate 
'CS$<>9__CachedAnonymousMethodDelegate1'
//...

.custom instance void 
[mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() 
= ( 01 00 00 00 )
 
.method private hidebysig static int32 '<Foo>b__0'(int32 x) cil managed
{
  //... Implements our anonymous doubler method
} // end of method Test::'<Foo>b__0'

In the DoublerDelegate instance, the static method is invoked.

IL_0009:  ldftn    int32 anonymous.Test::'<Foo>b__0'(int32)

This static method gets called in main, in lieu of our anonymous method.

If we looked at the IL for the 'new' way with Func, we would see that the IL generated for the anonymous method is similar: A static method is generated and called from the Func.

Non-local variables and closure(?)

Our anonymous doubler methods only accessed the variable x. x is local to the anonymous method and not influenced by the world outside the method once it is bound to a value.
This is why the compiler can generate just a static method for the anonymous method. The method does not need external state.

Let us introduce a variable which is non-local to the anonymous method.

public DoublerDelegate Foo()
{
  int i = 10;

  return delegate( int x ) {
    i = i + x;
    return i * x; 
  };
}

i is non-local to the anonymous method, but used by it (in other words, i is a 'free variable'). Now the compiler would have to account for the variable i, whenever the anonymous method is called.

It does this by generating a new class.

.class auto ansi sealed nested private 
beforefieldinit '<>c__DisplayClass1'
       extends [mscorlib]System.Object
{
  .custom instance void 
  [mscorlib]System.Runtime.CompilerServices.CompilerGeneratedAttribute::.ctor() 
  = ( 01 00 00 00 ) 
} // end of class '<>c__DisplayClass1'
This generated class encapsulates both the non-local variable i, and the generated anonymous method which uses it.

Raymond Chen's post from years ago elegantly describes how anonymous methods are generated.

BTW, there seems to be some confusion among people who know better than me about whether the example above is a true 'lexical closure'. For example, in Scheme the closure would consist of the anonymous function and the environment it references. In the case of C#, this is simulated by generating classes and methods.

From my point of view, as just another application programmer, the above code is an example of a closure in C#.

Garbage Collection is important

In the example, the variable i has to live even after the enclosing method Foo() has returned. Why? Because the anonymous method references it, and it could live on based on the client code's intentions. We saw that the compiler generated an internal class which holds the non-local variable i. But who frees instances of the generated class? The answer is obvious - the garbage collector. The internal object would be marked for collection whenever the referencing anonymous method is freed. While this is a trivial question in C#, it would not be so simple if the closure had to be freed in the absence of an omnipresent GC.

Gotchas

In C# anonymous methods, ref and out parameters cannot be captured automatically. The GC essentially works on the heap, not the stack. Ref parameters are not moved to the heap in C#; they live on the stack and are not captured in the closure. If we do need ref parameters, we need to make local copies ourselves. In other words we have to handle the copy-in/copy-out.
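
A minimal sketch of the copy-in/copy-out workaround (hypothetical names):

using System;

class RefCaptureExample
{
  public static void AddToTotal( ref int total, int amount )
  {
    int local = total;                        //copy-in: the lambda cannot capture 'total' directly
    Action add = () => { local += amount; };  //the closure captures the local copy instead
    add();
    total = local;                            //copy-out: write the result back to the ref parameter
  }
}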

This post from one of the implementers of anonymous methods in C# explains the copy-in/out semantics. There are other posts on his blog which explain deeper nuances of anonymous methods in C#. The matter of ref and out parameters, stack and heap are also explained in detail by Eric Lippert.

In Practice

Delegates which simulate functions-as-first-class-citizens are a useful feature in C#. They allow us to better express verbs (functions) in what is essentially a kingdom of nouns (classes). This helps an application programmer in terms of expressiveness, composability and terseness.

But how useful are anonymous methods and closures to just another application programmer like me?

Anyone who has used Linq, threading or the task library in C# in some depth would be familiar with this topic. Here we use an anonymous method which forms a closure over the non-local variable 'extra'.

var extra = 10;
var result = myList.Select( x => x + extra );

Consider another example. We format a string based on some inputs. We need to insert a comma into the string based on the inputs.

public string SerializeToStr( string subject, string status )
{
  StringBuilder output = new StringBuilder();
  output.Append( "{'ticket':{" );
  bool aFieldExists = false;

  Action< string, string, string > Append = ( string item, string header, string trailer ) => {
    if ( item != null ) {
      if ( aFieldExists ) {
        output.Append( ", " );
      }
      aFieldExists = true;
      output.Append( header ).Append( item ).Append( trailer );
    }
  };

  Append( subject, "'subject':'" + DateTime.Today.ToString() + " ", "'" );
  Append( status, "'status':'", "'" );
  output.Append( "}}" );

  return output.ToString();
}

This needlessly long example could as well have been implemented with a helper method and a state variable for aFieldExists. However the anonymous method does provide for better locality of reference from an application code's point of view.

Anonymous methods and closures seem to be syntactic sugar in C# from a user's point of view. However they do improve expressiveness, and can be quite addictive to use.


Thursday, September 19, 2013

C# Delegate: A closer look

C# Delegate and Multicast

In the previous post, we looked at the C# callback mechanism using delegates, and the syntactic sugar provided by Func and Action.

Can we dig a little into the C# delegate implementation and learn more?

As we know, the delegate is really a function (method) pointer. But it does more - we can assign multiple methods to a delegate. When the delegate is invoked, it calls the methods assigned to it in order.

Consider 2 logging classes which implement their respective log methods:

public class ConsoleClass
{
//...
  public void LogToConsole( string mesg )
  {
    Console.WriteLine( mesg );
  }
}

public class FileClass
{
//...
  public void LogToFile( string mesg )
  {
    string file = "syslog.txt";
    using ( TextWriter f = File.CreateText( file ) ) {
      f.WriteLine( mesg );
    }
  }
}
The log methods have the same signature. So we can assign them to one delegate and invoke it.

//Test client code
public delegate void SyslogDelegate( string mesg );

private static void TestDelegates()
{
  ConsoleClass console = new ConsoleClass();
  SyslogDelegate d = console.LogToConsole;
  FileClass file = new FileClass();
  //Add second pointer to the delegate (multicast).
  d += file.LogToFile;

  d.Invoke( LogMesg ); //Shorthand: d( LogMesg );
}
As expected, invoking the delegate calls both the console and file log methods in order.

What is the C# delegate?

Looking at some methods/properties the delegate class itself exposes:

int i = 1;
Console.WriteLine( "Methods which were referred to by " +
                    d.GetType().Name + ": " );
foreach ( var delegated in d.GetInvocationList() ) {
    Console.Write( i.ToString() + ". " );
    Console.WriteLine( delegated.Target.GetType().Name + 
                       "->" + delegated.Method.Name );
    i++;
}
This prints:

Methods which were referred to by SyslogDelegate:
1. ConsoleClass->LogToConsole
2. FileClass->LogToFile
We can infer from the public method GetInvocationList() of the delegate, that the delegate simply holds a list of classes which enclose the referred methods. If we look at the IL for our SyslogDelegate in the Ildasm disassembler tool:

.class public auto ansi sealed Delegates.SyslogDelegate
       extends [mscorlib]System.MulticastDelegate
{
} // end of class Delegates.SyslogDelegate
we can see the delegate keyword generates to a class which derives from System.MulticastDelegate.

Looking at the System.MulticastDelegate class in a disassembler and reading MSDN, we can gather a couple of things:
1. The System.MulticastDelegate is something the compiler uses; we cannot normally inherit from it.
2. Since the delegate keeps a reference to the method, the object which contains the method is kept alive as long as the delegate is alive.

The second point implies that as long as our delegate lives, the GC will keep the referred object around - at least if the method the delegate refers to uses other variables inside the scope of the class**. We can test this easily with a test program and a memory profiler.

This has implications wherever delegates are used, especially when dealing with higher level abstractions. For example, we can forget to remove handlers from long lived events, which could cause the referred objects to live for a long time, leading to a memory leak.
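
For example (a sketch, not from this post), a long lived publisher keeps a subscriber alive through the delegate behind its event until the handler is removed:

using System;

public class Publisher
{
  public event EventHandler SomethingHappened;
}

public class Subscriber
{
  public void OnSomethingHappened( object sender, EventArgs e ) { /* handle the event */ }
}

//In client code:
var publisher = new Publisher();     //long lived
var subscriber = new Subscriber();   //meant to be short lived
publisher.SomethingHappened += subscriber.OnSomethingHappened;
//The delegate behind the event now references subscriber, so the GC keeps it alive
//for as long as publisher is alive. To let it go, the handler has to be removed:
publisher.SomethingHappened -= subscriber.OnSomethingHappened;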

Simulating delegate

To understand the C# delegate mechanism better, let us try to write a naive and limited mechanism using reflection.

public class MyDelegate
{
  public MyDelegate(object target, string method) {
    this.Target = target;
    this.Method = method;
  }
  public object Target { get; set; }
  public string Method { get; set; }
  public void Invoke(Object[] paramArray) {
    MethodInfo m = Target.GetType().GetMethod( Method );
    m.Invoke( Target, paramArray );
  }
}

public class MyMulticastDelegate
{
  public List<MyDelegate> InvocationList { get; set; }
  public MyMulticastDelegate()
  {
    InvocationList = new List<MyDelegate>();
  }
  public void Add( object target, string method )
  {
    InvocationList.Add( new MyDelegate( target, method ) );
  }
  public void Remove()
  {
    if ( InvocationList.Count > 0 ) {
      InvocationList.RemoveAt( InvocationList.Count - 1 );
    }
  }
  public void Invoke( object[] paramArray )
  {
    for ( int i = 0; i < InvocationList.Count; i++ ) {
      InvocationList[i].Invoke( paramArray );
    }
  }
}
In the test client code:

MyMulticastDelegate m = new MyMulticastDelegate();
//Add method 'references' to our delegate
m.Add( new ConsoleClass(), "LogToConsole" );
m.Add( new FileClass(), "LogToFile" );   

//Call delegate
m.Invoke( new object[] { LogMesg } );
This would call both the methods like the implementation with real delegates.

Of course, this is a naive implementation. What the C# delegate mechanism provides is:
- The ability to invoke a method at runtime based on its signature. Note that our implementation does not handle return types in any way.
- A type safe way of invoking multicast function pointers. So we get compile time checks (and, with Visual Studio, as-you-type checking). This is useful; otherwise we could easily cause runtime errors, as we could with a reflection based invoke mechanism.

Let us explore closures and delegates in the next post.
Source Code

Side note: Could unsafe pointers and pinning be used to implement a naive delegate like mechanism 'from scratch'?


** If it were a method with no side-effects it could in theory be generated as a static which does not really require the class instance. BTW if the method actually uses variables outside its local scope but in the enclosing class's scope, we have something similar to a closure. More on this in the next post.