Estimating the market share of WiFi router manufacturers using MAC addresses and the statistical methods behind the German Tank Problem.

Singapore has a rather small number of ISPs (Internet Service Providers). There are around 4 major ISPs, and all of them provide a router with each internet subscription. In most cases, each ISP gives out routers from only one brand it has partnered with. Since each router has a unique MAC address and a defined range of MAC addresses is assigned to each router manufacturer, it may be possible to estimate the number of routers from each manufacturer out in the wild using techniques from the German Tank Problem. Note that MAC addresses are public and are broadcast such that any WiFi-capable device can pick them up.

The German Tank Problem originates from WWII, when the Allies had to estimate the number of tanks produced by the Germans. They took down the serial numbers from captured tanks and used that sample of serial numbers to estimate the number of tanks produced. This was based on the assumption that the serial numbers were in order of production, i.e. the first tank had serial number 1 and the second tank off the production line had serial number 2.

Let's say tanks with serial numbers 1, 34, 64 and 100 were captured. It would not seem reasonable that there were 1,000,000 tanks, because that number is so far away from the maximum number in our sample. Intuitively, we can guess that there are probably around 100 tanks out there, and that the tanks in between were just not caught. There is more information on the Wikipedia page, which covers the nitty-gritty statistical methods for getting a fairly good estimate of the number of tanks (it is actually fairly easy to understand).
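To make the intuition concrete, here is a minimal sketch of the standard frequentist estimator for this problem, m + m/k - 1, where m is the largest serial number seen and k is the number of distinct serials in the sample. It assumes serials start at 1, are consecutive, and that every tank was equally likely to be captured. The function name estimate_total is made up for this post.

```python
def estimate_total(serials):
    """Estimate the population size from a sample of serial numbers.

    Uses the frequentist German Tank estimate m + m/k - 1, assuming
    serials run 1..N with no gaps and were sampled uniformly.
    """
    sample = set(serials)  # duplicates carry no extra information
    if not sample:
        raise ValueError("need at least one serial number")
    k = len(sample)   # number of distinct serials observed
    m = max(sample)   # largest serial observed
    return m + m / k - 1


print(estimate_total([1, 34, 64, 100]))  # 124.0
```

So for the four captured tanks the estimate lands a little above the largest serial seen, which fits the gut feeling that the true count is close to 100 and nowhere near 1,000,000.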

For our task, MAC addresses can be considered the serial numbers of the routers. Ranges of MAC addresses are assigned to each manufacturer, e.g. company A can use the range from 12:34:56:00:00:00 all the way to 12:34:56:FF:FF:FF. Note that MAC addresses are written in base 16 (hexadecimal) but can essentially be treated as numbers. My guess is that, in production, MAC addresses are assigned to routers incrementally: 12:34:56:00:00:01 for the first router, 12:34:56:00:00:02 for the second router, and so on.
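Treating a MAC address as a number is straightforward. The sketch below splits an address into the manufacturer prefix (the first three bytes) and a serial number (the last three bytes). The function name split_mac is made up for this post, and reading the last three bytes as a production counter is only my guess from above, not something manufacturers promise.

```python
def split_mac(mac):
    """Split a MAC address into (manufacturer prefix, serial number).

    The prefix is the first three bytes as a lowercase hex string; the
    serial is the last three bytes read as an integer. Treating those
    bytes as a production counter is an assumption.
    """
    digits = mac.replace(":", "").replace("-", "").lower()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    value = int(digits, 16)      # raises ValueError on non-hex input
    prefix = digits[:6]          # e.g. "123456"
    serial = value & 0xFFFFFF    # low 24 bits
    return prefix, serial


print(split_mac("12:34:56:00:00:02"))  # ('123456', 2)
print(split_mac("12:34:56:FF:FF:FF"))  # ('123456', 16777215)
```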

To get a sample of MAC addresses, wardriving might be an option. A simple script on a laptop or phone can be used to log all the MAC addresses it sees. With that sample in hand, all we have to do is sort the addresses by router manufacturer and then use them to estimate how many routers from each manufacturer are out in the wild!
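Putting the pieces together might look something like the sketch below: group the logged addresses by their three-byte prefix, apply the m + m/k - 1 estimate to each group, and turn the estimates into shares. The sample addresses are invented for illustration, the function name estimate_by_prefix is made up, and the whole thing leans on the assumptions that each manufacturer uses a single prefix and hands out serials incrementally starting from 1.

```python
from collections import defaultdict


def estimate_by_prefix(macs):
    """Return {prefix: estimated router count} for a list of MAC strings."""
    serials = defaultdict(set)
    for mac in macs:
        digits = mac.replace(":", "").lower()
        # first 3 bytes = manufacturer prefix, last 3 bytes = assumed serial
        serials[digits[:6]].add(int(digits[6:], 16))

    estimates = {}
    for prefix, seen in serials.items():
        k, m = len(seen), max(seen)
        estimates[prefix] = m + m / k - 1  # German Tank estimate
    return estimates


# Invented sample; the same router may be logged more than once.
sample = [
    "12:34:56:00:00:0A",  # serial 10
    "12:34:56:00:00:28",  # serial 40
    "12:34:56:00:00:64",  # serial 100
    "AB:CD:EF:00:00:32",  # serial 50
    "AB:CD:EF:00:00:32",  # duplicate sighting, counted once
]

estimates = estimate_by_prefix(sample)
total = sum(estimates.values())
for prefix, count in sorted(estimates.items()):
    print(f"{prefix}: ~{count:.0f} routers, {count / total:.0%} share")
```

With so few sightings per prefix the numbers are very rough; the estimate should tighten as more distinct addresses are logged.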

