Archive for January, 2013

Analysis of Test batsmen – Starts and Conversions

January 30, 2013

This analysis is done by making some adjustments to raw scoring data. If you are not interested in how these adjustments were made or in my own statistical definition of a “start”, you can skip to “THE PLOT” section near the bottom.

I started this exercise with the goal of quantifying some of the terms used to describe batsmen, like “He is a shaky starter”, “Has an excellent conversion rate” and “Looks well set”. Apart from that, I wanted to see which batsmen start well, which convert well and which do both well. Everything is in the context of Test matches.

The first thing I had to do was define a “start”. There is no consensus on this. Commentators use the term to describe anything from 20 to anything approaching 50. What is probably meant by a start is that the batsman has overcome his initial nerves and the renewed enthusiasm of the fielding side after a wicket, and has more or less got the pace of the wicket. Quantitatively, we can define it as the score at which the probability of him being dismissed (immediately, without adding a run to his score) is not much different from that at a score where everyone agrees he is set, let’s say 50 plus. So I calculated the probability of a batsman being dismissed on each score from 0 to 400. There is very little sample for 300+ scores, but our interest lies well below that mark. (The trend up to a score of 105 is given below.)
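
As a rough illustration of the calculation described above, here is a minimal Python sketch. The function name, the `(score, was_dismissed)` input format and the toy innings are my assumptions for illustration, not the actual spreadsheet. It also assumes that the denominator for a score is every innings that reached it, not-outs included.

```python
from collections import Counter

def dismissal_probability(innings, max_score=400):
    """Return {score: P(dismissed on exactly this score | batsman reached it)}.

    innings: list of (final_score, was_dismissed) tuples (hypothetical format).
    """
    dismissed_at = Counter(score for score, out in innings if out)
    ended_at = Counter(score for score, _ in innings)

    probs = {}
    reached = len(innings)  # every innings "reaches" a score of 0
    for score in range(max_score + 1):
        if reached == 0:
            break  # nobody got this far, so the probability is undefined
        probs[score] = dismissed_at[score] / reached
        # Innings that ended on this score (out or not out) never reach score + 1.
        reached -= ended_at[score]
    return probs

# Toy data, not real Test innings.
toy_innings = [(0, True), (10, True), (10, False), (50, True)]
toy_probs = dismissal_probability(toy_innings)
```

With the toy data, one of the four innings ends in a duck, so `toy_probs[0]` is 0.25, and one of the three innings that reached 10 ends there in a dismissal.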

Please note that all these numbers are from batsmen who batted at No. 7 and above. Yes, I might have included a few night-watchmen and missed the odd injured hero’s innings, but that should not affect things much. This dismissal probability itself reveals many interesting things. Unsurprisingly, naught is the most dangerous score for a batsman: 8% of batsmen who take guard do not get past it. But rather to my surprise, this probability drops quite quickly and at around a score of 10 (~0.03) becomes comparable to the probabilities in the 50s (~0.027). Also, the probability doesn’t fall uniformly with the score and shows a lot of local variation. To get a proper picture I averaged the probability over 5 consecutive scores. This shows a very slow decline till about 40, from where we start to see minor variations. Also, the probabilities around the 90s (~0.022) are significantly lower than those just before and after (~0.027). Clearly batsmen know their score, and a hundred means a great deal to them.
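
The smoothing over 5 consecutive scores can be sketched as a simple moving average. The function name is hypothetical, and whether the window should start at, end at or be centred on a score is not stated above; this sketch assumes a window that starts at the score. The sample values are made up.

```python
def rolling_mean(values, window=5):
    """Moving average: result[i] is the mean of values[i : i + window]."""
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Made-up per-score probabilities, only to show the shape of the output.
raw = [0.08, 0.05, 0.04, 0.04, 0.03, 0.04, 0.03]
smoothed = rolling_mean(raw)  # one value per full window of 5 scores
```

The output is shorter than the input by `window - 1` entries, because the last few scores do not have a full window ahead of them.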


Now, coming back to defining a “start”, this is how I defined it. Find the probability of dismissal around a score where anyone would agree that the batsman has settled down. This should not be close to or just after a hundred, as the landmark clearly affects how batsmen play. So I took the average probability of dismissal between 55 and 75, which comes to around 0.0265. The “start” score is the one where the probability of dismissal first reaches this level and is more or less maintained for 5 consecutive scores. That score is 32.
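
A minimal sketch of this search, under stated assumptions: the function name is hypothetical, and “more or less maintained” is turned into a `tolerance` parameter whose value I picked arbitrarily. The probabilities below are a made-up step curve, so the result is not the 32 obtained from the real data.

```python
def find_start_score(probs, settled=(55, 75), run=5, tolerance=0.002):
    """Return (start_score, baseline).

    probs[s] is the dismissal probability on score s.
    baseline is the mean probability over the 'settled' range of scores.
    start_score is the first score from which `run` consecutive scores all
    have a probability no higher than baseline + tolerance (assumed reading).
    """
    low, high = settled
    baseline = sum(probs[low:high + 1]) / (high - low + 1)
    # Only look for the start below the settled range itself.
    for score in range(low - run + 1):
        window = probs[score:score + run]
        if all(p <= baseline + tolerance for p in window):
            return score, baseline
    return None, baseline

# Made-up curve: risky up to 9, less risky up to 19, settled from 20 onwards.
toy_probs = [0.08] * 10 + [0.04] * 10 + [0.0265] * 90
start, baseline = find_start_score(toy_probs)
```

On this toy curve the search returns 20, the first score from which five consecutive probabilities sit at the settled level.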

Now we have to define a conversion. I could not define this statistically like a “start”. Traditionally 100 is considered a *big* one, and I felt the value should be somewhere between 70 and 100. I decided to use 85, which is right in the middle.

Also, we cannot blindly mark an innings of 31 as a “non-start” and 84 as a “non-conversion” just because they missed the cut-off by 1 run. I used a function to give credits instead. For example, 32 is given a start credit of 1, 31 is given 0.99 and so on down the scale; at the bottom, 0 is given a credit of 0 and 1 is given 0.025. To get the number of starts a batsman got, we sum up the credits for all his scores. We do a similar thing for the conversion boundary: 85 gets a conversion credit of 1, 84 gets 0.99, and the credit gradually reduces until a score of 32 or below gets 0. Anything above 32 is given a weight of 1 for start, and anything above 85 is given a weight of 1 for conversion. The equations used are provided in the excel attachment at the end of the post.
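
The actual equations are in the excel attachment. Purely to illustrate the idea of soft cut-offs, the sketch below joins the credit values quoted above with straight lines. The piecewise-linear shape between those quoted points is my assumption and the function names are hypothetical, so intermediate credits will differ from the spreadsheet.

```python
def interpolate(score, anchors):
    """Piecewise-linear interpolation through sorted (score, credit) anchors.

    Scores outside the anchor range are clamped to the first or last credit.
    """
    if score <= anchors[0][0]:
        return anchors[0][1]
    if score >= anchors[-1][0]:
        return anchors[-1][1]
    for (x0, y0), (x1, y1) in zip(anchors, anchors[1:]):
        if x0 <= score <= x1:
            return y0 + (y1 - y0) * (score - x0) / (x1 - x0)

# Anchor points are the values quoted in the text; everything between them
# is an assumed straight line, not the spreadsheet formula.
START_ANCHORS = [(0, 0.0), (1, 0.025), (31, 0.99), (32, 1.0)]
CONVERSION_ANCHORS = [(32, 0.0), (84, 0.99), (85, 1.0)]

def start_credit(score):
    return interpolate(score, START_ANCHORS)

def conversion_credit(score):
    return interpolate(score, CONVERSION_ANCHORS)

# Number of starts for a made-up list of scores: sum of the start credits.
toy_scores = [0, 1, 31, 32, 120]
toy_starts = sum(start_credit(s) for s in toy_scores)
```

A score of 120 earns full credit on both scales, while a score of 31 earns almost a full start credit and no conversion credit.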

Another problem is how to handle not outs. What if a batsman was not out on 50? He could have gone on to score 100 or he could have got out next ball. For this I predict the expected score by taking the average of his scores above 50 where he got out, and likewise for all his other not-out scores. Only the particular batsman’s own scores are used for this.
So I go through each score of a batsman, calculate the expected score if he remained not out, and finally assign start credit and conversion credit as discussed above. I then calculate the percentage of total innings in which the batsman got a start and the percentage of those starts which he converted to a big one.
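
The not-out adjustment and the two percentages can be sketched as follows. The function names and toy innings are hypothetical. Two things here are my assumptions: a not-out score with no higher dismissed score is left unchanged, and the conversion percentage is taken as total conversion credit divided by total start credit. To keep the example short, the credit functions passed in are hard cut-offs at 32 and 85, standing in for the gradual credits described earlier.

```python
def expected_score(not_out_score, dismissed_scores):
    """Mean of the batsman's own dismissed scores above the not-out score."""
    higher = [s for s in dismissed_scores if s > not_out_score]
    if not higher:
        return not_out_score  # assumption: nothing to project from
    return sum(higher) / len(higher)

def start_and_conversion_pct(innings, start_credit, conversion_credit):
    """innings: list of (score, was_dismissed) for ONE batsman."""
    dismissed = [score for score, out in innings if out]
    adjusted = [
        score if out else expected_score(score, dismissed)
        for score, out in innings
    ]
    starts = sum(start_credit(s) for s in adjusted)
    conversions = sum(conversion_credit(s) for s in adjusted)
    start_pct = 100 * starts / len(adjusted)
    # Assumed reading: conversions as a share of starts, not of all innings.
    conversion_pct = 100 * conversions / starts if starts else 0.0
    return start_pct, conversion_pct

# Toy career with one not out (50*), using hard cut-offs as stand-in credits.
toy_innings = [(0, True), (40, True), (50, False), (90, True), (120, True), (10, True)]
start_pct, conversion_pct = start_and_conversion_pct(
    toy_innings,
    start_credit=lambda s: 1.0 if s >= 32 else 0.0,
    conversion_credit=lambda s: 1.0 if s >= 85 else 0.0,
)
```

In the toy career the 50 not out is projected to 105, the mean of the dismissed scores of 90 and 120, so it counts as both a start and a conversion.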

THE PLOT: Percentage of Good Starts vs Conversion Percentage

This is how the plot looks. (If the graph appears too small, click on it to expand.) I have plotted only batsmen with 5000+ runs, and marked some of my favorite batsmen and also the extremes. The further right a batsman is, the better starter he is. The higher up he is, the better converter he is.


The Don is truly head and shoulders above everyone else. Interestingly, he is not the greatest starter. Jack Hobbs is! He gets a start 74% of the time. Considering the era he played in, with wet wickets, and that he was an opener, this is truly remarkable. Next is Rohan Kanhai with 72%. Two modern-day openers, Hayden and Sehwag, are close behind with around 66% starts. Ian Bell seems to be the worst starter with just 49%.

Bradman has the highest conversion rate at 61%. The distant second, Barrington, is at just 51%. Carl Hooper seems to be very bad at conversion with just 25%. Lara seems to do better than Tendulkar on both start and conversion, which is slightly surprising as people consider Lara to be slightly more suspect at the start than Tendulkar.

The raw data was obtained using Cricinfo Statsguru.

You can view the data points for other batsmen, and also the dismissal probability at each score, in this Excel plot.