Thursday, August 9, 2007

Team Defense

The role that team defense plays in baseball has a history of being difficult to measure. Recently, improved metrics have shed more light on what is going on. In this article I am going to discuss some of these metrics, their strengths and weaknesses, and introduce a new metric.


Traditionally, team fielding percentage was the best metric available for studying team defense. It relies on official scorers, who assign an error to any play they judge the defense should have made. Take the chances handled without an error, divide by the total number of chances, and you have fielding percentage. The problems with this metric are easy to see. Each scorer has a slightly different view of which plays should have been made. Getting only the lead runner is also scored the same as turning the double play, since no error is charged either way.


In the 1980s Bill James invented Defensive Efficiency Rating (DER), which tracks the share of balls in play that a defense turns into outs. There are a few different ways to calculate DER, so I am adding a link to Baseball Prospectus for more information. DER does away with any subjectivity and correctly credits double plays. DER, however, doesn't credit an outfielder who keeps a runner from taking an extra base, or penalize a catcher who airmails a throw to second on a stolen base attempt. Lastly, no corrections are made for the type of ball in play, even though line drives are harder to turn into outs than ground balls.


Zone rating attempts to correct for this by assigning a zone to each defender on every play. If the ball is put in play in a defender's zone and he doesn't field it, he is penalized. Add this up for each defender and you have a team's zone rating. Hardball Times has just started to publish a Revised Zone Rating (RZR), which adjusts the zone rating for the type of ball in play. While this is another improvement, the system still has some weaknesses. Catchers are completely ignored. Again, no credit is given to a defender who prevents runners from taking extra bases. Lastly, it is hard to get a handle on the effect of the metric. Just how big is the difference between a team with a 0.84 RZR and one with a 0.80 RZR?


To help resolve some of these issues I am going to introduce Defensive Expected Runs (DExR). DExR is the sum of the differences between the expected runs before and after each play. That is, for every play I look up the expected runs for the situation before the play and for the situation after it. Adding up this difference over all the plays a team makes produces its DExR. The only plays not included in this metric are plays that the defense can't contribute to. The excluded plays are: strikeouts, walks, hit by pitches, balks, and home runs (though inside the park home runs do count).


Take the situation where there is one out and a runner on first base, and the batter singles, allowing the runner to reach third. The expected runs (in 2006) for a runner on first with one out is 0.5675, and for runners on first and third it is 1.1734, so the DExR for this play would be 0.5675 - 1.1734 = -0.6059. If instead the runner tries to steal second and is thrown out, the expected runs for nobody on and two outs is 0.10907, so the DExR is 0.5675 - 0.10907 = 0.45843.
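As a rough illustration of the bookkeeping, here is a minimal Python sketch of the two plays above. The table holds only the three 2006 values quoted in this post; a real table would cover every base-out situation. The state labels, event names, and function names are my own placeholders for this example, not the code that produced the published numbers, and the sketch leaves out details this post doesn't cover, such as how runs that score on a play are handled.

```python
# Expected runs keyed by (runners on base, outs).
# Only the three 2006 values quoted in the post are included here.
EXPECTED_RUNS = {
    ("1--", 1): 0.5675,   # runner on first, one out
    ("1-3", 1): 1.1734,   # runners on first and third, one out
    ("---", 2): 0.10907,  # nobody on, two outs
}

# Events the defense can't contribute to, so they are skipped.
# Inside the park home runs are not in this set, so they count.
EXCLUDED_EVENTS = {"strikeout", "walk", "hit_by_pitch", "balk", "home_run"}


def play_dexr(before, after):
    """Expected runs before the play minus expected runs after it."""
    return EXPECTED_RUNS[before] - EXPECTED_RUNS[after]


def team_dexr(plays):
    """Sum play_dexr over every play that is not an excluded event.

    Each play is a tuple of (event, state_before, state_after).
    """
    return sum(
        play_dexr(before, after)
        for event, before, after in plays
        if event not in EXCLUDED_EVENTS
    )


single = play_dexr(("1--", 1), ("1-3", 1))          # about -0.6059
caught_stealing = play_dexr(("1--", 1), ("---", 2)) # about 0.45843

plays = [
    ("single", ("1--", 1), ("1-3", 1)),
    ("caught_stealing", ("1--", 1), ("---", 2)),
    ("strikeout", ("1--", 1), ("1--", 2)),  # excluded, never looked up
]
total = team_dexr(plays)
```

A positive value is good for the defense (the situation got worse for the offense), and a negative value is bad.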


A quick note on how to think about this metric. If a team ends the year with an 80 DExR, that doesn't mean the defense was responsible for saving 80 runs. What it means is that the sum of the expected runs after every play was 80 runs less than the sum of the expected runs before every play. It is a subtle, but important, difference. This means the DExR compared to league average is a more important number than a team's DExR itself. If you subtract the league average DExR from each team's DExR, you get each team's DExR above league average. For short I am labeling this DALG.
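Here is a small Python sketch of the DALG step, taking "above league average" to mean that a positive number is better than average. The team names and DExR totals are made up purely to show the arithmetic, and the function name is a placeholder of mine.

```python
def dalg(dexr_by_team):
    """Return each team's DExR minus the league average DExR."""
    league_average = sum(dexr_by_team.values()) / len(dexr_by_team)
    return {team: dexr - league_average for team, dexr in dexr_by_team.items()}


# Hypothetical three-team league, for illustration only.
example = {"Team A": 80.0, "Team B": 65.0, "Team C": 50.0}
result = dalg(example)
# The league average is 65.0, so Team A comes out at 15.0,
# Team B at 0.0, and Team C at -15.0.
```

By construction the DALG values across the league sum to zero, which makes it easy to see who is ahead of or behind the pack.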


There are several advantages to this metric. First, all defensive plays are counted, not just balls in play. Things like wild pitches and runner advancement matter. Second, this metric is measured in the currency of baseball: runs. The weakness of this metric is that it doesn't correct for the type of ball in play. A solution would be to assign zones, as RZR does, and produce an expected runs matrix for each situation and each type of ball in every zone. Unfortunately, that is currently beyond the scope of my code.


Here is a link for the 2006 and 2007 DExR numbers. This post has already gone long, so I will save my comments on the data for another post. Ideally, I would like to have DExR automatically updated every morning. I am running into some difficulties with my host, though, so for now I still have to update by hand. Hopefully that will get resolved in the next few days.

