Friday, August 10, 2007

To walk or not to walk

That is the question. In the NL you often have situations like this: a runner on second with two outs and the eighth-place batter coming up. You can either pitch to him, hoping to retire him so that the pitcher's spot leads off the next inning, or walk him and face the pitcher right now.

For someone relatively new to baseball it may appear that most managers act alike. They put out a lineup and sit in the dugout, not really doing anything until it is time to make a pitching change. But when you look closer you see that some managers are actually wildly different. Bobby Cox, for instance, loves the intentional walk. His protege Ned Yost almost never uses it. Some managers love getting a lefty/lefty or righty/righty matchup and will constantly change relievers to get the matchup they desire. Tony LaRussa is the obvious one who comes to mind here.

So which managers are doing the best job? This is a very hard question to answer, because of situations like the one above. Ask several "Baseball Guys" and you will get different answers. Also, how well the eighth-place batter and the pitcher hit will change the outcome. Because these situations are very complicated, not a lot of statistical work has been done on how managers' strategies affect their clubs.

One of the things I would like to do with this blog is start looking at these decisions and shed some light on which one really is right. Then I will examine the data and determine which managers are making those decisions. We will start with the decision whether or not to walk the eighth-place batter.

We need to start by making some assumptions to make this task a bit easier. First, we will assume that the batter is an average NL eighth-place batter and that the pitcher coming up behind him is an average-hitting pitcher (in the next post we will remove these assumptions). Second, the rest of the order is league average, which lets us use the runs matrix we used in our previous study on defense. 2006 numbers will be used because they are static. Lastly, the pitcher on the mound is also league average and he has a league average defense behind him.

To solve this problem we would like to calculate the expected runs from letting the eighth-place batter bat and then the pitcher bat (either now, if the eighth-place batter reaches, or to start the next inning). We can't use the runs matrix here because the eighth-place batter and the pitcher behind him aren't league average. Instead, we could calculate every possibility and then add up the expected runs from all of them.

The problem with this is that, even with just two batters involved, calculating every possible outcome is a monumental task. For example, the eighth-place batter could single, and the runner on second might go to third or he might score. That might be followed by the pitcher, who unexpectedly doubles, and now the new runner on first might try to score and might be gunned down at the plate. Whew. That is just going to be too difficult.

The solution to our problem is to write a Monte Carlo simulation, or MC. An MC takes a difficult sum (or integral) and breaks it up into smaller, manageable parts. A random number is then thrown to determine which part is chosen, and a value is determined for that part. Do that about a million times, average the results, and you should have a very good approximation of the thing you are looking for.
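To make the idea concrete, here is a minimal sketch in Python. It is not the code behind this post, and the outcome probabilities are made-up illustrative numbers, not 2006 data. It estimates the expected total bases from one plate appearance two ways: by doing the sum directly, and by throwing a random number many times and averaging.

```python
import random

# Hypothetical outcome probabilities for one plate appearance
# (illustrative only; not from the post or from real 2006 data).
# Each entry is (name, probability, total bases).
OUTCOMES = [
    ("out", 0.68, 0),
    ("walk", 0.08, 0),
    ("single", 0.16, 1),
    ("double", 0.05, 2),
    ("triple", 0.005, 3),
    ("home_run", 0.025, 4),
]


def exact_total_bases():
    """The 'difficult sum' done directly: sum of probability * value."""
    return sum(p * bases for _, p, bases in OUTCOMES)


def draw_outcome(rng):
    """Throw one random number and pick the outcome it lands in."""
    r = rng.random()
    cumulative = 0.0
    for name, p, bases in OUTCOMES:
        cumulative += p
        if r < cumulative:
            return name, bases
    # Guard against floating-point rounding in the cumulative sum.
    return OUTCOMES[-1][0], OUTCOMES[-1][2]


def mc_total_bases(trials, seed=0):
    """Average the sampled value over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        _, bases = draw_outcome(rng)
        total += bases
    return total / trials


if __name__ == "__main__":
    print("exact:", exact_total_bases())
    print("MC   :", mc_total_bases(1_000_000))
```

Here the direct sum is easy, so the two numbers can be compared. The point of the MC is that the sampling approach still works when the direct sum has too many branches to write down.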

So I set up an MC to tackle this problem. As of right now it doesn't include stolen bases, caught stealing, or wild pitches, but it does have pretty much everything else in there. Currently, I feed it a player's OPS and it calculates about how often that player should hit singles, doubles, triples, etc. It runs through the two batters twice: once pitching to both and once walking the first. Because a new inning could occur, it calculates the expected runs from both the current inning and the next inning that we might be in the middle of.
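For readers who want to see the shape of such a simulation, here is a heavily simplified sketch in Python. It is not the MC used for this post. Everything in it is an assumption made for illustration: the event probabilities for AVERAGE, EIGHTH and PITCHER are invented (the real version derives them from OPS, and that mapping is not reproduced here), every runner advances exactly as many bases as the hit, outs never move runners, and every batter after the pitcher is treated as league average. It adds up the runs from the current inning and the following inning, either pitching to the eighth-place batter or walking him.

```python
import random

EVENTS = ("out", "walk", 1, 2, 3, 4)  # 1-4 = bases gained on a hit

# Hypothetical per-plate-appearance probabilities, in EVENTS order.
AVERAGE = (0.68, 0.08, 0.16, 0.05, 0.005, 0.025)
EIGHTH = (0.70, 0.08, 0.155, 0.04, 0.005, 0.02)
PITCHER = (0.85, 0.03, 0.10, 0.015, 0.0, 0.005)
EMPTY = (False, False, False)  # (first, second, third)


def draw_event(probs, rng):
    """Throw one random number and return the event it selects."""
    r = rng.random()
    cumulative = 0.0
    for event, p in zip(EVENTS, probs):
        cumulative += p
        if r < cumulative:
            return event
    return "out"  # guard against floating-point rounding


def advance(bases, event):
    """Apply a walk or a hit; return (new_bases, runs_scored)."""
    first, second, third = bases
    if event == "walk":
        # Only forced runners move on a walk.
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    # Crude assumption: batter and every runner advance `event` bases.
    positions = [0] + [i + 1 for i, occupied in enumerate(bases) if occupied]
    runs = 0
    new = [False, False, False]
    for pos in positions:
        pos += event
        if pos >= 4:
            runs += 1
        else:
            new[pos - 1] = True
    return tuple(new), runs


def lineup_from(first_batters):
    """Yield the given batters, then league-average batters forever."""
    yield from first_batters
    while True:
        yield AVERAGE


def half_inning(lineup, bases, outs, rng):
    """Play until three outs; return the runs scored."""
    runs = 0
    while outs < 3:
        event = draw_event(next(lineup), rng)
        if event == "out":
            outs += 1
        else:
            bases, scored = advance(bases, event)
            runs += scored
    return runs


def expected_runs(bases, outs, walk_eighth, trials=100_000, seed=0):
    """Average runs over the current inning plus the next inning."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        current, runs = bases, 0
        if walk_eighth:
            current, runs = advance(current, "walk")
            lineup = lineup_from([PITCHER])
        else:
            lineup = lineup_from([EIGHTH, PITCHER])
        runs += half_inning(lineup, current, outs, rng)
        # The same lineup carries over, so the pitcher leads off the
        # next inning if the eighth-place batter made the third out.
        runs += half_inning(lineup, EMPTY, 0, rng)
        total += runs
    return total / trials


if __name__ == "__main__":
    second_only = (False, True, False)
    print("pitch to him:", expected_runs(second_only, 2, walk_eighth=False))
    print("walk him    :", expected_runs(second_only, 2, walk_eighth=True))
```

Because the inputs are invented and the baserunning is so crude, the numbers this prints should not be compared with the results reported in this post. The sketch only shows how the lineup carries over between innings and how the two choices are scored on the same footing.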

The first thing to do is to test it with a known value to determine if it is working properly. That is where the runs matrix comes in. If I put in two league average players, I should almost exactly match what the runs matrix says. Here are the results:


The first two columns are the league average OPS for each batter for 2006. The next column is the situation, so -2- means a runner on second. E_runs is the expected runs from the runs matrix. MC is the result from the Monte Carlo. As you can see, the MC does a very good job of reproducing the runs matrix. Better than I had hoped for, actually. This means that when I change the OPS for each of the batters I can have confidence that the results I am seeing are correct. Lastly, from running this several times I found that a million trials is good only to the thousandths place, so from here on out that is all I am going to report. People do a disservice publishing more decimal places than the data is accurate to, but that is another story. Anyway, now we are ready for the full results:



So for a league average eighth-place batter with a league average pitcher up next, the actual expected runs go way down from the runs matrix. This is no surprise. According to the MC, the intentional walk is slightly worse if there is just one runner on base, but it is actually significantly better if there are runners on second and third.
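Whether a "slightly worse" result means anything depends on how noisy the MC is. One standard way to gauge that is the standard error of the mean: the spread of the individual trials divided by the square root of the number of trials. If a single trial varies by roughly one run, a million trials gives about 1/sqrt(1,000,000) = 0.001, which fits with results being good only to the thousandths place. Here is a minimal Python sketch of that check. The toy_runs function is a hypothetical stand-in for one trial of a simulation, with made-up probabilities.

```python
import math
import random
import statistics


def mc_mean_and_stderr(sample, trials, seed=0):
    """Run `sample` many times; return (mean, standard error of the mean)."""
    rng = random.Random(seed)
    values = [sample(rng) for _ in range(trials)]
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / math.sqrt(trials)
    return mean, stderr


def toy_runs(rng):
    """Hypothetical stand-in for one simulated trial (made-up odds)."""
    r = rng.random()
    if r < 0.75:
        return 0  # no runs score
    if r < 0.93:
        return 1  # one run scores
    return 2      # two runs score


if __name__ == "__main__":
    for trials in (10_000, 100_000, 1_000_000):
        mean, stderr = mc_mean_and_stderr(toy_runs, trials)
        print(trials, round(mean, 4), round(stderr, 4))
```

Two MC results that differ by less than a couple of standard errors should be treated as a tie rather than as a real difference. Cutting the error by a factor of ten takes a hundred times as many trials.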

This is not what I expected, but it is an interesting result. In the next article we will remove the constraint of using the league average OPS for our players and come up with a generalized solution.

P.S. I have changed the tables to screenshots of the data. I really would like to get the tables working, so if there is anyone out there with experience putting tables into Blogger, please let me know.
