Wednesday, October 17, 2007

WE by count

Again using the Monte Carlo from the intentional walk tool and some data from Baseball-Reference on the league average OPS by count, I have cobbled together a win expectancy (WE) chart by count. I am planning on using these numbers as the baseline for the intentional walk tool when the count isn't 0-0. I thought people might be interested in this table on its own, so I have uploaded it here. Warning: it is a really huge file and will probably take several minutes to load on your computer.

A big thanks to John Walsh for pointing me to exactly the data I needed as sometimes this whole interweb is just too confusing.

Edit: Also a big thanks to tmapress and tangotiger for correcting my WPA WE error.

Tuesday, October 16, 2007

Future Leverage Index

The next thing I wanted to tackle with my Monte Carlo from yesterday was a new metric I am going to call Future Leverage Index, or FLI. Leverage index (LI) and WPA are great for telling you what the current situation is and how important the next batter is, but what about one or two batters down the line? Many people (including me) have screamed at the TV (or monitor, if you are watching with MLB.tv like me) when a crappy reliever is left in the game to face an extremely high leverage situation while the closer blows bubbles with his bubble gum in the bullpen. One of the problems managers have, though, is that relievers need time to warm up, so they have to look into a crystal ball to try to determine what the situation will be in a few batters. This is where FLI comes in.

FLI starts with the situation entered and then runs two league average batters, just like it did for the intentional walk tool. Only this time, instead of weighting the results with WPA, it weights the results with LI. If the half inning ended, the FLI is assumed to be zero because you have plenty of time to warm up a new pitcher while you are batting. Average all the possible outcomes and you get the FLI for that situation. So, like LI, a FLI near zero means it is unlikely that the situation in two batters will reach crisis. The higher the FLI, the more likely it is that a very important situation will occur. A FLI of anything above three probably means you should get your closer up NOW.
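To make the averaging concrete, here is a minimal sketch of the calculation as described above. It is not the code behind the tool: the function name, the state labels, and every number in the toy example are made up for illustration, and the real version would use the full set of base/out/inning/score situations.

```python
END = "END"  # sentinel meaning "the half inning is over"

def future_leverage_index(start, transitions, leverage, batters=2):
    """Expected LI after `batters` plate appearances.

    transitions: dict mapping state -> list of (next_state, probability)
    leverage:    dict mapping state -> leverage index of that state
    Any path on which the half inning ends contributes zero.
    """
    dist = {start: 1.0}  # probability of being in each state
    for _ in range(batters):
        nxt = {}
        for state, p in dist.items():
            if state == END:
                # Once the inning is over it stays over.
                nxt[END] = nxt.get(END, 0.0) + p
                continue
            for new_state, q in transitions[state]:
                nxt[new_state] = nxt.get(new_state, 0.0) + p * q
        dist = nxt
    return sum(p * leverage[s] for s, p in dist.items() if s != END)

# Toy example with invented states and numbers (illustration only).
toy_transitions = {
    "calm": [("calm", 0.3), ("jam", 0.2), (END, 0.5)],
    "jam": [("jam", 0.5), ("calm", 0.1), (END, 0.4)],
    "last_chance": [(END, 1.0)],  # every outcome ends the half inning
}
toy_leverage = {"calm": 1.0, "jam": 4.0, "last_chance": 10.0}

fli_calm = future_leverage_index("calm", toy_transitions, toy_leverage)
fli_last = future_leverage_index("last_chance", toy_transitions, toy_leverage)
```

The "last_chance" state is a stand-in for a spot where every outcome ends the half inning: its own LI can be huge, but its FLI is zero because nothing is left to warm up for.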

Here is a link to a nice table that includes WPA, LI, and FLI. There are a few interesting situations that I would like to point out. First, the highest leverage situation in baseball is bottom of the ninth, two outs, with the bases loaded in a tie game. This checks in with a LI of over ten but has the lowest possible FLI of zero. Why is this? Because either the batter reached base and the game is over, or he made an out and the inning ended. In fact, if you look at most two out situations you will see that they tend to have very low FLIs. This is because even if the pitcher is struggling, it is likely that one of the next two batters will make an out and end the inning. This means that you really should be warming up your important arms early in the inning, not late. Also, if you have a lead, a few runners get on base, and it is getting late in the game, now is the time to warm up your closer. Top of the eighth with a two run lead and runners on first and second checks in at a FLI of 2.9. That is the same leverage as a one run lead starting the top of the ninth. If you are behind by a run, though, FLI drops like a rock.

Hopefully, the end product of this will be something that is combined with the intentional walk tool to become a situational tool. Plug in the situation and the players involved and it will tell you whether you should walk the next batter, start warming up your closer, or whatever. That is still probably weeks away, but this should be a good step in that direction.

Monday, October 15, 2007

Intentional Walk Tool

This post is going to be a nitty gritty explanation of the intentional walk tool I wrote. If you are coming from HardBallTimes and want a detailed explanation, you are in the right place. If you got here from elsewhere and just want a quick rundown, check my HardBallTimes article here.

So the tool is actually two different programs put together. The first is a Monte Carlo (MC) simulation of a baseball at bat. What it does is run 10,000,000 trials of the at bat for every possible baseball situation. There are eight possible runner situations (e.g. runners on second and third) and three out situations (no outs, one out, and two outs), and it does this for an OPS of 400, 500, 600, 700, 800, 900, and 1,000. The OPS here stands for the expected OPS the batter would put up against this pitcher and defense. This gives us 168 possible combinations. The program then calculates the probability of the at bat ending in every other situation (e.g. the chance that a runner on second with one out before the at bat ends with a runner on third with two outs, etc.). For runner advancement, error rate, and other goodies it uses 2007 data and assumes all runners are league average. It then draws a best fit line through the results for each situation as a function of OPS. The slope and y-intercept of each line are then recorded to a lookup table.
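Here is a rough sketch of the simulate-then-fit idea for a single outcome. The mapping from OPS to an out probability (`toy_out_probability`) is a made-up stand-in, not the model the tool uses, and the trial count is far smaller than the 10,000,000 mentioned above so that it runs quickly. The least squares fit is the ordinary textbook formula.

```python
import random

OPS_GRID = [400, 500, 600, 700, 800, 900, 1000]

def toy_out_probability(ops):
    # Hypothetical stand-in: assumes the out rate falls linearly with OPS.
    return 0.85 - 0.0003 * ops

def simulate_out_rate(ops, trials, rng):
    """Fraction of simulated at bats that end in an out."""
    outs = sum(1 for _ in range(trials) if rng.random() < toy_out_probability(ops))
    return outs / trials

def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

rng = random.Random(2007)  # fixed seed so the run is repeatable
rates = [simulate_out_rate(ops, 20000, rng) for ops in OPS_GRID]
slope, intercept = fit_line(OPS_GRID, rates)

# 8 runner situations x 3 out situations x 7 OPS levels
n_combinations = 8 * 3 * len(OPS_GRID)
```

In the real tool one such line would be fit for every start situation and every possible end situation, which is why building the table takes so long.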

To test that the MC is performing correctly, we can run one batter for each situation and compare the resulting run expectancy to the 2007 run expectancy. I would expect the MC to be close to the real run expectancy, and it does appear close. It is hard to say how close it should be and how much of the difference is just random variation in the run expectancy matrix. Anyway, here are the numbers:


This lookup table then contains all the information you need to find the chance that a runner on second advances to third with one out if the batter has an OPS of, say, 761 (league average this year). These lookup tables take hours to create, but once they are done accessing them is incredibly quick. Now we are ready for the second part, which is the program that puts all of the information together.
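As a sketch of how such a lookup might work, assume the table is keyed by (start situation, end situation) and stores the slope and y-intercept of the fitted line. The key layout, the function name, and the slope and intercept values below are all invented for illustration, and clamping the result to the 0 to 1 range is my own guard rather than something the tool is known to do.

```python
def transition_probability(table, start, end, ops):
    """Evaluate the stored best fit line at the given OPS."""
    slope, intercept = table[(start, end)]
    p = slope * ops + intercept
    # Guard (an assumption): a straight line can stray outside [0, 1].
    return min(1.0, max(0.0, p))

# Situations written as (runners, outs). Values are made up.
toy_table = {
    (("2nd", 1), ("3rd", 2)): (-0.0001, 0.20),
}

p_advance = transition_probability(toy_table, ("2nd", 1), ("3rd", 2), 761)
```

The lookup itself is a dictionary access plus one multiply and one add, which is why it is so quick compared with rerunning the simulation.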

This program, which takes its input from a web form, starts with a baseball situation. It then calculates every possible new situation after one at bat and the probability of that new situation. Do this three more times and you have a matrix that holds what the game looks like four batters in the future, weighted by the probability of each situation. If the inning ended, the opposing offense is allowed to bat and the game tree expands again, weighting all the chances that team scored zero runs, or one run, etc. Once this is done, each situation is multiplied by the WPA of that situation. Sum them all up and you get the new, corrected WPA of the original situation. If the game ended along the way, the winning team is assigned a 100% WPA for that situation. The WPA values used are from Dave Studeman and Jon Daly's WPA calculator. My understanding is they used 2005 data to create their charts. If you put in league average batters it should return their WPA, but because of error in the MC and the difference between 2005 and 2007 data, the MC tends to be off by less than 0.5%. There are a few instances when this rises to nearly a 1% difference, though.
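The tree expansion can be sketched as a short recursion. This is only an illustration of the idea, not the tool's code: the state names and numbers are invented, and the "INNING_OVER" value is a single made-up figure standing in for the whole expansion of the opposing team's half inning.

```python
def expected_win_value(state, transitions, chart_value, terminal_value, depth=4):
    """Probability-weighted value of `state` looking `depth` batters ahead.

    transitions:    dict state -> list of (next_state, probability)
    chart_value:    dict state -> value read from the win chart
    terminal_value: dict state -> fixed value for states where the tree stops
                    (game over, or inning over in this simplified sketch)
    """
    if state in terminal_value:
        return terminal_value[state]
    if depth == 0:
        return chart_value[state]
    return sum(
        p * expected_win_value(nxt, transitions, chart_value, terminal_value, depth - 1)
        for nxt, p in transitions[state]
    )

# Toy inputs (all numbers invented), from the fielding team's point of view.
toy_transitions = {
    "jam": [("jam", 0.5), ("LOSS", 0.2), ("INNING_OVER", 0.3)],
}
toy_chart = {"jam": 0.40}
toy_terminal = {"WIN": 1.0, "LOSS": 0.0, "INNING_OVER": 0.60}

value_two_ahead = expected_win_value("jam", toy_transitions, toy_chart, toy_terminal, depth=2)
```

To compare two decisions, such as pitching to the batter versus walking him, you would run this once from each resulting situation with the appropriate batters' probabilities and compare the two values.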

Obviously, this MC is not perfect. Hopefully it is a step forward, though. Also note that I am planning on creating a better MC by using a more accurate input than OPS. I started with OPS because it was pretty easy to program and would be easier for the general public to use, but a more accurate one will be coming soon.

Whew. Well, hopefully that makes at least a little sense. If it doesn't, please feel free to comment below or email me under my profile to the right. You can access the tool either from the article at THT or by following this link.

Thursday, October 4, 2007

Was Manny Corpas Doctoring The Ball?

After my first article on Hardball Times, reader Tom, who blogs here, emailed me saying that a TBS camera caught Manny Corpas pouring Gatorade on his jersey before coming in to pitch, and that he then kept reaching for that spot when he was on the mound. Was Corpas wetting the ball, in effect throwing a spitball? Tom wanted me to look into it, so I downloaded the 13 pitches Corpas threw and ran them through my algorithms, correcting for everything but home stand calibration corrections. I didn't use those corrections because only one game has been played in the series and it is hard to tell from that what the PITCHf/x calibration really was. These corrections are very small anyway, so they most likely wouldn't make a difference. Here is Corpas' player card for full details.

Anyway, I decided to overlay Corpas' regular season data as well to see if his pitches yesterday were doing anything different from the regular season. Be warned: because it is only 13 pitches, we have an incredibly tiny sample here.

It does appear that Corpas was throwing his Slider a little more often than he did in the regular season, but because we saw Francis do a similar thing, that appears to be the Rockies' game plan. As for whether the ball was breaking any differently, it doesn't appear to be. If you look closely you might see that Corpas was getting less vertical movement than normal, but overall it appears well within the range that he can pitch. We saw that Francis was trying to keep the ball down too, and it could be that those were Corpas' instructions as well. His speed was pretty much right in line with his norms. His Fastball was a tick above 93 MPH and his Slider was a tick above 78 MPH. So if Corpas was doctoring the ball, it doesn't look like it had any effect.

Tuesday, October 2, 2007

Season Ending Data

Except for the one game playoff yesterday, the player cards and the defensive metrics should be completely up to date. I have now added some splits to the player cards, including a right/left split and a breakdown by count. I have also added an explanation page for the player cards that should be linked from every card. Lastly, I fixed the strike zone size as pointed out by Ike. Thanks, Ike, for spotting that. If you have any questions or corrections, feel free to comment below.