Friday, August 31, 2007

More fun with pitchFX

As promised, here is the beginning of five players in five days. The only problem is I only have one request for a player so far. That will take care of today, but I need some more requests, people. There has to be a player you find interesting, either a pitcher or a position player. Please add your requests to the bottom of this thread and tomorrow you will see some interesting plots on them. Ok, with that, on to Ryan Braun.

This plot should look pretty familiar. It is Ryan Braun's hit chart, just like the one I made for Fielder yesterday. The thing that really struck me about this plot was the lack of hits up and away, so I added some dividing lines to break the strike zone into quarters. It seems pretty clear to me that Braun is a low-ball hitter with his happy zone coming low and in (remember he is a right-handed batter, so he is standing around -2). That got me wondering whether Braun was swinging and missing at a lot of those high balls, so I made this plot.

This plot shows every pitch that hit the catcher's glove when Braun was up to bat. There seem to be some extra called strikes on pitches that are out of the zone inside, and some called balls that look way too good not to be strikes. This seems to be about what was found in this study. Anyway, back to Braun: it appears that he is most willing to expand his zone on balls off the plate and down a bit. Since he appears to be such a good low-ball hitter, this makes sense. The Cubs seemed to be giving him a steady dose of off-speed pitches low and away and fastballs up (and somewhat in). It will be interesting to see if other teams start trying that, since these plots suggest that is where his weaknesses are.
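The quarter split drawn on Braun's hit chart is simple to reproduce. This is a minimal sketch, assuming pitchFX-style coordinates (px and pz in feet, catcher's view, negative px toward a right-handed batter) and an averaged zone top and bottom; the function names and the sample locations are hypothetical, not the code behind these plots.

```python
from collections import Counter


def zone_quarter(px, pz, sz_top, sz_bot, bats):
    """Label a location with one of four strike-zone quarters.

    px, pz are in feet from the catcher's view, with negative px toward a
    right-handed batter. The zone is split at the middle of the plate
    (px = 0) and at the vertical midpoint of the averaged zone.
    bats is "R" or "L".
    """
    vertical = "up" if pz >= (sz_top + sz_bot) / 2 else "down"
    # A right-hander stands near px = -2, so negative px is his inside half;
    # a left-hander stands near +2, so positive px is his inside half.
    # A location exactly at px = 0 is counted as "away" for either batter.
    inside = px < 0 if bats == "R" else px > 0
    horizontal = "in" if inside else "away"
    return f"{vertical}-{horizontal}"


def tally_quarters(locations, sz_top, sz_bot, bats):
    """Count how many (px, pz) locations fall in each quarter."""
    return Counter(zone_quarter(px, pz, sz_top, sz_bot, bats) for px, pz in locations)


# Hypothetical hit locations for a right-handed batter.
hits = [(-0.3, 1.8), (-0.5, 2.0), (0.4, 3.0)]
print(tally_quarters(hits, sz_top=3.4, sz_bot=1.6, bats="R"))
```

Splitting at the center of the plate rather than at the batter's own position keeps the quarters comparable between right- and left-handed hitters.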

That got me thinking that it would be interesting to compare Braun's chart to a known free swinger. Here is Geoff Jenkins' strike chart.

Jenkins really can't lay off the ball in the dirt. He appears to swing at almost half of the pitches thrown over the plate but low. Also notice how few inside pitches pitchers have thrown to Jenkins. The book on him clearly is away, away, away, and down when you get two strikes. What about Fielder?

Same sort of plan by the pitchers: keep the ball away from Fielder, and go low and away if you want him to swing. Here you can really see that the umpires are calling a lot of strikes on Fielder on pitches that are off the plate. Hopefully, as he gets older, those pitches will start getting called correctly. Unlike Braun, very few called balls are in Fielder's zone, and most of them are inside. If you look very closely at this plot, you will see I have added two black plus signs (clicking on the plot will make it bigger and much easier to see). These are two pitches called strikes in the game in which Fielder was ejected by Wally Bell. Unfortunately, these aren't the pitches from his at bat in the 8th inning; they are from earlier in the game. Why am I not showing you the at bat Fielder got ejected over? Because MLB has removed the pitchFX data for that at bat. Here is a link to the inning, and you can see that every other at bat has pitchFX data but Fielder's. That makes me think that the pitchFX data showed that the call was terrible and that Fielder was correct when he argued. We know Bell had called two strikes on pitches that were clearly off the plate earlier in the game, and it seems very possible that the "strike" he rang Fielder up on was even further outside.

Thursday, August 30, 2007

Fielder hit chart

So the Sheets pitch stuff went over pretty well; how about some stuff on the guys who hit the ball? Here is the location of every hit by Prince Fielder recorded by pitchFX. I have color coded them to make it easier to see. Again, for his strike zone I am averaging the top and bottom from all of his plate appearances. Because pitchFX has some issues going from park to park, you should not take these locations as set in stone. I will be normalizing them later, hopefully by the weekend. Sadly, pitchFX was only on for ten of Fielder's league-leading 39 homers. Because Fielder is a lefty, he is standing around +2 horizontally. Most lefties' happy zone is on balls down and in, but Fielder seems to like to get his arms extended and seems to be better middle-out than middle-in. When you pitch him on the outside corner he still seems to get to the ball, but he doesn't hit it with as much power.

I'm going to post at least one plot for a player every day for the next five days. Please let me know who you would like to see plots on in the comments below. One entry per person per day, please, and I will post them in the order they were received. If the player doesn't have very many ABs or tracked pitches, I reserve the right to skip that player.

Sheets returns from the DL

Ben Sheets came off the DL yesterday and pitched a gem, going 6 innings and allowing only one run. How did he do it, you ask? Well, I have been messing around downloading the pitchFX data that MLB has been providing, and lucky for us it was in use at Wrigley yesterday. I am planning on doing some hard-core analysis with the data, but for now I am content making some pretty pictures. So why don't we start with Sheets' pitch speed for the game.

So it looks to me like the vast majority of the 85 pitches Ben threw were fastballs. The Wrigley Field radar gun in use during the game seemed to show him a tick faster than this, but there is a lot of variation in those guns, and this snapshot is taken a few feet away from where the ball actually left his hand. Anyway, for now I am going to place an arbitrary cutoff at 85 MPH and say the ten pitches below that were curves and the 75 above that were fastballs. Now let's look at where those pitches crossed the plate.
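A speed-only split like the one described for Sheets takes a single comparison per pitch. This is a minimal sketch, assuming a plain list of pitch speeds in MPH; the 85 MPH cutoff comes from the post, while the sample speeds are made up and a pitch at exactly 85 is arbitrarily counted as a fastball.

```python
from collections import Counter

CUTOFF_MPH = 85.0  # the arbitrary fastball/curveball split used for Sheets


def label_pitches(speeds_mph, cutoff=CUTOFF_MPH):
    """Label each pitch by speed alone: at or above the cutoff is a fastball."""
    return ["fastball" if mph >= cutoff else "curveball" for mph in speeds_mph]


# Hypothetical speeds, not Sheets' actual pitchFX readings.
speeds = [92.1, 91.5, 76.0, 93.0, 77.2]
print(Counter(label_pitches(speeds)))
```

Speed alone works for a two-pitch outing like this one, but any changeup under the cutoff would be lumped in with the curves, which is why movement is the better way to separate pitch types.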

This is a very busy graph, so let me take a minute to clear things up. First, all numbers are in feet here, and negative x is toward a right-handed batter, so you are looking at it the way the catcher would. The red marks are fastballs and the blue marks are curves. Squares are pitches that were called balls. Circles are strikes, with filled-in circles for swinging strikes. Plus signs are foul balls or foul tips. Filled-in triangles are balls in play. I have added a strike zone box in black using the correct width of the strike zone according to MLB, and I have averaged the zone top and bottom over all of the Cubs batters for the vertical limits. The first thing that jumps out at me is what an excellent strike zone Gerry Davis had. This isn't too surprising since Davis is a crew chief and has been an umpire for 31 years!
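Checking an umpire's calls against an averaged box comes down to a point-in-rectangle test. This is a minimal sketch, assuming pitchFX-style px/pz values in feet and per-batter zone tops and bottoms; the 17-inch plate width is the rule-book figure, while the function names, the "strike"/"ball" labels, and the sample pitches are hypothetical.

```python
from statistics import mean

# Home plate is 17 inches wide, so each edge is 8.5 inches (in feet) from px = 0.
PLATE_HALF_WIDTH_FT = 8.5 / 12


def average_zone(tops, bottoms):
    """Average per-batter zone tops and bottoms (feet) into one box."""
    return mean(tops), mean(bottoms)


def in_zone(px, pz, sz_top, sz_bot):
    """True if the pitch center is over the plate and between the zone limits.

    This ignores the ball's radius, so pitches that just clip an edge count as balls.
    """
    return abs(px) <= PLATE_HALF_WIDTH_FT and sz_bot <= pz <= sz_top


def call_agreement(called_pitches, sz_top, sz_bot):
    """Share of called pitches (px, pz, call) whose call matches the box."""
    called = list(called_pitches)
    if not called:
        raise ValueError("no called pitches")
    matches = sum(
        (call == "strike") == in_zone(px, pz, sz_top, sz_bot)
        for px, pz, call in called
    )
    return matches / len(called)


top, bot = average_zone([3.5, 3.3, 3.4], [1.6, 1.5, 1.7])
calls = [(0.0, 2.5, "strike"), (1.0, 2.5, "ball"), (0.9, 2.0, "strike"), (0.1, 2.0, "ball")]
print(call_agreement(calls, top, bot))  # 0.5
```

A high agreement share is one way to put a number on "an excellent strike zone," though the averaged box will misjudge pitches near the top and bottom for unusually tall or short hitters.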

Anyway, back to Sheets. Look at the ten curveballs he threw during the game. All but three of them were up in the zone, and six of them were in the strike zone. It appears that he was using that pitch as a "get me over" pitch and something to keep the Cub batters thinking. With his fastball he was basically pounding the zone, staying somewhat around the corners. If the strike zone had been slightly larger up or out, Sheets could have had an even better day. Surprisingly, the Cub batters did a pretty good job of laying off those pitches. For a team that walks as little as the Cubs do, that is some good work. Ok, we know where the pitches ended up, but how did they get there?


Here the graph is in inches, not feet, but negative x still points toward a right-handed batter. What this represents is how the ball moved up/down and left/right compared to a pitch thrown without spin. So a negative number on the vertical axis means the ball dropped lower than a ball thrown without spin. Now we can really see Sheets' curveballs. He throws almost a 12-to-6 curve, which means it moves only slightly away from right-handers and mostly down. His fastball is positive here, which doesn't mean it actually rose; it means it dropped less than a ball without spin. This is very typical of the four-seam fastball that Sheets throws. You can see how the ball appears to rise and bores in on a right-handed batter. This plot definitely shows that Sheets didn't throw his changeup once during the game and got away with just two pitches. If you are going to do that as a starter, you had better make them good, and Sheets definitely had his fastball working last night.
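The "compared to a pitch thrown without spin" idea can be written down with constant-acceleration kinematics. This is a simplified sketch, not how MLB computes its published movement numbers: it assumes constant accelerations from some trajectory fit and a known flight time, ignores drag, and uses a hypothetical function name and inputs. It does show why a positive vertical number means "dropped less than gravity alone" rather than "rose."

```python
G_FT_PER_S2 = 32.174  # gravitational acceleration, ft/s^2


def spin_movement_in(ax, az, flight_time_s):
    """Movement in inches relative to a spinless pitch, as (horizontal, vertical).

    ax, az: constant accelerations in ft/s^2 (az includes gravity).
    A spinless pitch is assumed to have ax = 0 and az = -g, so the extra
    displacement is 0.5 * (a - a_spinless) * t**2. Drag is ignored.
    """
    scale = 0.5 * flight_time_s ** 2 * 12  # feet -> inches
    return ax * scale, (az + G_FT_PER_S2) * scale


# Hypothetical inputs: a fastball that falls slower than gravity alone
# and tails toward a right-handed batter (negative x), and a curve that
# falls faster than gravity and breaks slightly away (positive x).
print(spin_movement_in(-6.0, -21.0, 0.40))
print(spin_movement_in(3.0, -45.0, 0.45))
```

The fastball's vertical value comes out positive even though its total acceleration is still downward, which matches the "rising" fastball described above.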

Friday, August 24, 2007

One powerful night

I know I haven't put any new material up in the past few days and I feel really bad about that. I am struggling to get the numbers I need for the intentional walk Monte Carlo and I needed a break. So last night, with the Brewers enjoying an off day and Charleston a mere two hours away, my wife and I went to see the West Virginia Power play the Lexington Legends. Lucky for us, West Virginia is loaded with prospects right now after the Brewers recently promoted their number one pick from this year, Matt LaPorta, to low A ball. With their first-round pick from last year, Jeremy Jeffress, pitching that night and some other C+ guys on the team, it was going to be a good show. So without further ado, here is the treat I have been talking about for a while: I'm taking off my stat-head hat and putting on a scout hat for a night to give you a rundown of the game.

First, a couple of quick notes. The new stadium in Charleston is very impressive. It is downtown in a warehouse district, and they decided to go with that theme for the park. Around 4,400 people were there for cheap ticket night, where our second-row seats behind home plate cost us $3.00 each. The stadium was pretty full, so I would guess that it couldn't hold too many more than 5,500 people. Being downtown, it is placed in one city block, so the fences had to fit into that. It is a mere 320 ft. down each line but 410 to center. A wall about 12 feet high goes around the whole yard. Charleston is right in the middle of the Appalachians, so I am sure it plays as a hitter's park. I had only two complaints. First, a large portion of the top of the LCD screen in center was out. I know those things are very expensive, but it really was annoying to look at. Second, nobody was manning the in-house radar gun, so we didn't get any pitch speeds. I had created a specialized scorecard to track the location and speed of each pitch, so I was pretty disappointed. For some background on the players, here is John Sickels' Brewers review from before the season. Ok, on to the players.

I had been following Jeffress for a while in the box scores and knew the scouting report on him: great fastball, terrible command. When he walked out to the hill I told my wife that he sure looks small. Indeed, he is generously listed at 6'1" and 185 lbs. While he's not tall, my wife pointed out that he is all legs, something that I am sure has helped his pitching career. It looks like he might have some room to fill out, having just turned 20. Anyway, Jeffress lived up to the scouting report with an eight-strikeout, four-walk, no-hit outing over four innings. I'm going to go over some of the highlights.

Jeffress started out with a four-strikeout first, thanks to a strikeout on a wild pitch to the second batter. He followed that up with a wild pitch and a passed ball (which could easily have been scored another wild pitch) to the next batter. After losing him and putting runners on first and third, he bore down and struck out the next two batters, including Koby Clemens looking. After a quick second inning we move to the third, where all hell broke loose.

Jeffress walked the ninth-place batter to start the inning, and then the leadoff hitter bunted. The bunt was not particularly good, as Jeffress quickly fielded it and looked to second. It would have been a close play had he decided to throw there, but he instead went to first and made an off-balance throw. The throw led the first baseman into the baseline and he dropped the ball for an error. While he should have caught it, Jeffress had all day to throw to first once he decided to go there, so it would have been nice to see him set himself and make a better throw. This appeared to fluster him on the mound, as he went 3-0 and eventually walked the next batter to load the bases. He got things together and struck out the next batter for the first out but promptly walked the next guy on four pitches to force in a run. Clemens was up again, and he hit the only grounder of the night to the third baseman, who stepped on the bag and threw to first, just late (or so said the ump) to turn the double play, and another run scored. Finally, Jeffress put an end to the madness as he absolutely blew away the next batter to end the inning. Both runs ended up unearned.

He was at 75 pitches, and I commented that he was on a pretty strict pitch count and I wouldn't be surprised if that was it for him, but he came out for the 4th and made quick work of the 7-8-9 guys to end his day at 84 pitches.

So I have seen most of the Brewers' top-line guys come up through the minors over the past few years, and honestly, Jeffress has the best pure stuff of any of them. He has an incredibly smooth delivery and really good mechanics. An Astros scout was a few rows behind us, and he told me that Jeffress was sitting in the 94 range, which seemed about right. When he needs to, though, he can dial it up with a four-seam fastball. At this level that is an absolutely devastating pitch, and it produced nothing but misses and weak foul balls from the Legends. My wife commented on the "rising" action of the pitch, and we got a good look at one, a "hit the bull" pitch that was coming right at our faces before hitting the net.

Jeffress' second-best pitch, at least tonight, was his curveball. It isn't quite a 12-to-6 but very close, and boy does it have a lot of late movement on it. Five of his eight strikeouts came on this pitch, with two looking. It absolutely buckled knees and, coming after the four-seam fastball, it was an even more effective pitch. When he missed with it, he was missing low, which certainly was a good thing.

He also threw a slider to right-handed batters that didn't seem to do too much. It looked like he abandoned it after the second inning, as he was consistently missing with it. He also showed a changeup to a few lefties in the lineup, but that looks to be a work in progress at best. Both that I saw were off the plate outside, and it didn't look like he had any intention of throwing it over the plate.

Obviously, his overall command was erratic at best, but I was particularly disappointed with his command of the fastball. One of the two wild pitches of the night and the passed ball were on fastballs. The "hit the bull" pitch and two more that bounced feet in front of home plate were fastballs. It wasn't as if he was overthrowing, either; he just couldn't seem to locate it. As I said before, his pure stuff is better than Gallardo's or Parra's pre-injury, but he seems miles away from being an effective pitcher. He ended the night with just 41 strikes in 84 pitches.

LaPorta spent the night DH-ing, which was also disappointing to me. I really wanted to see for myself how he handled LF. From all reports he needs the work out there, so I am not really sure why he was DH-ing. Maybe the hamstring that was bothering him when he signed was acting up again. Anyway, here is a report on his night at the plate.

LaPorta came up in the first inning with a runner on third and one out. The opposing pitcher looked like he wanted nothing to do with him and promptly walked him on four straight pitches. None were particularly close, so it is hard to credit LaPorta's plate discipline there. That at bat did set the stage for how the Legends were going to pitch to him, though: fastballs in on the hands and off-speed stuff down and away. This was the third game of the series, so I am assuming that might be the book on him (for the Legends at least).

In his second plate appearance he again took a fastball in, then fouled off another fastball in. The next two were off-speed, and the second one caught a little too much of the plate; LaPorta swung at it but weakly popped it to third. In his next at bat, in the fourth, it looked like he had made an adjustment, and he took a breaking ball that was up but outside to the gap in right center. It was a very good piece of hitting, but the right fielder made a very nice running catch on it right next to the warning track (more on this play later). Still, it was nice to see him go the other way with power. Sadly, the game was going incredibly slowly, and with a two-hour drive back we had to leave early and missed his last at bat, which resulted in his only hit of the night.

The star offensively for the night was Steve Chapman, who set a new team record by blasting his 21st and 22nd home runs of the year. The first was out of the park and landed on the street, so it was not a cheap homer at all. Chapman is a fringe prospect who has already struck out 130 times this year but certainly has a lot of power in his bat. He struck out swinging in his first two at bats of the night.

Brent Brewer is a toolsy shortstop that Sickels gave a C+ to before the year started. He committed his 44th error of the season on a throw and had two missteps on the bases. In the first he doubled down the left field line and later tried to score on a flyball. He launched himself into the air just as he got to the batter's box and tumbled over home plate, rolling onto the grass. The catcher calmly walked over and tagged him out. If he either stands or slides he is safe, and he really looked awkward on that play. Then in the fourth he again doubled on a grounder down the same line. The next man up hit a warning track shot to center, but Brewer was caught napping near third and didn't tag up on the play. If that ball is dropped or goes off the wall he scores easily from second, so that really was a bad base running error. He made up for it on the next play, though, as LaPorta hit the opposite-field shot I described earlier and Brewer turned it into a sac fly by tagging up and scoring from second. Brewer has 39 stolen bases on the year and has ten homers to go along with 24 doubles and seven triples, so he is a nice combination of power and speed. But the 44 errors and 159(!) strikeouts in 490 at bats show just how raw he is.

For the Legends there were few prospects of note. Nick Moresi was batting ninth and scored a few runs without a base hit. He also made that nice catch at the wall on the play where Brewer didn't tag up, and he was credited with an outfield assist when Brewer got thrown out at the plate. It was a pretty nice throw, actually, and very accurate, though Brewer should have scored. He has only a .701 OPS for the year, though, and is old for the league, so his future doesn't look so bright. Maybe he can make it as a fifth-outfielder, defensive-replacement type.

Koby Clemens was playing third for the Legends, and he is somewhat interesting. He couldn't make it behind the plate defensively and looked like he had slow reactions at third, which isn't a good sign. You could tell he grew up with the game, though, as he was fundamentally very sound. He possesses great plate discipline, and he was the only Legend who looked like he had a chance of hitting Jeffress. He hit the only grounder of the night, pulling it to third, and busted his ass down the line to beat out the double play and let what was then the lead run score. He hit two home runs the night before, and it looked like Jeffress might be working him carefully (though with Jeffress' lack of command it is hard to tell). The strikeout looking in the first was a borderline call on a pitch I actually thought was outside. It was probably too close to take, but it was a nice bender from Jeffress and it was unlikely he was going to do much with that pitch. Clemens is probably not ever going to make it to the show, as he appears to be a rarer breed: a great-skills, no-tools kind of guy. His OBP is almost 100 points higher than his .253 batting average, but he has only thirteen homers on the year and is slugging .407.

Lastly, the Legends starter Doug Arguello had to leave the game after getting drilled by a line drive off the bat of Power catcher Andy Bouchie. It wasn't quite as scary as seeing Rick Helling impaled by a bat a few years ago, but he was down for a few minutes. Best wishes for a speedy return, Doug.

So that is a wrap. Hopefully I will make it to a few more minor league games this year, though probably not any more Power games, as the drive is long and through the mountains. Let me know what you think, and send along any suggestions to help me scout next time.

Monday, August 20, 2007

The best laid plans

Well, I was supposed to have a massive update of the site this weekend, but computer trouble and then code trouble did me in. The intentional walk code has some bugs. The two-out numbers look good but are not any more useful than the plots I have already posted. The one-out numbers look ok, but the no-out numbers are completely screwed up. This is almost certainly due to things like base running advancement and double play rates, which I had just taken from some quick data checking. This will have to be redone and then rerun, which puts that on hold for a while. It also appears that my parser doesn't like the 2005 or 2004 box scores, so those DALG numbers also didn't get updated. Hopefully that will be a quicker fix. Fortunately, the 2007 DALG update did work, and you can see those numbers on the My Statistics link. Hopefully I will have some better news tomorrow, as the Brewers are on the west coast, which means a late night for me anyway.

Saturday, August 18, 2007

Home Cookin'

For as long as baseball has been played, home teams have played better than road teams. Part of this is batting last in the inning, but people have also guessed that sleeping in your own bed, playing in front of a cheering crowd, and familiarity with your home ballpark have played a role. Until recently it has been difficult to separate these effects from one another. Just how much of home field advantage is knowledge of your home park?

For instance, maybe your home park has a slightly tilted center field.

Maybe you have to deal with a giant wall that balls keep bouncing off of.

Maybe when you look up to catch a flyball you see this.


In 2001 MLB moved to an unbalanced schedule. Before that, every team in your league came to your park the same number of times (though because of some two- and four-game series, not exactly the same number of games). In 2001 and beyond, teams started playing many more games against other teams in their division. Now most teams from the other divisions in your league travel to your park only once a year, while teams in your division play in your park for three series. This has created a group of teams that should be very familiar with your home park and a group of teams that should be unfamiliar with it. This study will check whether there is any difference between how well home teams play against those two groups.

For each year from 1996 on, I broke each team's home schedule into three groups. The first group was games against teams in the same division. The second group was games against teams in other divisions but in the same league. The third group was interleague games. I then estimated how many games each team should win in each group by adjusting for the strength of schedule for each game. Here are the results for 2006 by team.



div, lg, and i are the three groups: division games, league games outside the division, and interleague games. E(d), E(l), and E(i) are the estimated wins for those groups. Note that because a team usually has about a 53% chance of winning each home game, these estimates are not integers.
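The post doesn't say exactly how the strength-of-schedule adjustment was done, so this is one plausible way, not the method behind the table: a log5-style odds ratio with an extra home-field term set to the roughly 53% home winning rate mentioned above. The `Team` structure, the function names, and the sample records are all hypothetical. Summing a probability for each game is also why the E(d), E(l), and E(i) estimates come out as fractions.

```python
from dataclasses import dataclass

HOME_EDGE = 0.53  # rough chance an average team wins an average home game


@dataclass(frozen=True)
class Team:
    name: str
    league: str      # "AL" or "NL"
    division: str    # fully qualified, e.g. "NL Central"
    win_pct: float   # must be strictly between 0 and 1


def home_win_prob(home, away, home_edge=HOME_EDGE):
    """Log5-style odds ratio with an extra home-field term."""
    odds = (
        (home.win_pct / (1 - home.win_pct))
        * ((1 - away.win_pct) / away.win_pct)
        * (home_edge / (1 - home_edge))
    )
    return odds / (1 + odds)


def game_group(home, away):
    """'div', 'lg', or 'i', matching the three schedule groups."""
    if home.league != away.league:
        return "i"
    return "div" if home.division == away.division else "lg"


def expected_home_wins(home, visitors):
    """Sum win probabilities over one home game per entry in visitors."""
    totals = {"div": 0.0, "lg": 0.0, "i": 0.0}
    for away in visitors:
        totals[game_group(home, away)] += home_win_prob(home, away)
    return totals


# Hypothetical .500 teams: two division games, one league game, one interleague game.
home = Team("A", "NL", "NL Central", 0.5)
rival = Team("B", "NL", "NL Central", 0.5)
other = Team("C", "NL", "NL East", 0.5)
inter = Team("D", "AL", "AL East", 0.5)
print(expected_home_wins(home, [rival, rival, other, inter]))
```

Two evenly matched teams come out at exactly the 53% home edge, and a stronger home team or a weaker visitor pushes the probability up from there.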

If we look at the Cubs as an example, they play in a six-team division, so they are playing a huge number of divisional games. Even though they had the second-worst record in baseball, they were still estimated to win about 20 home games in the division (thanks to a terrible NL Central as well). When we look at league games and interleague games, they don't fare as well.

So now we are going to add those up for each team to get yearly numbers.


You can see that before 2001 teams were playing many more home games against teams not in their division. After 2001 the home games were split rather evenly. Interleague play started in 1997. You might have expected home teams to do better at the start of interleague play, but that doesn't seem to be the case. So now let's determine just how much the home field advantage changes.

I am using 1996-2000 as the control sample, where home teams should play just as well against out-of-division league opponents as they do against division opponents. From 2002-2007, home teams should play better against out-of-division opponents than against division opponents if those visitors are having trouble adjusting to the home park. You will notice that I am not including 2001 in either sample. 2001 was the first year of the unbalanced schedule, but in the previous year teams had still seen each home team's park twice, so I don't feel right about adding it to the unbalanced years. On the other hand, teams were only making one trip to a non-divisional park, so I don't feel right about adding it to the balanced years either. So I am leaving that year out altogether. Fortunately, it doesn't really change the results either way.



Again, the first column is the divisional games, the second is league games, and the last is interleague. It appears that home field advantage has increased in recent years, but the difference between the league games and the divisional games has stayed the same. This means that road teams aren't really affected by playing in an unfamiliar park, which goes to show just how well teams prepare for games. The interleague games show the highest difference, but the change in DH rules probably accounts for most of that effect.
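The comparison above is essentially a difference in differences: how much better than expected home teams do against league opponents versus division opponents, before and after the schedule change. This is a minimal sketch, assuming per-year records of (wins, expected wins, games) for each group; the data layout, the function names, and the toy numbers are hypothetical, and the sketch mirrors the logic described here rather than reproducing the table.

```python
BALANCED_YEARS = range(1996, 2001)     # 1996-2000 control sample
UNBALANCED_YEARS = range(2002, 2008)   # 2002-2007; 2001 is left out on purpose


def excess_win_pct(records, years, group):
    """(actual - expected) home wins per game for one group over some years.

    records[year][group] is a hypothetical (wins, expected_wins, games) tuple.
    """
    rows = [records[y][group] for y in years if y in records]
    wins = sum(r[0] for r in rows)
    expected = sum(r[1] for r in rows)
    games = sum(r[2] for r in rows)
    return (wins - expected) / games


def familiarity_effect(records):
    """Change in the league-minus-division gap from the balanced to the unbalanced era.

    A positive value would mean home teams gained more against unfamiliar
    league opponents once the unbalanced schedule arrived.
    """
    def gap(years):
        return excess_win_pct(records, years, "lg") - excess_win_pct(records, years, "div")

    return gap(UNBALANCED_YEARS) - gap(BALANCED_YEARS)


# Toy numbers only: the league/division gap is the same in both eras.
records = {
    1996: {"div": (60, 55.0, 100), "lg": (110, 100.0, 200)},
    2002: {"div": (110, 100.0, 200), "lg": (60, 55.0, 100)},
}
print(familiarity_effect(records))
```

A result near zero is what "the difference between the league games and the divisional games has stayed the same" looks like in this framing.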

Friday, August 17, 2007

Some notes on the Blog

So I have been blogging for about a week, and I am pretty happy with the results. I am really happy with the defensive statistic stuff and the intentional walk plots (BTW, the full data for that is still coming; it has just taken much longer than I thought. It does look like it should finish sometime tomorrow, so I will have a full post on that this weekend). The problem is that most of the posts have been pretty dry. I know I am a math geek, and when I run a new analysis I get excited about it and want to share every last detail with the world. The result has been a rather boring blog to read.

From now on I am going to try to hide most of the work and just post the results, hopefully with very little explanation. When something does need to be explained further, I will try to either put it in a separate post or hide it under a link to click or something like that. I am also going to try to be a bit more entertaining in the posts and to have more links to the outside world. If you have any thoughts about just how much stuff you want to see in these posts, please comment below.

As for the schedule, I am going to try to get a post up about home field advantage tonight. If that doesn't happen, then I will do that on Saturday and the intentional walk stuff on Sunday. Also, I will be updating the 2007 DALG statistics this weekend (it is hard to update while the walk stuff is running) and uploading 2004 and 2005 DALG numbers. I could potentially post DALG numbers going back at least a decade, but I am not sure if there is interest in that. If you are interested in seeing numbers for a particular year, let me know, again in the comments below.

Thursday, August 16, 2007

Brewers' pitchers batting

After a dismal season at the plate by the Brewers' pitchers, Ned decided that only the starting pitchers would take batting practice and that they would do so every day. We are now in the home stretch of the season, so we can take a look to see if this move by Ned has paid off. Last year the Brewers' pitchers hit an abysmal .097/.119/.113 (that is batting average/on-base percentage/slugging percentage, in case you were wondering), dead last in the NL. This year the pitchers have picked it up a bit to the tune of .141/.164/.177, good for 11th in the league. Now some blogs would leave it at that and call Ned a genius. Well, not this one (and apparently not this one either). You see, last year's team had the hack-tastic Doug Davis at the plate whiffing away. He has been replaced by a respectable hitting pitcher in Jeff Suppan. Also, Yovani Gallardo, who is an excellent hitter, has been added to the staff. So how much credit should this extra BP really get?
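For anyone new to the slash-line notation, the three numbers (and the OPS used below, which is just OBP plus SLG) come from the standard formulas. This is a minimal sketch; the function names and the sample batting line are hypothetical, not any Brewers pitcher's actual totals.

```python
def slash_line(h, ab, bb, hbp, sf, tb):
    """Return (AVG, OBP, SLG, OPS) from counting stats.

    AVG = H / AB, OBP = (H + BB + HBP) / (AB + BB + HBP + SF),
    SLG = TB / AB, OPS = OBP + SLG.
    """
    avg = h / ab
    obp = (h + bb + hbp) / (ab + bb + hbp + sf)
    slg = tb / ab
    return avg, obp, slg, obp + slg


def fmt(rate):
    """Format a rate the baseball way, e.g. 0.097 -> '.097'."""
    text = f"{rate:.3f}"
    return text[1:] if text.startswith("0") else text


# Hypothetical pitcher: 10 hits (3 of them doubles) in 100 at bats with 5 walks.
avg, obp, slg, ops = slash_line(h=10, ab=100, bb=5, hbp=0, sf=0, tb=13)
print("/".join(fmt(x) for x in (avg, obp, slg)), "OPS", fmt(ops))
```

Sacrifice bunts, which pitchers lay down a lot, count in neither AB nor the OBP denominator, so they don't drag these rates down.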

For that we will look at the holdovers in the rotation and see how they have performed at the plate. We will look at their stats for this year and compare them to last year's stats and their career numbers. Obviously, a small sample size warning applies to all of these numbers. First up is the ace of the staff, Ben Sheets.

Sheets
2007 .079/.079/.079
2006 .030/.059/.030
career .078/.117/.081

Sheets has been a terrible hitter from the word go. In his seven years in the big leagues Sheets has hit above .100 only once. He has one extra-base hit (a double) in 359 major league at bats. 2005 and 2006 were beyond bad, though, as he put up OPSs of .065 and .089! Seriously, I probably could do better than that given the amount of practice time he had. This year he has picked it up a bit but still is slightly below his career numbers.

Bush
2007 .135/.158/.189
2006 .177/.190/.226
career .158/.175/.208

Bush's career numbers are a bit misleading because he only had two at bats before coming to Milwaukee last year. His numbers are down a bit across the board and his .135/.158/.189 line is a bit surprising since he was a college catcher before moving to the mound.

Capuano
2007 .242/.286/.333
2006 .118/.143/.132
career .162/.192/.205

Cappy has really shined at the plate this year (though it hardly makes up for his poor year pitching). A pretty mediocre hitter over his five-year career, he is hitting for average and some power, with a home run already this year.

So one pitcher has been better than last year but below his career numbers, one pitcher has been worse than last year, and one pitcher is having a career year. We should take a look at the two newcomers, Jeff Suppan and Claudio Vargas, to see how they are doing before we rush to any judgments about the success of the extra BP for the pitchers.

Suppan
2007 .091/.130/.091
2006 .218/.295/.236
career .186/.230/.208

Suppan has been a decent hitting pitcher for his long career but has really had a tough year at the plate this year. He is well off his solid 2006 campaign and also below his career numbers. Maybe the magic of Dave Duncan is wearing off.

Vargas
2007 .114/.114/.143
2006 .098/.098/.157
career .080/.096/.103

Claudio has been slightly better than last year and an upgrade over his career numbers. He has spent all five of his years in the NL so he should be used to batting by now.

So it seems really hard to tell from these numbers what effect the extra BP has had. Getting rid of Doug Davis and his .046/.075/.046 line seems to account for most of the increase in the 2007 pitchers' line. Adding Gallardo's .217/.250/.261 line, in limited at bats, has helped as well. Gallardo hit very well in the minors, so you can expect that to continue. Ned should probably look to him first if he needs a pitcher to pinch hit. Even though there isn't a clear result from the extra BP, it does seem to make sense to have your starting pitchers take all the BP time, because they are getting the lion's share of the at bats.

Wednesday, August 15, 2007

The 2007 Washington Nationals

The Nationals were supposed to be just a terrible team this year. Pundits were predicting 100 losses for sure. Some suggested they might lose 110 and have a shot at losing 120. Yet here we are on August 15, and the Nationals are 54-65 after losing a heartbreaker to the Phillies last night. The Royals, Rangers, Reds, Giants, Pirates, and Devil Rays all have worse records. They have the same record as the Astros, who some thought had a chance to contend in the weak NL Central, and they are a mere two games behind the Marlins and threaten to get out of the cellar in the NL East.

What's going on here? How are the Nationals doing it? Well they are dead last in the NL in runs so that isn't it. They are a surprising 8th in the league in runs allowed. Their park is certainly helping these numbers but many thought this might be the worst starting rotation in the last 20 years so the pitching has been better than advertised. Both the starters and the bullpen have been the 8th best at preventing runs so far.

So the Nationals have scored 465 runs and allowed 556 runs. Their Pythagorean record says they should be 49-70, which would put them on pace for a 95-loss season. David Gassko wrote an excellent column on why teams outperform (or underperform) their Pythagorean record. He found that balanced teams that don't rely on the long ball, with a strong bullpen and a seasoned manager, tend to outperform their Pythagorean record.
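For readers who want to check the arithmetic, here is a minimal Python sketch of the Pythagorean record. It assumes the classic exponent of 2; the post doesn't say which exponent it used, but 2 reproduces the 49-70 figure. The function names are illustrative.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Pythagorean expectation: RS^x / (RS^x + RA^x), with x = 2 assumed here."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


def pythagorean_record(runs_scored, runs_allowed, games, exponent=2.0):
    """Expected (wins, losses) over `games`, rounded to whole wins."""
    wins = round(pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games)
    return wins, games - wins


print(pythagorean_record(465, 556, 119))  # record through 119 games
print(pythagorean_record(465, 556, 162))  # same pace over a full season
print(pythagorean_record(465, 646, 162))  # with 90 more runs allowed
```

The last line assumes the extra 90 runs discussed further down are simply added to the runs allowed so far before the winning percentage is projected over 162 games.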

Well, the Nats have a somewhat balanced offense, though it is pretty terrible. They don't hit too many home runs. They have a first-time manager and an OK bullpen. That doesn't sound like a team that should be about five games ahead of its estimated record to me. Either the Nats are getting a bit lucky or Manny Acta is one hell of a manager. I'm not ready to throw out the second option yet, but my guess is that some luck has played a part in the Nats' success so far.


But hold on. Even if they were playing at their Pythagorean record, that would still only put them on pace for a 95-loss season. That is still much better than some people predicted. What are the Nationals doing right? We know it isn't on the offensive end, and the pitching staff is supposed to be junk, so what about the fielding? We could look at their fielding percentage and see they are 10th in the NL. We could dig a bit more and see they are performing a bit better by team zone rating. They even look respectable from a defensive efficiency standpoint. But if we look at their DALG they look downright awesome (if you are new to the blog and want to learn more about DALG, check out this post).

DALG puts them second in all of baseball at preventing runs. Only the Mets are better, and the Nationals have been creeping up on them over the past few weeks. So why does DALG like the Nats so much? First, their catchers have done a great job behind the plate. As a team they are second in the NL in throwing out potential base stealers. Brian Schneider, with a little help from Jesus Flores, has thrown out 34% of base stealers, just a tick under the Cardinals' 35%. This doesn't show up in any of the other defensive metrics but plays a sizable role in DALG. Also, their outfield has done a great job of getting to balls (their outfield RZR is tied for second in the league, again behind those pesky Mets). Getting to balls in the outfield is more important than getting to balls in the infield because fly balls are much more likely to go for extra bases. DALG, unlike the other metrics, gives bonus points for tracking down a potential double in the gap. Speaking of extra bases, teams have done a very poor job of taking extra bases against the Nationals' outfield. So Ryan Church, Ryan Langerhans, Austin Kearns, Nook Logan, and Robert Fick, take a bow.

Not only have the Nationals played well on defense this year, they have also had a remarkable turnaround from last year, when they were in the bottom third of the league in DALG. The difference is almost 90 runs so far. Add those 90 runs to the 2007 Nationals' runs allowed and you have a team with a Pythagorean record of 55-107 projected over the whole season. Even their luck, or really good managing, would probably not save them from being the hundred-loss team many predicted.

So next time you start a Nationals pitcher, thank the defense, because it is the one keeping your head almost above water.

Monday, August 13, 2007

Plunk... take your base

"The Duke led the American League this year in saves, ERA, and hit batsmen. This guy once threw at his own kid at a father-son game." --Harry Doyle (a.k.a. Bob Uecker)


"Our best hitter gets hit in the face, and who knows how long he's going to be out? You've got to do something to protect your hitters," --Jim Palmer.


The beanball has been a part of baseball for as long as the game has been played. Many accusations have been thrown out over the years about certain pitchers throwing at opposing batters. The most common come after one of your team's batters has been hit by a pitch. Retaliation, or protection according to Palmer, can happen later that game, the following day, or in some other game in the future. Because it can be hard to tell whether a ball simply slipped out of a pitcher's hand, it can be difficult to say just how often retaliation happens.

While I wait on the new and improved intentional walk data to finish, I thought I would do a quick study on retaliation. Armed with game logs from Retrosheet, I wrote some quick code to look for games where both teams had a batter hit by a pitch. We won't be able to tell which of these games had a pitcher intentionally beaning a batter, but we will be able to estimate how many such games took place. For all the gory math details, read on.

We will define a revenge plunking as any time a pitcher hits an opposing batter after one of his own team's batters has already been hit by a pitch in the same game. A simplified version of this is to look for games where both teams had a player hit by a pitch, since whichever HBP came second has to be a revenge plunking under this definition. The problem will be separating the actual intentional beanings from accidental hit by pitches.

First, we need to determine just how often a batter gets hit by a pitch, intentionally or otherwise. I am starting with data from 1970 to 2006 because before 1970 Retrosheet is missing a lot of games and we don't want to bias our sample. Because this rate might change over time, we will count up the total number of HBPs in each year and divide by the total plate appearances that year. That gives us the HBP rate for that year. Here are the results in a tidy graphical format:

So we can see from the graph that the rate of batters being hit by pitches steadily decreased from 1970 to 1984. There was a small increase until 1989, and then beanballs skyrocketed up to the level we see today, which is about 1.1%. With this we can now calculate the chance that a game goes by without a single batter from one team getting hit. That is the probability that one batter doesn't get hit, raised to the power of the number of batters that team sent to the plate in the game.

For example, if we take our HBP rate to be 1% and there were 40 batters for a team in a game, then the chance that nobody got hit was 0.99^40, or 66.9%. The chance that at least one player on that team got hit is then one minus that number, or 33.1%. If the opposing team also had 40 batters come to the plate, they too had a 33.1% chance that at least one of their batters got hit. Multiplying these two gives the chance that both teams had a player hit in that game. That works out to about 11%! Random chance alone, then, says that around one game in ten should have a beanball that could look like retaliation. This is part of the reason there are so many claims that a pitcher is throwing at us.
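The same arithmetic as a small Python sketch. It assumes, as the example does, that every plate appearance is an independent trial with the same HBP rate and that the two teams are independent of each other; the function names are illustrative.

```python
def p_team_hit(hbp_rate, plate_appearances):
    """Chance that at least one of a team's batters is hit in a game.

    Assumes every plate appearance is an independent trial with the same HBP rate.
    """
    return 1 - (1 - hbp_rate) ** plate_appearances


def p_both_teams_hit(hbp_rate, pa_visitor, pa_home):
    """Chance that both teams have at least one batter hit (teams assumed independent)."""
    return p_team_hit(hbp_rate, pa_visitor) * p_team_hit(hbp_rate, pa_home)


print(round(1 - p_team_hit(0.01, 40), 3))        # nobody on the team hit: 0.669
print(round(p_team_hit(0.01, 40), 3))            # at least one hit: 0.331
print(round(p_both_teams_hit(0.01, 40, 40), 3))  # both teams: 0.11
```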

The other part is that some pitchers ARE indeed throwing at batters in certain games. It isn't paranoia if they really are out to get you. So how often does that happen? We can carry our example through using the correct HBP rate for each year and calculate how many games that year we would expect to have a batter from each team hit. This is the background. We then can go through the Retrosheet game logs and find how often at least one batter from both teams actually got hit. The difference between these two numbers represents pitchers actually intending to hit batters. Here is the plot since 1970.

Because there are more teams in the league and because the rate of batters being hit is up, the number of games with a revenge plunking has gone up through the years. As you can see, the actual number of games with a revenge plunking is generally a little higher than the predicted background. This isn't the case in all years, though. That doesn't mean that there were no intentional beanings that year; it just means that there were fewer games with both teams getting hit than the background predicts. When you are dealing with a large data set and only a small increase over background, it is normal to have some negative years. Here, then, is a graph where I subtracted the expected background from the data. This plot represents the estimated number of games in which an intentional beaning in retaliation took place.
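Here is a rough Python sketch of the whole background-subtraction procedure. It assumes a hypothetical per-game summary (`GameSummary`, with made-up field names rather than the actual Retrosheet game-log columns) holding each team's plate appearances and HBP count. It computes each season's HBP rate, sums the expected number of games with both teams hit, and subtracts that from the observed count. It is an illustration of the method, not the code behind the plots.

```python
from dataclasses import dataclass


@dataclass
class GameSummary:
    """Hypothetical one-record-per-game summary; field names are illustrative only."""
    year: int
    visitor_pa: int
    home_pa: int
    visitor_hbp: int  # visiting batters hit by a pitch
    home_hbp: int     # home batters hit by a pitch


def hbp_rate_by_year(games):
    """Total HBPs divided by total plate appearances for each season."""
    totals = {}
    for g in games:
        hbp, pa = totals.get(g.year, (0, 0))
        totals[g.year] = (hbp + g.visitor_hbp + g.home_hbp,
                          pa + g.visitor_pa + g.home_pa)
    return {year: hbp / pa for year, (hbp, pa) in totals.items()}


def excess_both_hit_games(games):
    """Observed minus expected games in which both teams had a batter hit, by year."""
    rates = hbp_rate_by_year(games)
    expected, observed = {}, {}
    for g in games:
        r = rates[g.year]
        # Same independence assumption as the worked example above.
        p_both = (1 - (1 - r) ** g.visitor_pa) * (1 - (1 - r) ** g.home_pa)
        expected[g.year] = expected.get(g.year, 0.0) + p_both
        both_hit = g.visitor_hbp > 0 and g.home_hbp > 0
        observed[g.year] = observed.get(g.year, 0) + int(both_hit)
    return {year: observed[year] - expected[year] for year in expected}
```

A negative value for a season simply means fewer both-teams-hit games than the background predicts, as described above.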

So for most years there are a few, but not many, games where pitchers are intentionally throwing at a batter. Again, this method can't tell which games those are, or which pitchers are responsible, only the number of games above background. The average is just about four such games per year. So even though the HBP rate is up and the total number of games is higher than in the old days, the number of games where a pitcher threw at a batter in retaliation has stayed about the same. So it does appear that the beanball is becoming somewhat of a lost art.

Lastly, I'm going to add the Retrosheet statement. If you haven't visited them in a while, go take a look. They have one of the best baseball sites on the web.

The information used here was obtained free of charge from and is copyrighted by Retrosheet. Interested parties may contact Retrosheet at "www.retrosheet.org".

Steroids double standard?

Yes, I am going to talk a little bit about steroids and baseball, but no, I promise I won't talk about you know who in San Francisco. That topic has been beaten to death by the national media, and you don't read this blog to read about things everyone else is talking about. The player I do want to talk about is Ken Caminiti. Ken last played in 2001 and died in 2004, so some younger readers might not be all that familiar with him. Ken broke in with the Astros in 1987 and played most of his career with the Astros and Padres. Why am I bringing him up? In his speech before yesterday's Brewers vs. Astros game, Craig Biggio talked a little about Caminiti and Darryl Kile. Caminiti and Biggio were very good friends while they played together, and the Houston Chronicle had a great article this year about the pair. When Biggio mentioned them, the crowd gave a huge cheer.

Caminiti certainly sounds like a great teammate and a pretty good friend. The problem is that Caminiti was also a drug addict. Worse, here is what Caminiti said about his steroid use in an SI article a few years back: "I've made a ton of mistakes, I don't think using steroids is one of them." He followed that up with, "If a young player were to ask me what to do, I'm not going to tell him it's bad." Caminiti explained that players have an opportunity to set their families up for life, and if steroids can help, then the player should take them.

To me this is a very dangerous comment. It has always been interesting to me how fondly Caminiti has been remembered. Maybe part of that is because he has passed away. Maybe some of it is because he sounds like he was a very good teammate. Maybe some of it is because he was a great player to interview. Whatever the reason, Caminiti seems to have gotten more or less a free pass in all of this. You might say that Caminiti really only hurt himself, while others who broke cherished records hurt the game.

To that I say: what about his 1996 MVP award? Caminiti has admitted to taking steroids that year, and that year he beat out Mike Piazza for NL MVP. Piazza will now end his career without ever having been a league MVP. Piazza is a sure Hall of Famer, but what if he were a borderline case? A lot of people have suggested adding asterisks or doing other silly things to the record book. If you are going to do that, you should start with the 1996 NL MVP award.

PS. No, this is not the special treat I mentioned Saturday. Things have gotten pushed back a bit here, so I won't be able to write about that until late next week. But it is coming, so stay tuned.

Saturday, August 11, 2007

Intentional walk part II

So in the previous post I ran the numbers for a league average eighth-place NL batter and a league average pitcher and found that you should not intentionally walk the eighth-place batter unless there are runners on second and third with two outs.

In this post I am going to expand those results over almost the complete range of batters. The concept is still the same, so if you have any questions about how this is done, read the previous post. The only difference is that now I ran the code over a range of OPS values from .200 to 1.000. This range should cover everyone from all but the worst-hitting pitchers to all but the most elite batters. The results can be shown on a three-dimensional graph. We will start with a runner on third and two outs.

On one axis we have the current batter's OPS and on the other axis the next batter's OPS with the difference of intentionally walking the batter and pitching to him on the vertical axis. This is basically a flat surface, or plane, that slopes downward. Where that plane crosses zero on the vertical axis the expected runs from both options are the same. That means it doesn't matter if you walk the current batter, or pitch to him, both options will produce exactly the same amount of expected runs. We call this the break even point. Things are a little hard to see on this plot so I am going to instead show you the view from above and things should be a little clearer.
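One way to pull the break even line out of a grid of Monte Carlo results is to look along each row for the place where the difference changes sign and interpolate linearly between the two bracketing grid points. A minimal Python sketch, assuming `diff[i][j]` holds E[runs | walk] minus E[runs | pitch] for the i-th current-batter OPS and the j-th on-deck OPS. The function name and the synthetic plane in the demo are illustrative, not real results.

```python
def break_even_next_ops(current_ops, next_ops, diff):
    """Return {current OPS: on-deck OPS where diff crosses zero, or None}.

    diff[i][j] is E[runs | walk] - E[runs | pitch] for current_ops[i] and
    next_ops[j]; next_ops must be sorted in increasing order.
    """
    line = {}
    for cur, row in zip(current_ops, diff):
        line[cur] = None
        for j in range(len(row) - 1):
            a, b = row[j], row[j + 1]
            if a == 0 or (a < 0) != (b < 0):
                # Linear interpolation between grid points j and j + 1.
                t = 0.0 if a == 0 else a / (a - b)
                line[cur] = next_ops[j] + t * (next_ops[j + 1] - next_ops[j])
                break
        if line[cur] is None and row and row[-1] == 0:
            line[cur] = next_ops[-1]
    return line


if __name__ == "__main__":
    # Synthetic plane for illustration only: zero where next OPS = 0.5 * current + 0.1.
    cur = [0.4, 0.6, 0.8]
    nxt = [0.2 + 0.1 * k for k in range(9)]
    diff = [[n - (0.5 * c + 0.1) for n in nxt] for c in cur]
    for c, n in break_even_next_ops(cur, nxt, diff).items():
        print(f"current OPS {c:.3f}: break even at on-deck OPS {n:.3f}")
```

Linear interpolation is a reasonable choice here because, as the text notes, the surface is close to a plane; noisy MC rows would need more trials or some smoothing first.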

This is like looking at an elevation map of the US. On a map like that, colors toward the red end of the spectrum represent high elevations, or mountains, and colors toward the violet end represent low elevations, or valleys. This graph works the same way. The red area in the lower right means the difference between intentionally walking the batter and pitching to him is high (or very bad). This makes sense because this area is where the on deck batter has a very high OPS and the current batter has a low OPS. The opposite corner is where you have a strong batter up now but a very weak batter on deck. The break even line is shown, and each color band represents 0.03 of an expected run.

If the MC were doing a perfect job you would expect these color bands to be straight lines, but you can see some small kinks in them. No matter how many trials you run, no MC will perfectly match the data, but the more trials you run, the closer it will come. I am planning on rerunning with ten million trials to see if I can straighten out those lines even more, but that won't be done for a few days. From the numbers in the last post we know that a batter with an OPS of .707 and an on deck batter with an OPS of .362 should be close to the break even line but slightly positive, which you can see on this graph. The next situation is a runner on second with two outs. Again, yesterday's numbers showed that these two situations gave very similar values, so we should expect this graph to be very similar, and it is.
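How much the kinks should shrink can be estimated from the standard error of a Monte Carlo average, which falls as one over the square root of the number of trials. Going from one million to ten million trials should therefore cut the noise by a factor of about 3.2 (the square root of 10). A small Python sketch; the per-trial run totals below come from a made-up toy distribution, not from the MC in this post.

```python
import math
import random


def mean_and_stderr(samples):
    """Sample mean and standard error of the mean (sample stdev / sqrt(n))."""
    n = len(samples)
    mean = sum(samples) / n
    variance = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(variance / n)


if __name__ == "__main__":
    rng = random.Random(7)
    # Toy per-trial run totals (0 to 3 runs): NOT output from the MC in this post.
    for n in (1_000, 100_000):
        runs = rng.choices([0, 1, 2, 3], weights=[70, 15, 10, 5], k=n)
        mean, se = mean_and_stderr(runs)
        print(f"{n:>7} trials: mean {mean:.4f}, standard error {se:.4f}")
```

With this toy distribution the per-trial standard deviation is about 0.87 runs, which would put the standard error near 0.0009 at a million trials. At that level of precision, the third decimal place is about as far as the numbers can be trusted.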

If you look very closely you can see a tiny difference between the two graphs, but they are basically the same. Again, with a league average eighth-place batter and a league average pitcher on deck, you shouldn't intentionally walk the batter, which is shown on the graph. So let's move on to the situation where there are runners on second and third with two outs. When should you go ahead and load the bases?

This graph shows some differences, which again we expected from yesterday's numbers. Our break even line is lower at the start but rises much more quickly and goes off the graph further to the left than the others. What this says is that this situation depends much more on the on deck hitter than the other two. If he is a very poor batter, then you really want him at the plate. But once he becomes even somewhat competent, loading the bases becomes too risky a play. So basically, only do this if the pitcher is coming up.

So I am planning on adding some graphs like these to the statistics page, but I want to increase my trials first, and then I might as well run over all situations to see just how large a difference you need to do something crazy like walking Barry Bonds with the bases empty. This will take a few days for me to sort out, and I won't have a chance to work on it until Tuesday, so look for another post on this Wednesday or Thursday. Hopefully, though, I will have a special update Sunday night or Monday that you shouldn't miss.

Friday, August 10, 2007

To walk or not to walk

That is the question. In the NL you often have situations like a runner on second with two outs and the eighth-place batter coming up. You can either pitch to him and hope you can retire him, leaving the pitcher's spot to lead off the next inning, or walk him and face the pitcher right now.

For someone relatively new to baseball, it may appear that most managers act alike. They put out a lineup and sit in the dugout, not really doing anything until it is time to make a pitching change. But when you look closer you see that some managers are actually wildly different. Bobby Cox, for instance, loves the intentional walk. His protege Ned Yost almost never uses it. Some managers love getting a lefty/lefty or righty/righty matchup and will constantly change relievers to get the matchup they want. Tony LaRussa is the obvious one who comes to mind here.

So which managers are doing the best job? This is a very hard question to answer, and the reason is situations like the one above. Ask several "Baseball Guys" and you will get different answers. Also, how well the eighth-place batter and the pitcher hit will change the outcome. Because these situations are very complicated, not a lot of statistical work has been done on how managers' strategies affect their clubs.

One of the things I would like to do with this blog is start looking at these decisions and shed some light on which really is the right one. Then, examine the data and determine which managers are making those decisions. We will start with the decision whether or not to walk the eighth place batter.

We need to start by making some assumptions to make this task a bit easier. First, we will assume that the batter is an average NL eighth placed batter and the pitcher is an average hitting pitcher (in the next post we will remove these assumptions). Second, the rest of the order is league average to allow us the use of the runs matrix we used in our previous study on defense. 2006 numbers will be used because they are static. Lastly, the pitcher involved is also league average and he has a league average defense behind him.

To solve this problem we would like to calculate first the expected runs of letting the eighth placed batter bat and then the pitcher bat (either now if he reaches or to start the next inning). We can't use the runs matrix here because the eighth placed batter and the pitcher behind him aren't league average. So we could calculate every possibility and then add up the expected runs from all of them.

The problem is that, even with just two batters involved, calculating every possible outcome is a monumental task. For example, the eighth-place batter could single, and the runner on second might go to third or he might score. That might be followed by the pitcher unexpectedly doubling, and now the new runner on first might try to score and be gunned down at the plate. Whew. That is just going to be too difficult.

The solution to our problem is to write a Monte Carlo, or MC. An MC takes a difficult sum (or integral) and breaks it up into smaller, manageable parts. A random number is then thrown to determine which part is chosen, and a value is determined. Do that about a million times, average the results, and you should have a very good approximation of the thing you are looking for.

So I set up an MC to tackle this problem. As of right now it doesn't include stolen bases, caught stealing, or wild pitches, but it does have pretty much everything else. Currently, I feed it a player's OPS and it calculates about how often that player should hit singles, doubles, triples, etc. It runs through the two batters twice, once pitching to both and once walking the first. Because a new inning could occur, it calculates the expected runs from both the current inning and the next one, which the pitcher might end up leading off.
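To make the idea concrete, here is a toy Python version of this kind of simulation. It is a sketch under loud assumptions, not the MC from this post: each hitter is described directly by hypothetical outcome probabilities instead of being derived from OPS, baserunning follows a few fixed rules (no stolen bases, wild pitches, double plays, sacrifice flies or errors), and everyone after the pitcher is simulated as a hypothetical average hitter rather than valued with the 2006 runs matrix. Like the MC described above, it compares pitching to the eighth-place hitter with walking him, counting runs in the current inning and the next one.

```python
import random

# Order of outcomes in every hitter profile below.
OUTCOMES = ("out", "walk", "single", "double", "triple", "hr")

# Hypothetical per-PA outcome probabilities (each sums to 1); NOT 2006 league data.
AVERAGE = (0.670, 0.090, 0.155, 0.050, 0.005, 0.030)
EIGHTH = (0.700, 0.090, 0.150, 0.040, 0.004, 0.016)
PITCHER = (0.820, 0.030, 0.120, 0.025, 0.002, 0.003)


def advance(bases, outcome):
    """Apply one outcome to bases=(first, second, third); return (bases, runs).

    Assumed rules: walks force runners only; singles move every runner two bases;
    doubles score runners from second and third and send a runner on first to third;
    triples and homers clear the bases. On an out the runners hold.
    """
    first, second, third = bases
    if outcome == "walk":
        runs = 1 if (first and second and third) else 0
        return (True, second or first, third or (first and second)), runs
    if outcome == "single":
        return (True, False, first), int(second) + int(third)
    if outcome == "double":
        return (False, True, first), int(second) + int(third)
    if outcome == "triple":
        return (False, False, True), int(first) + int(second) + int(third)
    if outcome == "hr":
        return (False, False, False), int(first) + int(second) + int(third) + 1
    return bases, 0


def play_half_inning(lineup, start, bases, outs, rng):
    """Simulate until three outs; return (runs scored, lineup index of next batter)."""
    runs, i = 0, start
    while outs < 3:
        outcome = rng.choices(OUTCOMES, weights=lineup[i % len(lineup)])[0]
        i += 1
        if outcome == "out":
            outs += 1
        else:
            bases, scored = advance(bases, outcome)
            runs += scored
    return runs, i % len(lineup)


def expected_runs(lineup, bases, outs, walk_first, trials=100_000, seed=1):
    """Mean runs over the rest of this inning plus all of the next one.

    lineup[0] is the batter at the plate; if walk_first he is walked intentionally.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        start, state, runs = 0, bases, 0
        if walk_first:
            state, runs = advance(state, "walk")
            start = 1
        scored, nxt = play_half_inning(lineup, start, state, outs, rng)
        extra, _ = play_half_inning(lineup, nxt, (False, False, False), 0, rng)
        total += runs + scored + extra
    return total / trials


if __name__ == "__main__":
    lineup = [EIGHTH, PITCHER] + [AVERAGE] * 7  # eighth hitter up, pitcher on deck
    for label, bases in [("-2-", (False, True, False)), ("-23", (False, True, True))]:
        pitch = expected_runs(lineup, bases, 2, walk_first=False, trials=20_000)
        walk = expected_runs(lineup, bases, 2, walk_first=True, trials=20_000)
        print(f"{label}, 2 outs: pitch {pitch:.3f}  walk {walk:.3f}")
```

Because the probabilities are invented, the printed numbers say nothing about the real decision; the point is the structure: random outcomes, many trials, and an average.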

The first thing to do is to test it with a known value to determine if it is working properly. That is where the run matrix comes in. If I put in two league average players I should almost exactly match what the run matrix says. Here are the results:


The first two columns are the league average OPS for each batter for 2006. The next column is the situation, so -2- means a runner on second. E_runs is the expected runs from the runs matrix. MC is the result from the Monte Carlo. As you can see, the MC does a very good job of reproducing the runs matrix, better than I had hoped for actually. This means that when I change the OPS for each of the batters I can have confidence that the results I am seeing are correct. Lastly, from running this several times I found that a million trials was only good to the thousandths place, so from here on out that is all I am going to report. People do a disservice publishing more decimal places than the data is accurate to, but that is another story. Anyway, now we are ready for the full results:



So for a league average eighth-place batter with a league average pitcher up next, the actual expected runs go way down from the run matrix. This is no surprise. According to the MC, the intentional walk is slightly worse if there is just one runner on base but is actually significantly better if there are runners on second and third.

This is not what I expected but an interesting result. In the next article we will remove the constraint of using the league average OPS for our players and come up with a generalized solution.

PS. I have changed the tables to screenshots of the data. I really would like to get the tables working, so if anyone out there has experience putting tables into Blogger, please let me know.

Thursday, August 9, 2007

Comments on DALG

Whew. Sorry for the long post on the description of DExR and DALG but a full description really was necessary. Anyway, I am planning on creating and uploading more statistics like these and I am going to try to keep a similar pattern of a boring long post describing the metric and then a fun post where I see what we can learn from it. If you are just arriving give the previous post a short skim and then come back.

First, looking at the 2006 data, what jumps out at me is how well the teams with a high DALG performed. Six of the top seven made the playoffs, and the Dodgers were the only team to qualify for postseason play with a below average defense. This easily could just be coincidence, but the Tigers' incredibly strong defense certainly didn't get the media attention it probably deserved.

Looking at the 2007 data, DALG appears to correlate less with success, but there still is some correlation. Big movers on the list are the Cubs, who went from -36.5 in 2006 to 49.9 as of this writing. Again, it appears their much improved defense isn't getting the credit it probably deserves for their large jump in the standings. On the flip side, the Cardinals have gone from one of the best defensive teams in 2006 to one of the worst in 2007. How Tony LaRussa is keeping them even remotely close, even in that easy division, is beyond me.

The really big story, though, is the Devil Rays. They currently are 128 runs below league average, which is feet and ankles below the rest. I don't have DALG data for years before 2006 (I am working on it), but I have a feeling this could be a historic year for them. Honestly, when that number popped out at me I thought there was something wrong with my code. I combed through it and took a careful look at their games, and they really are that bad. This actually agrees with the other defensive statistics out there. The Hardball Times has them as the lowest team in RZR by some margin, and Baseball Prospectus has them as the lowest team in DEF by far. BP has DEF numbers going back many years, and I tried to find a team with a lower DEF but gave up after going back ten years.

Another interesting thing is that they have actually played pretty respectably at home. Their home/road splits are -18/-110, which is truly amazing. I wonder if this has anything to do with the FieldTurf they have installed at the Trop. If there is interest I might generate home/road splits for every team. That might show how much of an effect park factors have on these numbers. Playing half your games in a park with a large foul territory or tall infield grass certainly could be affecting them.

The other thing to note is pitcher movement from 2006 to 2007. Jeff Suppan went from a great defense in St. Louis to a below average one in Milwaukee (though this year's Cardinal team would have been even worse). Suppan's ERA is up more than half a run which corresponds nicely with the change in DALG from the two teams. Oliver Perez looks like a new pitcher this year for the Mets but some of that is because he moved from a team playing terrible defense behind him in Pittsburgh to the best defense in the league with the Mets.

And then there is the much maligned Yankee defense. DALG grades them out as solidly above average and better than last year's version. So why have they given up all those runs? The pitching has actually been worse than advertised, and that is saying something. With their juggernaut of an offense and a very capable defense, if they can get any pitching at all down the stretch they are going to be serious players for the wild card.

Team Defense

The role that team defense plays in baseball has a history of being difficult to measure. Recently, improved metrics have shed more light on what is going on. In this article I am going to discuss some of these metrics, their strengths and weaknesses, and introduce a new metric.


Traditionally, team fielding percentage was the best metric available for studying team defense. This system is based on official scorers, who assign errors on plays that should have been made by the defense. Add up all of these errors, divide by the total number of chances, and subtract the result from one, and you have fielding percentage. Problems with this metric are easy to see. Each scorer will have a slightly different view of what plays should have been made. And getting only the lead runner is scored the same as turning the double play, since neither involves an error.


In the 1980's Bill James invented Defensive Efficiency Rating (DER), which basically tracks the share of balls in play that a defense turns into outs. There are a few different ways to calculate DER, so I am adding a link to Baseball Prospectus for more information. DER does away with any subjectivity and correctly credits double plays. DER, however, doesn't credit an outfielder who keeps a runner from taking an extra base, or penalize a catcher who airmails a throw to second on a stolen base attempt. Lastly, no corrections are made for the type of ball in play. Line drives are harder to turn into outs than ground balls.
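For reference, both rate stats are simple to compute from team totals. A minimal Python sketch: fielding percentage is the standard (PO + A) / (PO + A + E), and the DER shown is one common simplified version, 1 - (H - HR) / (PA - BB - SO - HBP - HR). Other versions exist (some also account for errors), so treat this as one variant rather than the definitive formula. The team totals in the demo are made up.

```python
def fielding_pct(putouts, assists, errors):
    """(PO + A) / (PO + A + E): share of total chances handled without an error."""
    return (putouts + assists) / (putouts + assists + errors)


def defensive_efficiency(pa, hits, home_runs, walks, strikeouts, hbp):
    """One simplified DER: share of balls in play (excluding homers) that did not fall for hits."""
    balls_in_play = pa - walks - strikeouts - hbp - home_runs
    return 1 - (hits - home_runs) / balls_in_play


# Made-up team totals, for illustration only.
print(round(fielding_pct(4300, 1700, 100), 4))
print(round(defensive_efficiency(6200, 1450, 160, 550, 1000, 60), 4))
```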


Zone rating attempts to correct for this by assigning a zone to each defender on every play. If the ball is put in play in a defender's zone and he doesn't field it, he is penalized. Add this up for each defender and you have a team's zone rating. The Hardball Times has just started to publish a Revised Zone Rating (RZR), which adjusts the zone rating for the type of ball in play. While this is another improvement, the system still has some weaknesses. Catchers are completely ignored. Again, no credit is given for a defender who prevents runners from taking extra bases. Lastly, it is hard to get a handle on the size of the effect. Just how big is the difference between a team with a 0.84 RZR and one with a 0.80 RZR?


To help resolve some of these issues I am going to introduce Defensive Expected Runs (DExR). DExR is the sum of the differences between the expected runs before and after each play. That is, for every play I look up the expected runs for the situation before and after the play is made. Adding up this difference for all the plays a team makes produces the DExR. The only plays not included in this metric are plays the defense can't contribute on. The excluded plays, then, are strikeouts, walks, hit by pitches, balks, and home runs (though inside-the-park home runs do count).


Take the situation where there is one out and a runner on first base and the batter singles, allowing the runner to reach third. The expected runs (in 2006) for a runner on first with one out is 0.5675 and for runners on first and third is 1.1734 so the DExR for this play would be -0.6059. If instead the runner tries to steal second and is thrown out the expected runs for nobody on and two outs is 0.10907, so the DExR is 0.45843.


A quick note on how to think about this metric. If a team ends the year with an 80 DExR, that doesn't mean the defense was responsible for saving 80 runs. What it means is that the sum of the expected runs after every play was 80 runs less than the sum of the expected runs before every play. It is a subtle, but important, difference. This means a team's DExR compared to league average is a more important number than the team's DExR itself. If you subtract the league average DExR from each team's DExR, you get each team's DExR above league average. For short I am labeling this DALG.
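Here is a minimal Python sketch of DExR and DALG as defined above. It assumes plays arrive as (play type, state before, state after) tuples and that a run-expectancy lookup is available; the state labels, play-type names, and function names are hypothetical. It follows the stated definition literally (expected runs before minus after) and uses a plain mean of teams as the league average, which may differ from how the real numbers were computed. The only run-expectancy values filled in are the three 2006 values quoted in the worked example.

```python
# Play types excluded because the defense cannot affect them. "home_run" means an
# over-the-fence homer; an inside-the-park homer would be logged under its own
# (hypothetical) play type and therefore counted.
EXCLUDED = {"strikeout", "walk", "hit_by_pitch", "balk", "home_run"}


def dexr(plays, run_expectancy):
    """Sum of RE(before) - RE(after) over every play the defense could affect.

    plays: iterable of (play_type, state_before, state_after)
    run_expectancy: dict mapping a base-out state to expected runs
    """
    total = 0.0
    for play_type, before, after in plays:
        if play_type in EXCLUDED:
            continue
        total += run_expectancy[before] - run_expectancy[after]
    return total


def dalg(team_dexr):
    """Each team's DExR minus the league average (a plain mean of teams here)."""
    league_avg = sum(team_dexr.values()) / len(team_dexr)
    return {team: value - league_avg for team, value in team_dexr.items()}


# The three 2006 values quoted in the post; states are hypothetical (bases, outs) labels.
RE_2006 = {("1--", 1): 0.5675, ("1-3", 1): 1.1734, ("---", 2): 0.10907}

print(round(dexr([("single", ("1--", 1), ("1-3", 1))], RE_2006), 4))
print(round(dexr([("caught_stealing", ("1--", 1), ("---", 2))], RE_2006), 5))
```

The two printed values reproduce the worked examples: the single that sends the runner to third and the runner thrown out stealing.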


There are several advantages to this metric. First, all defensive plays are counted, not just balls in play. Things like wild pitches and runner advancement matter. Second, this metric is measured in the currency of baseball: runs. The weakness of this metric is that it doesn't correct for the type of batted ball on balls put in play. A solution would be to assign zones like RZR does and produce an expected runs matrix for each situation and each type of ball in every zone. Unfortunately, that is currently beyond the scope of my code.


Here is a link for the 2006 and 2007 DExR numbers. This post has already gone long so I will save my comments on the data for another post. Ideally, I would like to have DExR automatically updated every morning. I am running into some difficulties though with my host so I am still going to have to update by hand. Hopefully in the next few days that will get resolved.


Mission Statement

For a while now I have been writing up small baseball related articles and putting them on different pages. Finally, I have decided to create a blog and host them all myself. The basic plan will be to update several times a week with some interesting content and some statistics that you won't find anywhere else on the web. These posts will often contain work in progress and I encourage you to leave comments or contact me with your thoughts. Even if you only have an idea for a study that you would like to see done contact me and, if it is something I can handle, I will tackle it.

PS. If you are reading this and happen to be an editor of a major baseball publication like Baseball Prospectus, Hardball Times, or even ESPN and are interested in publishing something you see here, please email me.

PPS. If you happen to be a general manager of a sports team and are interested in hiring me to work for you please, PLEASE, email me.