Tuesday, September 18, 2007

Breakthrough

Well, basically nothing got done this weekend, but I did have a little time Monday while watching baseball to muck with my code, and I think I have finally solved the riddle of correcting the acceleration data. As I noted at the end of this post, the solution to the problem would not be a nice linear one like the corrections for the initial position or the initial velocities were. As Dr. Alan Nathan points out in his analysis (p. 2), the forces (and, as such, the accelerations) on the ball in flight are definitely not linear.

This means a non-linear solution will be needed, and in particular a solution is going to be needed for every park for every day that a game has been played. For example, the air density is needed for calculating both the drag and Magnus forces on the ball. The most widely known example of air density causing a problem is the thin air at Coors Field in Colorado. What isn't as widely known is that the air temperature plays just as big a role in calculating the air density as the distance above sea level. This year the Reds played a game against the Pirates at home where the game-time temperature was 30 degrees Fahrenheit. They played another game against the Braves where the game-time temperature was 99 degrees Fahrenheit. Obviously, the ball is going to behave quite differently in these two situations. Again, air density is a non-linear function of its inputs, so this is going to be a mess.
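To give a feel for the size of the temperature effect, here is a minimal Python sketch. It is not the correction program itself, just the ideal gas law for dry air combined with the standard-atmosphere pressure formula. Humidity and the actual barometric pressure on game day are ignored, and the two elevations (150 m for a low-lying park, 1600 m for a roughly mile-high park) are round illustrative numbers, not surveyed values for any stadium.

```python
# Standard-atmosphere constants (dry air; humidity is ignored in this sketch).
P0_PA = 101325.0         # sea-level standard pressure, Pa
T0_K = 288.15            # sea-level standard temperature (15 C, about 59 F), K
LAPSE_K_PER_M = 0.0065   # standard temperature lapse rate, K/m
R_DRY = 287.05           # specific gas constant for dry air, J/(kg K)
BARO_EXPONENT = 5.25588  # exponent of the standard barometric formula


def fahrenheit_to_kelvin(temp_f):
    """Convert a temperature from degrees Fahrenheit to kelvin."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def pressure_at_elevation(elevation_m):
    """Standard-atmosphere pressure (Pa) at a given elevation in meters."""
    return P0_PA * (1.0 - LAPSE_K_PER_M * elevation_m / T0_K) ** BARO_EXPONENT


def air_density(temp_f, elevation_m):
    """Dry-air density (kg/m^3) from the ideal gas law, rho = P / (R * T)."""
    return pressure_at_elevation(elevation_m) / (R_DRY * fahrenheit_to_kelvin(temp_f))


if __name__ == "__main__":
    LOW_PARK_M = 150.0    # illustrative elevation, an assumption
    HIGH_PARK_M = 1600.0  # illustrative "mile high" elevation, an assumption
    for label, temp_f, elev in [
        ("cold day, low park", 30.0, LOW_PARK_M),
        ("hot day, low park", 99.0, LOW_PARK_M),
        ("hot day, high park", 99.0, HIGH_PARK_M),
    ]:
        print(f"{label}: {air_density(temp_f, elev):.3f} kg/m^3")
```

Under these assumptions the air on the 30 degree day is roughly 14 percent denser than on the 99 degree day at the same park, which is comparable in size to the altitude effect. Note that the temperature enters as 1/T while the elevation enters through a power law, so neither input acts linearly.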

I am not going to go into great detail on everything that went into my 700-line C++ program that calculated these corrections because, while I think I have everything correct, there still might need to be a tweak or two to the code. Also, I really don't want to bore the readers with four paragraphs of hard-core explanation. If this is something that you the readers want to see, please add a comment at the bottom and I will consider it. The short version is that I modified the code I used for the linear corrections to make it non-linear, added in some physics equations to find the corrections, and used only fastballs for making this comparison. Because the corrections are non-linear and differ by park and by day, it is also going to be very difficult, if not impossible, for me to properly disseminate them. I will be thinking about this and hopefully will come up with a solution.

What I do want to do is show you the results. For that I am going to use Jeff Francis, since he has been in the Rockies rotation the whole year and Colorado does indeed need the largest corrections even after the atmospheric values were taken into consideration. Instead of showing you the break of the ball, like I have before, in these plots I am going to show you the actual accelerations. Break can be calculated from these values (along with the initial positions/velocities), and again Dr. Nathan has a good explanation of how to do that here. The reason I want to show acceleration here is that it is the value that is going to be corrected. So, starting with the uncorrected data, here are the x and z accelerations for Jeff Francis in home and away games while PITCHf/x was on.
Now maybe I don't know a lot, but one thing I do know is that the ball should break less (have a smaller acceleration) at Coors than at other parks around the league. Yet this data appears to show exactly the opposite. Obviously, something is messed up with the data. The blob of data around (-5,-35) is Francis' curve ball and the huge mess to the upper right is a concoction of his fastball and change. Let's apply the correction factors to both the home and road pitches and see if we can't clear things up a bit.
You can see some of the non-linearity if you look very closely between these two plots. Some good things happened here. First, the away game data changed, but only very slightly. This is exactly what we would expect to see, as his road starts should be much closer to league average than his home starts. Second, the home game data now shows less acceleration both vertically and horizontally, which is good. Francis' curve at home actually appears to have about the same horizontal acceleration as his curve on the road. There does appear to be a slight tail pointing towards zero in the road data, though. I wonder if this is because I used only fastballs in my comparison, or if maybe Francis is compensating and slightly overthrowing his curve at home? This is one of the loose ends I am still trying to track down. Lastly, the huge fastball/change blob in the uncorrected data has become much more distinct, and now it definitely appears to be two blobs, with the fastball in the upper right and the change down and toward zero horizontally. That is exactly what we would expect from a change, and the reason this blob is so close to the fastball blob is that Francis has a very good one. Looking back, you can kind of make out the distinction in the previous plot, but it is much more defined here, which again is a sign that the corrections are working.

There still is one problem, though. How do we know this correction is moving the data to the right spots? We know that the Coors data should show less acceleration than the road data, but how can we tell if it is overcompensating or undercompensating? The only way I know how to check this is to transform each pitch as if it were thrown at sea level at standard temperature. You can check the air density link again if you want to look that up. In this frame everything should be equal and all of the accelerations should now match up. So does it?
Yes, it matches up very well. A careful eye will notice that not only did the Coors data get an increase in acceleration (or decrease, since these numbers are mostly negative) but the road data did as well (though by much less). The reason for this is that standard temperature is about 59 degrees Fahrenheit and most baseball games are played at temperatures above that. Again, Francis' curve looks slightly different at home and away here. Maybe that is from the extra tail I mentioned before in the road data, but maybe not. The strange thing is that the vertical acceleration seems to be spot on but the horizontal acceleration is slightly off. I don't really have a good explanation for that right now. Maybe we should ask Jeff Francis himself, who studied physics in college at the University of British Columbia.
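For readers who want to try the sea-level check themselves, here is one simple way it could be sketched in Python. This is an illustration, not a description of what the correction program does. It assumes that the drag and Magnus forces are both proportional to air density, so the aerodynamic part of the acceleration scales with the density ratio, and that the reported z acceleration includes gravity, which has to be set aside before scaling. It also ignores the second-order effect that the ball's speed along its path would itself change in different air. The function name and the density values passed in are hypothetical.

```python
G_FT_S2 = 32.174  # gravitational acceleration, ft/s^2
RHO_STD = 1.225   # dry-air density at sea level and 59 F, kg/m^3


def to_standard_conditions(ax, az, rho_actual, rho_std=RHO_STD):
    """Rescale x and z accelerations (ft/s^2) to standard air density.

    Assumes the aerodynamic acceleration is proportional to air density
    and that az includes gravity (negative z is down).
    """
    scale = rho_std / rho_actual
    ax_std = ax * scale
    # Remove gravity, scale only the aerodynamic part, then add gravity back.
    az_std = (az + G_FT_S2) * scale - G_FT_S2
    return ax_std, az_std


if __name__ == "__main__":
    # A curve-ball-like pitch measured in thin air (density is illustrative).
    print(to_standard_conditions(-5.0, -35.0, rho_actual=1.0))
```

With thinner air than standard the scale factor is greater than one, so the magnitudes grow, which is the direction of the shift described above for both the Coors and the road data.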

So where is all of this going? Well, except for a little fine tuning, I think I am ready to move to the full data set. You might remember me saying that I have stopped adding data so I would have a consistent set to work with. I am almost two weeks behind, but I have started grabbing the new data now. Once that is done I have to run it through my parser to get the data into a usable form, then my correction code to get the new correction values, and lastly my player card generator, which I still need to adjust a bit to output more plots than just the break. Hopefully, I will be able to have at least four plots like I showed for Jose Contreras in my previous post. If there is an extra pitching plot you would like to see added, let me know in the comments section below. I also still need to remove those pathological points I showed in those Contreras plots. Joe P. Sheehan has suggested that a cut on initial speed might solve the problem. I had kind of thought that might be right, but somehow that got lost in my memory, so thanks to Joe for telling/reminding me of that. If things go smoothly I should have player cards with corrected home and road data by Thursday night. If things don't go smoothly, or if I don't get a chance to work on this, then I will have them up by the weekend. Once that is completed I can move on to other fun topics I wanted to look at with this data.
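The cut on initial speed mentioned above could be as simple as the following Python sketch. The field name `start_speed` and the 50 mph threshold are assumptions made for illustration; the right cut value would have to be chosen by looking at the pathological points themselves.

```python
def cut_on_initial_speed(pitches, min_speed_mph=50.0):
    """Keep only pitches whose initial speed is at least min_speed_mph.

    Each pitch is a dict with a 'start_speed' entry in mph (assumed layout).
    """
    return [p for p in pitches if p["start_speed"] >= min_speed_mph]


if __name__ == "__main__":
    sample = [
        {"id": 1, "start_speed": 88.4},
        {"id": 2, "start_speed": 31.0},  # implausibly slow, likely a bad track
        {"id": 3, "start_speed": 76.9},
    ]
    print(cut_on_initial_speed(sample))
```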

3 Comments:

At September 18, 2007 10:42 PM , Blogger Mike Fast said...

Josh, I'd love to see anything and everything you are willing to share. You might be surprised how many other people out there would also love to see the gritty details, but I think it's quite a few.

One other thing I had been wanting to ask you about...when you calculated the corrections to the x0, z0 initial point, did you assume that each park had a single correction factor that did not vary with time? I noticed with Papelbon that his data was quite different between two different Boston homestands, and Dr. Nathan mentioned to me that the PITCHf/x system is typically recalibrated between homestands.

 
At September 19, 2007 4:01 PM , Blogger Josh Kalk said...

I'll think about making a huge post with all the gory details on what my code is actually doing after things settle down a bit.

As for your other question I had been considering each park to be consistent but as you will read in my next post (later today), that will have to change.

 
At September 19, 2007 4:47 PM , Blogger Alan said...

Hi...Alan Nathan here. I'd like to know more about your non-linear equations and your correction to the acceleration. Please contact me directly at my e-mail address, which you can get from my web site (webusers.npl.uiuc.edu/~a-nathan/pob).
Thanks...

 
