Tuesday, May 20, 2008

2008 corrections are now online!

Yes, that is right: after just one set of interleague series I can now run my correction code on the 2008 data. The player cards have been updated with the corrections and my pitch classifications. These classifications are not perfect at the moment, as the 2008 corrections still aren't quite as strong as the 2007 corrections, but they appear to be better than the MLBAM classifications, which are done on the fly (and are therefore much harder to do). Automated updates will be down for a few days, but expect them to be back in full force by the weekend.

Wednesday, May 14, 2008

PSA to all PITCHf/x guys

As of the 12th, MLBAM has changed the header for their data. It used to look like this:

atbat num="1" b="4" s="2" o="0" batter="435065" pitcher="458567" des="Reggie Willits walks. " stand="L" event="Walk"

Now it looks like this:

atbat num="1" b="3" s="1" o="1" batter="408299" stand="R" b_height="6-0" pitcher="435043" p_throws="L" des="Omar Infante grounds out, shortstop Brian Bixler to first baseman Adam LaRoche. " event="Ground Out"

Make sure to change your parser accordingly.
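If your parser reads the fields by position, the new attributes will break it. Here is a rough sketch in Python of reading them by name instead, so that both header styles work. I am assuming the lines above are the attributes of an atbat XML element and have wrapped them in angle brackets for the example; parse_atbat is just an illustrative name, not anything from MLBAM.

```python
import xml.etree.ElementTree as ET

# The two sample headers from this post, wrapped in angle brackets
# (an assumption) so they parse as standalone XML elements.
OLD = ('<atbat num="1" b="4" s="2" o="0" batter="435065" pitcher="458567" '
       'des="Reggie Willits walks. " stand="L" event="Walk"/>')
NEW = ('<atbat num="1" b="3" s="1" o="1" batter="408299" stand="R" '
       'b_height="6-0" pitcher="435043" p_throws="L" '
       'des="Omar Infante grounds out, shortstop Brian Bixler to first '
       'baseman Adam LaRoche. " event="Ground Out"/>')

def parse_atbat(fragment):
    """Return the atbat attributes as a dict, keyed by attribute name."""
    elem = ET.fromstring(fragment)
    attrs = dict(elem.attrib)
    # Attributes that only appear in the new header default to None,
    # so downstream code can treat old and new files the same way.
    attrs.setdefault("b_height", None)
    attrs.setdefault("p_throws", None)
    return attrs

for fragment in (OLD, NEW):
    row = parse_atbat(fragment)
    print(row["batter"], row["pitcher"], row["stand"], row["p_throws"])
```

Looking attributes up by name means the order MLBAM writes them in no longer matters, which should also help if they add more fields later.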

Tuesday, May 13, 2008

Are there issues with the 2007 data corrections?

Recently at the PITCHf/x summit Ike Hall presented a talk on data corrections, where he noted that my corrections appear to be overcorrecting the data on off-speed pitches. You can find the talk on the summit's website; it is labeled Data_Improvement.pdf. On page 11 Ike shows a plot of the differences between the drag coefficient at Comerica and PETCO parks. While the data seems to agree well in the fastball region, the drag coefficient appears to differ, 0.3 versus 0.5, for pitches thrown near 60 MPH.

Now this seems like a huge issue. Obviously there is a very large difference between 0.3 and 0.5. The catch is that this difference in drag coefficient actually results in a very small effect. If you assume the air density to be 1.2 kg/m^3, the ball's initial velocity to be 60 MPH (26.8 m/s), and the circumference of a baseball to be 9 inches (cross-sectional area 0.004 m^2), then you can calculate the drag force. If you do that you get a drag force between 0.52 and 0.86 N. If you then want to find the difference in final velocity you can use the equations of motion, and if you assume the ball is in flight for 0.5 seconds (which is actually quite long) you get a difference in final velocities of less than a third of a meter per second.
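For anyone who wants to check the drag force numbers, here is a quick sketch in Python. It assumes the standard drag equation F = 1/2 * rho * v^2 * A * Cd, which I did not spell out above but which reproduces the quoted figures, and takes the air density as 1.2 kg/m^3. The function and constant names are just for illustration.

```python
# Inputs quoted in the post.
RHO = 1.2     # air density, kg/m^3
V = 26.8      # initial ball speed, m/s (60 MPH)
AREA = 0.004  # cross-sectional area of the ball, m^2

def drag_force(cd, rho=RHO, v=V, area=AREA):
    """Drag force in newtons from the standard drag equation (assumed)."""
    return 0.5 * rho * v**2 * area * cd

for cd in (0.3, 0.5):
    print(f"Cd = {cd}: {drag_force(cd):.2f} N")
# prints:
# Cd = 0.3: 0.52 N
# Cd = 0.5: 0.86 N
```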

So while it appears that my corrections are indeed overcorrecting the data, the results of these overcorrections are small. That said, I will be looking into how to adjust my corrections to fix this, but don't expect a huge change to the data. Once I get that worked out I should be ready to run the corrections on the 2008 data, as interleague play is nearly upon us and that is what my code really needs.