Preliminary Correction to PitchFX Data Part II
I am still working the kinks out of my code, and this afternoon I found a bug that was messing up the weighted variance when it came time to calculate the differences between the parks. This carried over and messed with the release point factors as well. It did so in a nasty way that didn't have a huge effect on parks with a lot of data, which is why I didn't catch it earlier. Anyway, I fixed the bug, and when I recalculated the park factors all of the errors went way up. This actually seemed reasonable to me, as I was planning on adding a second order calculation anyway. I started talking about it in response to one of the comments on the last post.
It really is a pretty simple concept. In addition to directly comparing pitchers who pitched in park A and park B, I am adding in pitchers who pitched in park A and park C and then pitchers who pitched in park B and park C, which gives an indirect comparison of A and B by way of C. I am adding this two-step process together in quadrature, just like I did when I calculated the differences originally. It turned out this improved things, but not quite as much as I had hoped. So I took it one step further and added a third order correction as well. Every additional step helps less and less, but third order was still enough to produce some pretty pleasing results. Obviously, this isn't a very good writeup of the process, but for reasons that I will detail later, the code still isn't quite where it needs to be. So I am not going to do another full writeup until the process is more set in stone. Here are the results from the new method, including all games played through yesterday, Sept. 4th.
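To make the second order idea concrete, here is a minimal sketch in Python. It is not the code behind the tables below; it is one reading of the description above, and it rests on assumptions: each park-to-park difference comes with a variance, the steps of an indirect path (A to C, then C to B) have independent errors so their variances add (errors add in quadrature), and the direct and indirect estimates are merged with an inverse-variance weighted mean. The function names and the numbers are made up for illustration.

```python
def chain(steps):
    """Combine a chain of (difference, variance) steps, e.g. A->C then C->B.

    Differences add along the path. Assuming independent errors, the
    variances add too, i.e. the errors add in quadrature.
    """
    diff = sum(d for d, _ in steps)
    var = sum(v for _, v in steps)
    return diff, var


def combine(estimates):
    """Inverse-variance weighted mean of (difference, variance) estimates.

    Assumes the estimates are independent, which is only approximately
    true if the same pitchers show up in more than one comparison.
    """
    weights = [1.0 / v for _, v in estimates]
    total = sum(weights)
    mean = sum(w * d for w, (d, _) in zip(weights, estimates)) / total
    return mean, 1.0 / total


# Hypothetical numbers in inches, not taken from the real data.
direct = (2.0, 0.50)                       # A vs B, pitchers seen in both parks
via_c = chain([(1.2, 0.30), (1.0, 0.30)])  # A vs C, then C vs B
factor, variance = combine([direct, via_c])
print(f"combined A-B difference: {factor:.3f} in, variance {variance:.3f}")
```

A third order path (A to C to D to B) would go through the same `chain` function with three steps. Its variance is the sum of three terms, so it gets a smaller weight in `combine`, which fits with each additional order helping less and less.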
Correction to the z0 release point (in inches)
park factor variance
bos -5.703 0.18480
sdn -4.070 0.04284
was -3.607 0.36426
sln -3.148 0.03288
cha -2.761 0.04104
nyn -2.759 0.50190
flo -2.601 0.10395
mil -1.757 0.07686
lan -1.051 0.03574
ari -0.880 0.05964
hou -0.719 0.06288
sea -0.391 0.04026
bal -----
pit -----
det 0.139 0.10242
atl 0.349 0.03999
cle 0.359 0.07261
tor 0.451 0.04750
oak 0.769 0.03911
phi 0.862 0.11741
nya 1.053 0.36311
tba 1.115 0.24041
chn 1.436 0.06175
col 2.588 0.07733
kca 2.629 0.16184
cin 2.884 0.08300
tex 3.211 0.03561
ana 3.370 0.04835
sfn 3.848 0.04182
min 4.464 0.07266
I am moving to inches because if I report the numbers in feet, some of the variances are incredibly small and hard to write in a nice table format. The relative error doesn't change, but this is easier to read. There is still no data for Baltimore or Pittsburgh, but Washington is showing up. I looked into this and found that the pitchFX system was turned on for one game at RFK but has been turned off since, and no data has been received so far for their home stand. Things look pretty good here, with the statistical error being at most half an inch, but when you turn to x0 things get a bit out of hand.
Correction to the x0 release point (in inches)
park factor variance
flo -9.687 3.16846
ari -6.589 1.70915
tex -3.249 1.36334
sdn -2.364 1.72881
chn -2.123 1.53192
hou -2.059 1.75174
cin -1.936 2.51889
sfn -1.880 1.80075
phi -1.704 3.41626
nyn -0.715 8.95378
col -0.492 2.44724
sln -0.449 1.25995
ana -0.441 1.11405
sea -0.275 1.50260
was -0.051 20.73689
bal ----
pit ----
cha 0.533 1.78187
det 0.625 3.14010
oak 0.632 1.39684
lan 0.851 1.62027
kca 0.911 3.62942
cle 1.453 4.05667
tor 2.261 1.42610
nya 2.999 8.05124
mil 3.103 3.15419
min 3.125 1.66268
bos 3.876 3.28594
tba 5.705 3.37659
atl 7.728 1.82724
Wow. The first thing to notice is that the correction in x needs to be bigger than the correction in z. If these numbers are correct, pitchFX is missing the horizontal release point by almost 10 inches in Florida. That is huge. Also huge are the errors on these numbers. Even the parks with a lot of data still have errors bigger than an inch. That is just too big. Maybe I need to go to fourth order because the spread is much bigger. Maybe there needs to be a separate correction for left handers and right handers in each park. That would really suck, because cutting an already thin sample by about 1/3 to look at just lefties would be pretty painful.
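One quick way to see which x0 factors stand out from the noise is to divide each factor by its error. The sketch below does that for a few rows copied from the x0 table. It assumes the second column really is a variance in square inches, so the error is its square root; if that column is already a standard error, the square root should be dropped. The function name is just for illustration.

```python
import math

# (factor, variance) pairs copied from the x0 table above, in inches.
x0 = {
    "flo": (-9.687, 3.16846),
    "atl": (7.728, 1.82724),
    "ana": (-0.441, 1.11405),
    "was": (-0.051, 20.73689),
}


def significance(factor, variance):
    """Return (error, |factor| / error), taking error = sqrt(variance)."""
    err = math.sqrt(variance)
    return err, abs(factor) / err


for park, (factor, variance) in x0.items():
    err, sigmas = significance(factor, variance)
    print(f"{park}  factor {factor:+7.3f}  error {err:5.3f}  {sigmas:5.2f} sigma")
```

Under that assumption, the Florida and Atlanta factors sit several errors away from zero, while the Washington number, which comes from a single game, is indistinguishable from zero.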
Anyway, work is in progress, but I need to hammer out the details for x0 and z0 before I move on to the initial velocities. If I run the same code for vz0, I get a correction for Fenway of 2.719 with an error of 170! Obviously, taking out the breaking pitches will be essential for correcting that data. If anyone has any thoughts about possible improvements to these corrections, I'd love to hear them. Either comment below or drop me an email. If we can just get this data corrected, I believe it would be a huge step forward in analyzing baseball.
