Thursday, September 20, 2007

Brand New Player Cards

Well, I was able to get things finished a bit early, so here are some brand new player cards. I am planning a big post this weekend about exactly how these plots are produced, but for now let me just say that I am correcting the initial position and velocity of the ball as well as the accelerations (and a big thanks to Dr. Nathan for finding a mistake in my code that corrects the acceleration). This way I can properly combine home and road data. Once that correction is done, I pretend each ball was thrown at sea level at standard temperature. This puts every pitch on the same footing so my classification code can then determine what type of pitch it was.

The classification code still needs some work. It is getting better, but it will still misclassify some pitches as belonging to the wrong group, or mislabel an entire group. A better way of thinking about the two possible errors: the first is when a pitcher throws both a slider and a curve, and a certain pitch gets labeled a slider when it should have been labeled a curve. The second is when a group of pitches is labeled as sliders when really every one of them is a curve. I am still adjusting this code. The only real big hole right now is that it is not calling any group of pitches split-fingered fastballs. It is also having some issues with sidearm pitchers and whether they are throwing two- or four-seam fastballs. I am working to correct that, but I thought it would still be useful to show what I currently have. The last time I did this I got some excellent reader response on what pitches the clustering algorithm was getting wrong, and I'd like to ask for your assistance again. Take a look through the player cards and either comment below on what is incorrect or email me (my address is under my profile to the right).

Even if you don't find any pitches that are being classified incorrectly, if there is something you don't like about the presentation of the player cards, or something you would like to see added, please let me know. This really is my first attempt at something like this and I am not incredibly handy with HTML, so feel free to suggest an alternate way of doing something.

4 Comments:

At September 22, 2007 2:00 AM , Blogger Ike said...

I like the plot style on these better. Nice job.

So if I were to suggest a few things on the presentation and htmlizing these, I'd add a few things...first, just for completeness, I'd put the players' teams, handedness, and height. Not that those are necessary, but I think it would be nice.

With the identification algorithm, I think it has maybe gone a little cutter-happy. Take a look at Mike Bacsik (who I chose completely at random), and also Mariano Rivera. Obviously, Rivera throws primarily the cutter (and how he gets away with only throwing one pitch still baffles me), but the thing about him that stands out to me from that card is that the thing identified as his fastball breaks harder, and in the same direction, as the cutter.

But going back to Bacsik, what stands out to me is that he has 2 clusters of "cutters". One of them looks like the tail on the fastball. The other looks like a slurve. It literally sits right between what the algorithm calls a slider and a curveball.

ChaSeung Baek (who I chose to look at because he was right under Bacsik) gets exactly the same effect.

There are some other anomalies like this too...I think that there are some cases of splitties getting labeled as sinkers. Most sinkers will have just a little more lateral run on them than the fastball, and I've seen a few guys who have sinkers that have no difference in lateral movement from the fastball to the sinker.

 
At September 22, 2007 8:23 AM , Blogger Josh Kalk said...

Yeah, the splitters are mostly being called sinkers, but a few of them are being called changes as well. The lateral run is something I have noticed, but it does appear to me that a few pitchers' two-seamers also have some lateral run. As for the cutters, yeah, they are a problem as well. Some of these should be sliders and some should be changes. And with Rivera I actually have no idea what it is doing. Clearly the "fastballs" should also be cutters.

As I got up this morning I thought of a completely new way to classify pitches, and I am going to give it a try this weekend.

Thanks for the input Ike.

 
At September 23, 2007 1:59 PM , Blogger Anthony said...

Great stuff. I'm loving the graphs. Could you include number of pitches on the table above the graphs?

With Rivera, I think you can classify his pitches very easily based on the horizontal break. Anything with a negative break is his two-seamer (which I would guess he throws maybe 20% of the time) while everything else is the cutter.

Also, Mike Mussina throws a changeup a fair amount of the time. There's a cluster of cutters for him in the 70-75 mph range, which are probably all changeups.

I'm assuming the label 'fastball' means four-seam fastball. Chien-Ming Wang doesn't throw a four-seamer: his fastballs are exclusively sinkers. Same thing with Rivera: he throws all cutters and sinkers.

 
At September 24, 2007 8:57 AM , Blogger Josh Kalk said...

Thanks for the tips, Anthony. This is exactly the kind of stuff I am looking for. Rivera is a real problem because he shows both bad sides of the clustering. First, his two-seamer is being lumped in with his cutter, causing one big blob. Second, a few of his cutters are breaking off from the blob, and because they have a higher velocity the algorithm is calling them two-seam fastballs despite their movement. If I tighten the distance requirement to make sure the two-seamer is found, then the breakaway pitches will certainly stay separated. If I loosen the requirement to get them lumped back in with the cutter, then the two-seamer will definitely not be found. He is a real problem right now and the reason I am planning on completely changing the clustering algorithm.
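
To illustrate that trade-off, here is a minimal sketch of distance-threshold clustering. It is not the actual clustering code behind the player cards; the function name, the (horizontal break, speed) values, and the thresholds are all invented for illustration, and mixing break and speed in one Euclidean distance is a simplification.

```python
import math


def cluster_by_distance(points, threshold):
    """Label points so that two points share a cluster whenever a chain of
    neighbors, each within `threshold` of the next, links them."""
    labels = [None] * len(points)
    n_clusters = 0
    for start in range(len(points)):
        if labels[start] is not None:
            continue
        labels[start] = n_clusters
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(len(points)):
                close = math.dist(points[i], points[j]) <= threshold
                if labels[j] is None and close:
                    labels[j] = n_clusters
                    stack.append(j)
        n_clusters += 1
    return labels


# Hypothetical (horizontal break, speed) pairs, NOT real pitch data.
pitches = [
    (3.0, 93.0), (3.2, 93.2), (2.8, 92.8), (3.1, 93.1),  # main cutter blob
    (1.6, 93.0), (1.5, 93.2),                            # two-seamers
    (3.0, 94.6), (3.1, 94.7),                            # faster stray cutters
]

for threshold in (1.0, 1.3, 1.5):
    print(threshold, cluster_by_distance(pitches, threshold))
```

In this made-up data the stray cutters sit slightly farther from the blob than the two-seamers do. A tight threshold therefore separates both groups, a loose one merges both, and an in-between one merges the two-seamers first, so no single threshold keeps the cutters together while still finding the two-seamer.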

Wang shows a slightly different problem. The clustering algorithm is doing a good job with him, but it is calling his two-seam fastball (sinker) a four-seam fastball. Why is this? Because his average vertical break on that pitch is 6.72. If you look at Derek Lowe or Brandon Webb you can see that the algorithm is finding sinkers, but they break more downward than Wang's. So the algorithm thinks that pitch is closer to a four-seamer than a two-seamer. It is wrong of course, but guys right on the border with their break, like Wang, are just going to be trouble to classify.
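
The borderline effect can be sketched with a simple nearest-reference rule. This is only an illustration under stated assumptions, not the labeling code used for the cards: the two reference vertical-break values below are hypothetical, and only the 6.72 figure comes from the discussion above.

```python
# Hypothetical typical vertical break for each pitch type (invented values).
REFERENCE_VERTICAL_BREAK = {
    "four-seam fastball": 9.0,
    "sinker": 3.0,
}


def label_by_vertical_break(vertical_break, references=REFERENCE_VERTICAL_BREAK):
    """Return the pitch type whose reference break is closest to the input."""
    return min(references, key=lambda name: abs(references[name] - vertical_break))


# Wang's average vertical break on the pitch in question, as quoted above.
print(label_by_vertical_break(6.72))
```

With these assumed references the dividing line falls at 6.0, so a pitch averaging 6.72 lands on the four-seam side even though it is thrown as a sinker.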

Lastly, Mussina's problem is that his changeups aren't close enough together; the algorithm gets lost trying to group them and then labels the few it does properly find as cutters. This problem gets resolved with more statistics, so hopefully even if I don't change anything in the algorithm it will figure it out when the season is over.

As for the number of pitches, that should be easy to add, and it will be included the next time I do an upload late this week.

 
