<?xml version='1.0' encoding='UTF-8'?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/'><id>tag:blogger.com,1999:blog-6337312839698763116</id><updated>2008-07-08T10:28:46.132-04:00</updated><title type='text'>from small ball to the long ball</title><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/blog.html'/><link rel='next' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default?start-index=26&amp;max-results=25'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default'/><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>43</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>25</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-2266302288523898423</id><published>2008-06-27T13:46:00.005-04:00</published><updated>2008-06-27T15:56:26.959-04:00</updated><title type='text'>Behind the scenes with the Princton Rays</title><content type='html'>As I mentioned in my hardballtimes article, the staff working behind the scenes in Princeton is phenomenal.  Despite huge hubbub over Tim Beckham everyone took time to sit down and talk with me and answer my questions.  Jim Holland, who has been the Princeton GM for 17 years, does a great job running the ship.  
People don't understand that even in short season ball a minor league GM has to work year round securing advertisements, setting up promotions, getting things like team schedules printed, arranging travel, and organizing community outreach (this obviously isn't an exhaustive list, just enough to give you an idea).  It is not an easy job.  It requires long hours, and with the smaller staff at rookie ball a willingness to do just about everything, from answering phones to selling tickets to running the 50/50 raffle and the on-field activities between innings, is a must.  Jim does all of these things with a huge smile on his face, and his love for the game and the team shines through.  It was his idea to come up with the &lt;a href="http://web.minorleaguebaseball.com/news/article.jsp?ymd=20060724&amp;amp;content_id=109320&amp;amp;vkey=news_milb&amp;amp;fext=.jsp"&gt;Mercer Cup&lt;/a&gt;, which Princeton has won the past three years.  The cup is something special to the fans, and games against Bluefield are quite the event.&lt;br /&gt;&lt;br /&gt;Samantha Craig is the league photographer, and it is her pictures you see on everything from the team cards, programs, and yearbooks to the bobble heads.  She provided the excellent shot of Tim Beckham for my THT article and the picture for the MiLB story.  She also has a soft spot for the forgotten members on the field, the umpires.&lt;br /&gt;&lt;br /&gt;The team interns are the do-everything crew for the Rays.  Malcom, Jamar, Jeremy, Rynel, and Matt (who will be starting a blog of his own shortly) get to the park early to tend to the field, get the equipment ready for batting practice, sell tickets and souvenirs, and run the scoreboard.  When the game is over they are still on the clock, cleaning up the stadium and getting ready for the next night's game.  Interns often get overlooked, but with a staff as small as Princeton's, good interns are a must.  
The job also helps prepare them for a career in the business as Pat Day, the GM of the Lansing Lugnuts got his career started by interning at Princeton.&lt;br /&gt;&lt;br /&gt;This team runs like a well oiled machine even though the year has just begun for Appy league teams.  The field was in pristine condition despite a rain storm just a few hours before game time on Wednesday night.  If the field hadn't been in such good shape and the game had been called I wouldn't be here writing about any of this.  As I mentioned in the THT article getting an opportunity to pull the tarp off the field was a blast and if you ever get the chance to do that jump at it.  Thanks again guys, I had a blast and learned a whole lot from you and I will definitely be back soon.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/06/behind-scenes-with-princton-rays.html' title='Behind the scenes with the Princton Rays'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=2266302288523898423' title='0 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2266302288523898423'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2266302288523898423'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-7283941889855914634</id><published>2008-06-27T11:37:00.007-04:00</published><updated>2008-06-27T15:55:44.933-04:00</updated><title type='text'>A look at some other prospects I saw this week</title><content type='html'>Besides Tim Beckham there were quite a few other players that 
could one day have an impact in the big leagues.  Here is a first impression of some of them.&lt;br /&gt;&lt;h3&gt;Royals Players&lt;/h3&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Richardson%20%20CF&amp;amp;pos=&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=519193"&gt;Hilton Richardson&lt;/a&gt;&lt;/b&gt; 7th round pick, 2007&lt;br /&gt;Richardson is a tall, speedy center fielder who is left handed.  He appears to crowd the plate and had some issues with a lefty on the mound, though that hasn't been a problem for him so far this season.  Richardson was pretty disciplined at the plate, seeing a lot of pitches in his first three at bats before swinging early in his last two.  He smoked a liner the other way in the first inning on a fastball outside.  He did appear to have some trouble making contact, though, especially on off speed pitches.&lt;br /&gt;&lt;br /&gt;In the field Richardson had an eventful day.  To start the third inning he misread a ball over his head that went for a triple.  While Hunnicutt stadium isn't Coors Field, it is over 3,000 feet above sea level, so there is a mini Coors effect.  This certainly could have been a factor on this play, as the ball just kept carrying and he was only jogging back until the very end.  In the fifth, with two outs and nobody on, a ball was crushed to the right field gap.  Richardson got a great jump, turned on the jets, and almost caught up to a ball that was hit much closer to the right fielder than to him.  Unfortunately for him, the ball hit off the top of the wall and bounced behind him, allowing the batter to reach third.  With two outs you don't mind seeing a guy go for a ball, but it was pretty clear that he wasn't going to catch it, and he should have slowed down and played the ball off the wall.  
He also got turned around on a routine ball but managed to make the catch.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Caldwell%20%20RF&amp;amp;pos=&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=542995"&gt;Allen Caldwell&lt;/a&gt;&lt;/b&gt; 12th round pick, 2008&lt;br /&gt;Caldwell is another left hander who was playing right field for the Royals.  He too showed good patience at the plate, getting ahead in the count 2-0 twice, but when he got two juicy fastballs he wasn't able to do much with either, grounding to third and flying out to center.  In the eighth he came up with the lead run at second and one out.  The Rays went to the pen and summoned a lefty reliever, but Caldwell calmly took the first pitch, a fastball middle in, and shot it to center field for a base hit, scoring the lead run.  The bad news is he then promptly got picked off first, but an error by the first baseman throwing to second let him off the hook.  Caldwell didn't have a single opportunity on a fly ball but did have trouble digging a ball out of the bullpen after the catcher's overthrow on a strikeout.  In fact he had so much trouble that the batter almost scored on the play but was thrown out at the plate.  Sadly for the Royals, the winning run scored ahead of the batter on the play, and if Caldwell had retrieved the ball quickly he probably could have held the runner at third.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Lehmann%20&amp;amp;pos=P&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=518932"&gt;Mike Lehmann&lt;/a&gt;&lt;/b&gt; 20th round pick, 2007&lt;br /&gt;Lehmann is a right handed pitcher with an over the top delivery.  He features a four seam fastball in the upper 80's that touched 90 MPH on the gun twice but is pretty straight.  He also has a curve in the mid 70's and a change up in the low 80's.  
Lehmann was around the strike zone all game except for a spate of wildness in the first after Tim Beckham's infield hit.  He hit the next batter and then walked the following batter on four pitches to load the bases.  The Royals pitching coach came out and talked to him, and he didn't walk another batter all game.  The bad news was he also didn't strike out anyone in his four innings of work.  In fact, he only got three swings and misses during his outing.  That is going to have to change if he is going to have any future.  Lehmann did field his position well, picking up two rollers with his bare hand and firing strikes.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Flanagan%20%28L%2C%200-1%29&amp;amp;pos=P&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=518682"&gt;John Flanagan&lt;/a&gt;&lt;/b&gt; 13th round pick, 2008&lt;br /&gt;During the three games that I watched this past week Flanagan was the most impressive pitcher I saw by far.  He is a tall lefty JuCo product who has a three quarters delivery.  He throws a sinker in the upper 80's, which he consistently kept down in the zone, and a slider in the low 80's that has very good bite down and away from left handed batters.  He mixed in a couple of change ups to right handed batters, but it didn't look like he had a good feel for the pitch, and he left several of them up in the zone.  He struck out five in four innings (despite MiLB saying 4 1/3), with the last one coming on a slider in the dirt that the catcher then threw away, costing the Royals the game.  It was scored a wild pitch, but the catcher was asking for it in the dirt and blocked the pitch well; he just failed on the throw.  A very tough luck loss for Flanagan.  Being a JuCo player, he is more advanced than most in rookie ball, and I would like to see him in the rotation.  He shouldn't necessarily be pigeonholed into a LOOGY role, as his sinker/slider should be effective against right handed batters as well.  
That is, if he can develop a decent change up.  If he does wind up as a LOOGY, his delivery should make it tough on lefties and he should get plenty of ground balls with his sinker.  Here is another example of his polish: he has two distinct moves to first base.  The first is a classic step toward first and throw; the second is a step off and a quick throw to first.  I could definitely see him being a C/C+ sleeper type going forward.  You can read about Tim Beckham's at bat against him in my THT article.  Here are a couple other &lt;a href="http://thekclpipeline.blogspot.com/2008/06/farm-fresh-royals-2008-this-years.html"&gt;reviews&lt;/a&gt; &lt;a href="http://perfectgame.atinfopop.com/4/OpenTopic?a=tpc&amp;amp;s=114295945&amp;amp;f=5914021231&amp;amp;m=7941004531&amp;amp;r=7941004531#7941004531"&gt;of him.&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Rays Players&lt;/h3&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Beckham%2C%20J%20%203B&amp;amp;pos=&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=542920"&gt;Jeremy Beckham&lt;/a&gt;&lt;/b&gt; 17th round pick, 2008&lt;br /&gt;Tim isn't the only Beckham playing at Princeton, as his older brother Jeremy is the second baseman on the team.  Jeremy only got into the game as a defensive substitute in the ninth inning on Wednesday, but I got a pretty good look at him earlier in the week.  At the plate Beckham is struggling somewhat right now, and I think it was probably a good move by Rays manager &lt;a href="http://www.minorleaguebaseball.com/roster/page.jsp?ymd=20060201&amp;amp;content_id=40050&amp;amp;vkey=roster_t455&amp;amp;fext=.jsp&amp;amp;sid=t455"&gt;Joe Szekely&lt;/a&gt; to give him a day off.  I have noticed that in the low minors the umpires have a pretty liberal strike zone, and it appears to me that Beckham might be having some problems with it.  
The first night I saw him the umpire was consistently giving the pitchers the outside corner, but Beckham wasn't adjusting and kept taking that pitch even with two strikes on him.  He also seemed willing to chase low out of the zone, going after several balls in the dirt.  On Tuesday, though, he had a better game, collecting three hits including a solid double in the gap.  He was a pretty patient hitter in college, so I would expect him to make adjustments relatively quickly, though he has no walks so far on the season and I haven't even seen him in a three ball count yet.&lt;br /&gt;&lt;br /&gt;In the field things are a much better story.  At rookie ball, you see a lot of miscues on routine balls and a fair amount of throwing to the wrong base and things like that.  Jeremy Beckham has looked very solid at second to me, making several nice plays and turning the double play well.  It appears that not only does he have a quick turn but he has a pretty accurate arm even with someone barreling in on him.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Biell%20%20CF&amp;amp;pos=&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=518455"&gt;Dustin Biell&lt;/a&gt;&lt;/b&gt; 5th round pick, 2007&lt;br /&gt;Biell has looked absolutely overmatched at the plate to start the year.  He struck out two more times in four at bats on Wednesday, pushing his total to 15 in 34 ABs this year.  He swung and came up empty a lot, especially on off speed pitches.  He did work the count well in a couple of at bats but never hit the ball hard.  He was the strikeout victim on the 9th inning play that scored the go-ahead run, so give him credit for not giving up on it.  The catcher blocked the ball and it was just a few feet in front of him, but Biell hustled down the line and forced the catcher to make a throw, which he failed to do.  In the field Biell made a nice play on a blooper a couple of nights ago.  
He also air mailed a throw from center to the plate which caused two runners to move up though.&lt;br /&gt;&lt;br /&gt;&lt;b&gt;&lt;a href="http://web.minorleaguebaseball.com/milb/stats/stats.jsp?n=Jarman%20%28W%2C%201-0%29&amp;amp;pos=P&amp;amp;sid=milb&amp;amp;t=p_pbp&amp;amp;pid=460691"&gt;Mike Jarman&lt;/a&gt;&lt;/b&gt; 26th round pick, 2008&lt;br /&gt;Jarman is doing what you would expect a 23 year old college pitcher to do in the rookie leagues, dominate.  Jarman went 1 2/3 innings getting all of the outs by strikeout to get the victory Wednesday night.  He did give up the lead when brought into the game in the 8th on the hit by Caldwell I described above but he promptly picked him off though the first baseman failed to finish the play.  Jarman's off speed pitches, his change up and curve, are too much for rookie league batters though only time will tell if that translates to success further up the ladder.  If he continues pitching like this there will be nothing left to prove for him in rookie ball and should be sent to A ball to see if he can stick there.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/06/look-at-some-other-prospects-i-saw-this.html' title='A look at some other prospects I saw this week'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=7283941889855914634' title='0 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7283941889855914634'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7283941889855914634'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-8366632813655855615</id><published>2008-06-11T12:37:00.002-04:00</published><updated>2008-06-11T12:39:43.755-04:00</updated><title type='text'>2008 Web based tool is now available</title><content type='html'>The 2008 web based tool is now available &lt;a href="http://baseball.bornbybits.com/php/2008_tool.php"&gt;here&lt;/a&gt;.  You can also still access the 2007 web based tool &lt;a href="http://baseball.bornbybits.com/php/combined_tool.php"&gt;here&lt;/a&gt;.  Please comment or email me if you find any problems.  Also, the 2008 player cards have been updated.  Because my classification code is taking a very long time to run (10 hours now), there won't be daily updates of the player cards, but I hope to update them twice a week.&lt;br /&gt;&lt;br /&gt;Enjoy.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/06/2008-web-based-tool-is-now-available.html' title='2008 Web based tool is now available'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=8366632813655855615' title='0 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/8366632813655855615'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/8366632813655855615'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-1325991133083084917</id><published>2008-05-20T15:58:00.002-04:00</published><updated>2008-05-20T18:56:37.973-04:00</updated><title type='text'>2008 corrections are now online!</title><content type='html'>Yes that is right, after just one set of interleague series I now can run my correction code on the 2008 data.  The player cards have now been updated with the corrections and my pitch classifications.  These classifications are not perfect at the moment as the 2008 corrections still aren't quite as strong as the 2007 corrections but they appear to be better than the MLBAM corrections which are done on the fly (and therefore are much harder to do).  Automated updates will be down for a few days but expect them to be back in full force by the weekend.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/05/2008-corrections-are-now-online.html' title='2008 corrections are now online!'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=1325991133083084917' title='2 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/1325991133083084917'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/1325991133083084917'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-2701084685828016770</id><published>2008-05-14T18:51:00.000-04:00</published><updated>2008-05-14T18:53:48.787-04:00</updated><title type='text'>PSA to all PITCHf/x guys</title><content type='html'>As of the 12th, MLBAM has changed the atbat header in their data.  It used to look like this:&lt;br /&gt;&lt;br /&gt;atbat num="1" b="4" s="2" o="0" batter="435065" pitcher="458567" des="Reggie Willits walks.  " stand="L" event="Walk"&lt;br /&gt;&lt;br /&gt;Now it looks like this:&lt;br /&gt;&lt;br /&gt;atbat num="1" b="3" s="1" o="1" batter="408299" stand="R" b_height="6-0" pitcher="435043" p_throws="L" des="Omar Infante grounds out, shortstop Brian Bixler to first baseman Adam LaRoche.  " event="Ground Out"&lt;br /&gt;&lt;br /&gt;The new header adds the b_height and p_throws attributes and moves stand ahead of pitcher, so a parser that depends on attribute order will break.  Make sure to change your parser accordingly.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/05/psa-to-all-pitchfx-guys.html' title='PSA to all PITCHf/x guys'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=2701084685828016770' title='0 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2701084685828016770'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2701084685828016770'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-6993696725583890194</id><published>2008-05-13T19:04:00.002-04:00</published><updated>2008-05-13T19:20:18.090-04:00</updated><title type='text'>Are there issues with the 2007 data corrections?</title><content type='html'>Recently at the &lt;a href="http://sportvision.com/events/pfx.html"&gt;PITCHf/x summit&lt;/a&gt; &lt;a href="http://ikehall.blogspot.com/"&gt;Ike Hall&lt;/a&gt; presented a talk on data corrections where he noted that my corrections appear to be over correcting the data on off speed pitches.  You can find the talk on the summit's website; it is labeled Data_Improvement.pdf.  On page 11 Ike shows a plot of the differences between the drag coefficient at Comerica and PETCO parks.  While the data seems to agree well in the fastball region, the drag coefficient appears to differ, 0.3 versus 0.5, for pitches thrown near 60 MPH.&lt;br /&gt;&lt;br /&gt;Now this seems like a huge issue.  Obviously there is a very large difference between 0.3 and 0.5.  The thing is, this difference in drag actually results in a very small effect.  If you assume the air density to be 1.2 kg/m^3, the ball's initial velocity to be 60 MPH (26.8 m/s), and the circumference of a baseball to be 9 inches (cross-sectional area 0.004 m^2), then you can &lt;a href="http://en.wikipedia.org/wiki/Drag_coefficient"&gt;calculate the drag force&lt;/a&gt;.  If you do that you get a drag force between 0.52 and 0.86 N.  
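&lt;br /&gt;&lt;br /&gt;As a rough check on those two numbers, here is a minimal Python sketch of the standard drag equation, F = 0.5 * rho * v^2 * Cd * A.  It assumes an air density of 1.2 kg/m^3 and the rounded area of 0.004 m^2 used above; the function and variable names are made up for illustration and are not taken from the correction code.

```python
# Sketch of the drag force estimate, assuming the standard drag equation
# F = 0.5 * rho * v**2 * Cd * A.  All names here are illustrative only.

AIR_DENSITY = 1.2   # kg/m^3, assumed value for air
SPEED = 26.8        # m/s, about 60 MPH
AREA = 0.004        # m^2, rounded cross section of a 9 inch circumference ball


def drag_force(drag_coefficient, speed=SPEED, area=AREA, density=AIR_DENSITY):
    """Return the drag force in newtons for a given drag coefficient."""
    return 0.5 * density * speed ** 2 * drag_coefficient * area


# Compare the two drag coefficients read off the plot.
for cd in (0.3, 0.5):
    print("Cd = %.1f gives %.2f N" % (cd, drag_force(cd)))
```

Run as written, this should print about 0.52 N for a drag coefficient of 0.3 and about 0.86 N for 0.5, which is the range quoted above.&lt;br /&gt;&lt;br /&gt;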
If you then want to find the difference in final velocity you can use the &lt;a href="http://en.wikipedia.org/wiki/Equations_of_motion"&gt;equations of motion&lt;/a&gt;, and if you assume the ball takes 0.5 seconds to reach the plate (which is actually quite long) you get a difference in final velocities of less than a third of a meter per second.&lt;br /&gt;&lt;br /&gt;So while it appears that my corrections are indeed over correcting the data, the results of these over corrections are small.  That said, I will be looking into how to adjust for this to fix my corrections, but don't expect a huge change to the data.  Once I get that worked out I should be ready to run the corrections on the 2008 data, as interleague play is nearly upon us and that is what my code really needs.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/05/are-there-issues-with-2007-data.html' title='Are there issues with the 2007 data corrections?'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=6993696725583890194' title='1 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6993696725583890194'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6993696725583890194'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-1569041499086801712</id><published>2008-04-19T20:53:00.002-04:00</published><updated>2008-04-19T21:03:55.675-04:00</updated><title type='text'>Ok another attempt at daily updates</title><content type='html'>Wow, was that a mess, but I think I might finally have 
all the bugs worked out for daily updates to the 2008 player cards.  In any case it is clear now that 2008 data corrections will be needed.  There were some hints of trouble with the data earlier but now the cameras in Cincinnati are just clearly messed up.  Take a look at &lt;a href="http://baseball.bornbybits.com/2008/Aaron_Harang.html"&gt;Aaron Harang's card&lt;/a&gt;.  Two starts at home, two starts on the road and the data is split.  &lt;a href="http://baseball.bornbybits.com/2008/Ben_Sheets.html"&gt;Ben Sheets&lt;/a&gt; just made a start there and you can see the effect there as well (no it wasn't his injury that caused the data to be that skewed).&lt;br /&gt;&lt;br /&gt;So corrections will have to be made again.  I am planning on running corrections like 2007 but for the data set to become complete interleague games will have to occur.  I could do one correction for the AL and another for the NL but my code really isn't setup for that.  We will see.  I will keep you posted.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/04/ok-another-attempt-at-daily-updates.html' title='Ok another attempt at daily updates'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=1569041499086801712' title='2 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/1569041499086801712'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/1569041499086801712'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-5337810792004562605</id><published>2008-03-31T10:07:00.005-04:00</published><updated>2008-03-31T13:51:46.131-04:00</updated><title type='text'>2008 Player Cards</title><content type='html'>Well the new season is upon us and even though I haven't updated this blog in ages I do have a treat for anyone who happens to stumble on this blog.  The 2008 player cards are here!  Just click here or the player cards link to the right and off you go.&lt;br /&gt;&lt;br /&gt;A few notes about the new cards.  First, absolutely no corrections have been done to the data.  Everything is straight from &lt;a href="http://sportvision.com/"&gt;Sportvision&lt;/a&gt;.  Second, Sportvision was kind enough to add in pitch types to the 2008 data so instead of running my pitch identifying code I am just using their pitch types.  Third, I have lowered the number of pitches necessary to have a player card from 100 to 10 at least for the beginning part of the season.  This applies to both batters and pitchers.  Lastly, while I had to upload today's batch by hand tomorrow's should be automatically uploaded so you should have completely up to date player cards at your disposal.  
Enjoy!</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2008/03/2008-player-cards.html' title='2008 Player Cards'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=5337810792004562605' title='11 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/5337810792004562605'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/5337810792004562605'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-6724007245021479853</id><published>2007-12-03T18:46:00.000-05:00</published><updated>2007-12-04T09:36:13.252-05:00</updated><title type='text'>Web based PITCHf/x tool help/comment page</title><content type='html'>Here is the help/comment page for the &lt;a href="http://baseball.bornbybits.com/php/combined_tool.php"&gt;web based PITCHf/x tool&lt;/a&gt;.  If you have any comments please add them to the bottom of this post.&lt;br /&gt;&lt;br /&gt;First, let me just make sure everyone is aware of what the PITCHf/x system is.  PITCHf/x by &lt;a href="http://www.sportvision.com/"&gt;Sportvision&lt;/a&gt; is a system that uses two cameras to track the ball as it travels to home plate.  The cameras take a bunch of pictures of the ball in flight, and the data is then sent to MLB, which puts it &lt;a href="http://webusers.npl.uiuc.edu/%7Ea-nathan/pob/tracking.htm"&gt;online for users to see&lt;/a&gt;.  
The data is in a messy form and needs &lt;a href="http://www.baseball.bornbybits.com/blog/2007/11/explanation-of-correction-code.html"&gt;corrections&lt;/a&gt; and pitch &lt;a href="http://www.baseball.bornbybits.com/blog/2007/11/classifcation-algorithm-explained.html"&gt;classifications&lt;/a&gt; before it can really be used.  That is why I made the web based tool for anyone to use.&lt;br /&gt;&lt;br /&gt;The first few things the tool will ask you for are simple things like the name of the pitcher and batter.  The only restriction is that you must put in either a pitcher or a batter (or both).  Sadly, less than a quarter of all pitches were tracked this year, so if you put in certain pitcher/batter match ups it will come back with no results.  If that happens please try again with a different pitcher or batter.&lt;br /&gt;&lt;br /&gt;After you have entered the pitcher/batter it will ask you for the type of pitch, the result of the pitch, and the count.  When you start, try leaving these blank to see what a certain pitcher throws; then you can go back and focus on only one type of pitch, for instance.  If you feel a certain pitcher's stuff isn't being represented correctly please comment below.&lt;br /&gt;&lt;br /&gt;Next, options are available to cut on things like pitch speed and horizontal and vertical movement.  For all horizontal measurements, negative numbers mean moving in toward a right handed batter.  Speed is measured in MPH and movement in inches.&lt;br /&gt;&lt;br /&gt;Lastly, either the location of the pitch or the break of the pitch is shown.  The location is simply where the ball crossed home plate.  The break is how the ball moved in comparison to a ball thrown without spin.  So if there was no spin the ball would end up at (0,0) on the graph.&lt;br /&gt;&lt;br /&gt;Please note that sometimes the image shown will be the one from your previous query, pulled from your web browser's cache.  
This is because it takes the tool a few seconds to produce its result and sometimes your impatient browser will just use the previous image.  If this happens please press reload.&lt;br /&gt;&lt;br /&gt;There are still a few issues including not allowing you to cut on the date.  This is something that will be included but I am having trouble with it in my database.  Also, I am having a bit of trouble with the spin and direction so sorry that didn't make it.  It will be coming soon though.  Also, the release point will become an option to plot and cut on and an extended table with some league averages will be in the next version.  Lastly, the biggest issue is that when you make a selection and run it, the tool doesn't store your selection to allow you to alter your query quickly.   This is very annoying but hard to fix on my end.  I'll have a solution by the next update.  If you press the back button hopefully your browser will remember your options but that is an imperfect solution.&lt;br /&gt;&lt;br /&gt;If you would like to use any of the plots you make, go ahead; just add a link to the tool's webpage, this page, or the &lt;a href="http://www.hardballtimes.com/main/article/anatomy-of-a-player-tim-lincecum/"&gt;hardballtimes&lt;/a&gt; article.&lt;br /&gt;&lt;br /&gt;A big thanks to my beta testing team Mark (&lt;a href="http://tigstown.com/"&gt;TigsTown.com&lt;/a&gt;), Lee (&lt;span style="color: rgb(136, 136, 136);"&gt;&lt;a href="http://www.detroittigertales.blogspot.com/" target="_blank"&gt;www.detroittigertales.blogspot&lt;wbr&gt;.com&lt;/a&gt;)&lt;/span&gt;, and the guys at &lt;a href="http://nomaas.org/"&gt;&lt;span style="text-decoration: underline;"&gt;nomaas.org&lt;/span&gt;&lt;/a&gt;.  Sorry for the slow posting of this.  
Hopefully the next version will be available before Christmas.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/12/web-based-pitchfx-tool-helpcomment-page.html' title='Web based PITCHf/x tool help/comment page'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=6724007245021479853' title='14 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6724007245021479853'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6724007245021479853'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-7873214070035877497</id><published>2007-11-12T13:33:00.000-05:00</published><updated>2007-11-12T13:52:59.326-05:00</updated><title type='text'>Classifcation Algorithm Explained</title><content type='html'>Once the data has been &lt;a href="http://www.baseball.bornbybits.com/blog/2007/11/explanation-of-correction-code.html"&gt;corrected&lt;/a&gt; we are ready to start classifying the pitches.  But first there is a little trick I want to apply.  Because the atmospherics can reduce the spin on the ball up to 25% on a hot day at Coors I translate each pitch like it was thrown at sea level at standard temperature (59 degrees Fahrenheit).  This is sort of like applying a park factor to correct for runs scored and puts each pitch on a level playing field.  This is very important for the classification algorithm because if these pitches weren't translated pitchers who spent half of their time at Coors would have two separate curve balls.  
This would really mess the algorithm up, and while Coors is the biggest problem, some other parks during mid summer or during a cold spell can see a change of more than 10% as well.  Translating these pitches solves these problems.&lt;br /&gt;&lt;br /&gt;Ok, so now that the pitches are translated we are ready to classify them.  I am using an incredibly simple algorithm that clusters pitches by determining how close a pitch was to every other pitch thrown by that pitcher.  It calculates a "distance" between each pair of pitches by comparing the speed the pitch was thrown at and the vertical and horizontal accelerations.  The two pitches that are closest together get merged.  This process continues until all pitches are in clusters and the clusters are far enough away from each other.&lt;br /&gt;&lt;br /&gt;Once the clusters are formed the algorithm finds the pitcher's fastball.  It does this by simply taking the cluster that has the highest speed.  Once the fastball is found every other cluster is compared to the fastball in speed and the two accelerations.  Now the cluster algorithm is run again on the remaining clusters and pitch types are formed.  By first comparing the pitches to the pitcher's fastball, Jamie Moyer's other pitches can be on the same footing as Joel Zumaya's pitches.  The algorithm can't say these are curve balls but it can put all the curve balls together and then I can label the group curve balls.  Once this is done it goes back to the fastballs we started with and reclassifies those in case a pitcher only throws sinkers or cutters for example.&lt;br /&gt;&lt;br /&gt;Sadly, this algorithm is far from perfect and needs some human intervention.  I have to hand edit about 40 pitchers who might have a splitter that looks like a sinker to the algorithm or a slider that looks like a cutter and so on.  
I have tried to check other references to make sure I have the right pitches for each pitcher but for many pitchers who have just thrown a few pitches in the big leagues this is particularly hard.  If you are browsing the player cards and find something you think I got wrong please leave a comment below.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/11/classifcation-algorithm-explained.html' title='Classifcation Algorithm Explained'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=7873214070035877497' title='10 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7873214070035877497'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7873214070035877497'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-659700736577023993</id><published>2007-11-12T12:40:00.000-05:00</published><updated>2007-11-12T13:32:48.663-05:00</updated><title type='text'>Explanation of the correction code</title><content type='html'>This post is way overdue but finally here is a detailed explanation of the correction code to the PITCHf/x data.  As we have &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/preliminary-correction-to-pitchfx-data.html"&gt;seen&lt;/a&gt; in previous posts, the PITCHf/x data needs some serious corrections.  This is going to be a pretty hard core post so feel free to skip it if you aren't interested in the method or how to correct the data.  
I am going to describe the process for one variable, the initial position of the ball in the vertical direction, or z0.  After that I will discuss alterations for other variables.&lt;br /&gt;&lt;br /&gt;Once I have all the data read in and all initial positions are moved back to 55 feet from home plate I am ready to correct the data from park to park.  What we would really like to do is first calculate a league average and then calculate how each park varies from that.  But because of the nature of the data this is impossible.  For instance, if a home team has a very short pitching staff that park is going to have a low average z0 if we simply averaged all the pitches thrown in the park.  Having a park average for each park is essential for the league average calculation so we must do something else.&lt;br /&gt;&lt;br /&gt;What I have come up with is that instead of calculating an average I calculate the difference between two parks based on the pitchers common to both parks.  I first calculate a mean and a variance for z0 for each pitcher for each park he has pitched in.  I then take every pitcher who has thrown a tracked pitch in park A and park B and calculate the difference between the two means from the two parks.  I also combine the two errors in quadrature (adding the variances) to find the error on this difference.  So, if a pitcher had a mean of 6 feet in park A and a mean of 6.5 feet in park B then his difference would be -.5 feet.  Once I have done this for every pitcher who has thrown in the two parks I can add up the differences.  But, because some pitchers contributed a lot of pitches in both parks and some just a few I actually find a weighted average.  This is where the error on each pitcher's difference comes in.  If a pitcher just threw a few pitches in both parks he is going to have a very large error and won't count as much in the weighted mean.&lt;br /&gt;&lt;br /&gt;So this should give me a park difference between every pair of parks.  
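&lt;br /&gt;&lt;br /&gt;As a rough illustration of the pairwise step above, here is a minimal Python sketch.  It is not the actual correction code: the function name park_difference, the tuple layout, and the sample numbers are all made up, and it assumes the squared error on each pitcher's mean is his variance divided by his number of pitches, with each pitcher weighted by one over the squared error of his difference.

```python
import math


def park_difference(pitchers):
    """Weighted mean of per-pitcher z0 differences (park A minus park B).

    Each entry is a hypothetical tuple:
    (mean_a, var_a, n_a, mean_b, var_b, n_b), in feet and pitch counts.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for mean_a, var_a, n_a, mean_b, var_b, n_b in pitchers:
        diff = mean_a - mean_b
        # Assumption: squared error on each mean is variance / pitches,
        # and the two errors are combined in quadrature.
        err_sq = var_a / n_a + var_b / n_b
        weight = 1.0 / err_sq
        weighted_sum += weight * diff
        weight_total += weight
    # Return the weighted mean and its error.
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Made-up numbers: one pitcher with many pitches in both parks,
# one with only a handful.
sample = [
    (6.0, 0.04, 100, 6.5, 0.04, 100),
    (5.8, 0.09, 9, 6.0, 0.09, 9),
]
difference, error = park_difference(sample)
print(difference, error)
```

With these made-up inputs the result lands much closer to the first pitcher's -.5 feet than to the second pitcher's -.2 feet, which is the point of the weighting: pitchers with few pitches barely move the answer.&lt;br /&gt;&lt;br /&gt;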
The problem is there are many park combinations that no pitcher threw in both parks while PITCHf/x was tracking.  To solve this problem I carry out the above procedure to higher orders.  I do that by adding intermediary parks.  So instead of going straight from park A to park B I also add in pitchers who threw pitches in park A and park C and then pitchers who threw in park B and park C.  Now because park C has been added we have two sets of errors which again we need to combine in quadrature which means this measurement will be less accurate than just going from park A to park B but it is the only solution for parks with no common pitchers.  In fact, I carry this procedure out to 4th order to get the best possible results.  I could go further but I have found that 5th order and beyond change the numbers less than 1/2 a percent.  Needless to say, this takes a long time.  Hours in fact on my desktop.  But the result is I now have a difference between all the parks.  From now on I will call the difference between park A and B D(A)(B).&lt;br /&gt;&lt;br /&gt;I now have all of the differences but this doesn't get me any closer to the league average.  In fact, we will now apply a nifty statistics trick.  While I would really like to find the league average I don't actually need it.  What I really need is the difference between each park and the league average.  I will also note park A's average as PA.  Again, I can't actually find this number but we will need it in the difference between each park and league average calculation.  Here is how we are going to find that.&lt;br /&gt;&lt;br /&gt;By definition, the league average would be the sum of each park divided by the number of parks.  Multiplying each side of that equation by the number of parks and we get.&lt;br /&gt;&lt;br /&gt;P1+P2+P3+...+P28+P29 = LgAve * 29&lt;br /&gt;&lt;br /&gt;Note we are using 29 here because the system was never turned on in Baltimore.  
Also, the numbers 1 through 29 are just placeholders for each of the parks.  If we want to now find the difference between park 1 and league average we can start by adding P1-P2 to both sides.&lt;br /&gt;&lt;br /&gt;2*P1 + P3 + ... + P28 + P29 = LgAve*29 + P1 - P2&lt;br /&gt;&lt;br /&gt;We have got P2 out of the left side which is good but now it is on the right side which is bad.  The good news is we know what P1 - P2 is: that is D(1)(2), which we already have measured.  In fact, I now can add P1 - P3 and P1 - P4 and so on to each side and then replace each difference on the right side with the corresponding D until I get:&lt;br /&gt;&lt;br /&gt;29 * P1 = LgAve * 29 + D(1)(2) + D(1)(3) + ... + D(1)(28) + D(1)(29)&lt;br /&gt;&lt;br /&gt;Moving the LgAve to the left side and dividing by 29 we get:&lt;br /&gt;&lt;br /&gt;P1 - LgAve = (D(1)(2) + D(1)(3) + ... + D(1)(28) + D(1)(29))/29&lt;br /&gt;&lt;br /&gt;The left side is exactly what we want, the difference between one park and league average.  The right side is made up entirely of numbers which we have calculated.  So we can apply this method for each park and just like that we have the park corrections for the initial vertical release point.&lt;br /&gt;&lt;br /&gt;Whew, we now need to apply this method for each park for each variable.  That is all the initial locations, the initial velocities, and the accelerations.  The accelerations are a little bit complicated because they also are affected by the atmospheric conditions.  For them I find the altitude and temperature of the game and find the air density.  Because the ball is being manipulated by drag and spin (Magnus force) and both forces are proportional to air density I can multiply in the air density then run the correction code.  This actually gives me the correction factor times the density but I can divide that out when I go to apply it.&lt;br /&gt;&lt;br /&gt;Lastly, the z direction acceleration needs another trick.  
Gravity is also acting on the ball but it doesn't care about air density.  So it must be subtracted first.  Once the correction factor is found gravity can be added back in to find the true acceleration in the z direction.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/11/explanation-of-correction-code.html' title='Explanation of the correction code'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=659700736577023993' title='3 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/659700736577023993'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/659700736577023993'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-5513066990376268403</id><published>2007-10-17T20:50:00.000-04:00</published><updated>2007-11-05T11:18:13.019-05:00</updated><title type='text'>WE by count</title><content type='html'>Again using the Monte Carlo from the intentional walk tool and some data from baseball-reference on the league average OPS by count, I have cobbled together a WE chart by count.  I am planning on using these numbers as the baseline for the intentional walk tool if the count isn't 0-0.  But I thought people might be interested in this table stand alone so I have uploaded it &lt;a href="http://baseball.bornbybits.com/statistics/WPA.html"&gt;here&lt;/a&gt;.  
Warning, it is a really huge file and probably will take several minutes to load on your computer.&lt;br /&gt;&lt;br /&gt;A big thanks to John Walsh for pointing me to exactly the data I needed as sometimes this whole interweb is just too confusing.&lt;br /&gt;&lt;span style="text-decoration: underline;"&gt;&lt;/span&gt;&lt;br /&gt;Edit: Also a big thanks to tmapress and tangotiger for correcting my WPA WE error.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/10/wpa-by-count.html' title='WE by count'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=5513066990376268403' title='9 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/5513066990376268403'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/5513066990376268403'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-9055124959439007079</id><published>2007-10-16T18:44:00.000-04:00</published><updated>2007-10-17T10:22:34.310-04:00</updated><title type='text'>Future Leverage Index</title><content type='html'>The next thing I wanted to tackle with my Monte Carlo from &lt;a href="http://www.baseball.bornbybits.com/blog/2007/10/intentional-walk-tool.html"&gt;yesterday&lt;/a&gt; was a new metric I am going to call Future Leverage Index, or FLI.  
&lt;a href="http://www.insidethebook.com/li.shtml"&gt;Leverage index&lt;/a&gt; (LI) and &lt;a href="http://www.hardballtimes.com/main/article/the-one-about-win-probability/"&gt;WPA&lt;/a&gt; are great for telling you what the current situation is, and how important the next batter is, but what about one or two batters down the line?  Many people (including me) have screamed at the TV (or monitor if you are watching with MLB.tv like me) when a crappy reliever is left in the game to face an extremely high leverage situation while the closer blows bubbles with his bubble gum in the bullpen.  One of the problems managers have though is relievers need time to warm up so they have to look into a crystal ball to try to determine what the situation will be in a few batters.  This is where FLI comes in.&lt;br /&gt;&lt;br /&gt;FLI starts with the situation you input and then runs two league average batters just like it did for the intentional walk tool.  Only this time instead of weighting the results with WPA it weights the results with LI.  If the half inning ended the FLI is assumed to be zero because you have plenty of time to warm up a new pitcher while you are batting.  Average all the possible outcomes and you get the FLI for that situation.  So like LI, a FLI near zero means it is unlikely that the situation in two batters will reach crisis.  As the FLI goes up, it becomes more and more likely that a very important situation will occur.  A FLI of anything above three probably means you should get your closer up NOW.&lt;br /&gt;&lt;br /&gt;Here is a link to a nice &lt;a href="http://baseball.bornbybits.com/statistics/FLI.html"&gt;table&lt;/a&gt; that includes WPA, LI,  and FLI.  There are a few interesting situations that I would like to point out.  First, the highest leverage situation in baseball is bottom of the ninth, two outs, with the bases loaded in a tie game.  This checks in with a LI of over ten but has the lowest possible FLI of zero.  Why is this?  
Because either the batter reached base and the game is over or he made an out and the inning ended.  In fact, if you look at most two out situations you will see that they tend to have very low FLIs.  This is because even if the pitcher is struggling, it is likely that one of the next two batters will make an out and end the inning.  This means that you really should be warming up your important arms early in the inning, not late.  Also, if you have a lead and a few runners get on base, and it is getting late in the game, now is the time to warm up your closer.  Top of the eighth with a two run lead and runners on first and second checks in at a FLI of 2.9.  That is the same leverage as a one run lead starting the top of the ninth.  If you are behind by a run though FLI drops like a rock.&lt;br /&gt;&lt;br /&gt;Hopefully, the end product of this will be something that is combined with the intentional walk tool to become a situational tool.  Plug in the situation and the players involved and it will tell you if you should walk the next batter or start warming up your closer or whatever.  
That is still probably weeks away but this should be a good step in that direction.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/10/future-leverage-index.html' title='Future Leverage Index'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=9055124959439007079' title='5 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/9055124959439007079'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/9055124959439007079'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-6484896734782105765</id><published>2007-10-15T18:37:00.000-04:00</published><updated>2007-10-16T09:32:23.382-04:00</updated><title type='text'>Intentional Walk Tool</title><content type='html'>This post is going to be a nitty gritty explanation of the intentional walk tool I wrote. If you are coming from HardBallTimes and want a detailed explanation you are in the right place.  If you got here elsewhere and just want a quick rundown check my HardBallTimes article &lt;a href="http://www.hardballtimes.com/main/article/to-walk-or-not-to-walk/"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So the tool is actually two different programs put together.  The first tool is a &lt;a href="http://en.wikipedia.org/wiki/Monte_Carlo_method"&gt;Monte Carlo&lt;/a&gt; (MC) simulation of a baseball at bat.  What it does is run 10,000,000 trials for each at bat for every possible baseball situation.  There are eight possible runner situations (e.g. 
runners on second and third) and three out situations (no outs, one out, and two outs) and it does this for an OPS of 400, 500, 600, 700, 800, 900 and 1,000.   The OPS here stands for the expected OPS the batter would put up against this pitcher and defense. This gives us 168 possible combinations.  The program then calculates the probability of the at bat ending in every other situation (e.g. the chance that a runner on second with one out before the at bat ends with a runner on third with two outs, etc).  For runner advancement, error rate, and other goodies it uses 2007 data and assumes all runners are league average.  It then fits a best fit line to each situation's probability as a function of OPS.  The slope and y-intercept are then recorded to a lookup table.&lt;br /&gt;&lt;br /&gt;To test that the MC is performing correctly we can run one batter for each situation and compare the resulting run expectancy to the 2007 run expectancy. I would expect that the MC should be close to the real run expectancy and it does appear close.  It is hard to say how close it should be and how much of the variation in the run expectancy matrix is just random variation.  Anyway, here are the numbers:&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/WPA-770411.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/WPA-770409.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;This lookup table then contains all the information you need to find the chance that a runner on second advances to third with one out if the batter has an OPS of say 761 (league average this year).  These lookup tables take hours to create but once they are done accessing them is incredibly quick.  
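&lt;br /&gt;&lt;br /&gt;To show the idea behind the lookup table, here is a small Python sketch.  It is only an illustration under stated assumptions: the simulated probabilities are invented (they are not MC output), the key format is hypothetical, and the names fit_line and transition_probability are made up for this example.

```python
OPS_GRID = [400, 500, 600, 700, 800, 900, 1000]


def fit_line(xs, ys):
    """Ordinary least squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented probabilities of one transition at each OPS on the grid.
# In the real tool these would come from the Monte Carlo trials.
simulated = [0.10, 0.12, 0.14, 0.16, 0.18, 0.20, 0.22]

# Hypothetical key: (situation before the at bat, situation after it).
key = ("runner on second, one out", "runner on third, two outs")
lookup = {key: fit_line(OPS_GRID, simulated)}


def transition_probability(table, key, ops):
    """Evaluate the stored best fit line at any OPS, on the grid or not."""
    slope, intercept = table[key]
    return slope * ops + intercept


print(transition_probability(lookup, key, 761))
```

Storing only a slope and an intercept per transition is what makes the lookup fast: an OPS of 761 that was never simulated directly costs one multiplication and one addition.&lt;br /&gt;&lt;br /&gt;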
Now we are ready for the second part which is the program that puts all of the information together.&lt;br /&gt;&lt;br /&gt;This program, which takes as input a web form, starts with a baseball situation.  It then calculates every possible new situation after one at bat and the probability of that new situation.  Do this three more times and you have a resulting matrix that holds what the game looks like four batters in the future, weighted by the probability of that situation.  If the inning ended the opposing offense is allowed to bat and the game tree expands again, weighting all the chances that team scored zero runs, or one run, etc...  Once this is done each situation is multiplied by the WPA of that situation.  Sum them all up and you get the new, corrected WPA of the original situation.  If the game ended along the way the winning team is assigned a 100% WPA for that situation.  The WPA values used are from Dave Studeman and Jon Daly's WPA calculator.  My understanding is they used 2005 data to create their charts.  If you put in league average batters it should return their WPA, but because of error in the MC and the difference between 2005 and 2007 data the MC tends to be off, usually by less than 0.5%.  There are a few instances when this rises to nearly a 1% difference though.&lt;br /&gt;&lt;br /&gt;Obviously, this MC is not perfect.  Hopefully it is a step forward though.  Also note that I am planning on creating a better MC by using a more accurate input than OPS.  I started with OPS because it was pretty easy to program and it would be easier to use for the general public but a more accurate one will be coming soon.&lt;br /&gt;&lt;br /&gt;Whew.  Well hopefully that makes at least a little sense.  If it doesn't please feel free to comment below or email me under my profile to the right.  
You can access the tool either from the article at &lt;a href="http://www.hardballtimes.com/main/article/to-walk-or-not-to-walk/"&gt;THT&lt;/a&gt; or by following this &lt;a href="http://www.baseball.bornbybits.com/IBB_tool.html"&gt;link&lt;/a&gt;.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/10/intentional-walk-tool.html' title='Intentional Walk Tool'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=6484896734782105765' title='2 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6484896734782105765'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6484896734782105765'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-1269653013269414783</id><published>2007-10-04T14:20:00.000-04:00</published><updated>2007-10-04T15:09:32.548-04:00</updated><title type='text'>Was Manny Corpas Doctoring The Ball?</title><content type='html'>After my first &lt;a href="http://www.hardballtimes.com/main/article/a-closer-look-at-jeff-francis-start-against-the-phillies/"&gt;article on Hardball Times&lt;/a&gt; reader Tom who blogs &lt;a href="http://www.ballssticksstuff.com/"&gt;here&lt;/a&gt; emailed me saying that a TBS camera caught Manny Corpas &lt;a href="http://www.philly.com/philly/hp/sports/20071004_Umps__keep_an_eye_on_that_Rockie.html"&gt;pouring Gatorade on his jersey&lt;/a&gt; before coming in to pitch and that he then kept reaching for that spot when he was on the mound.  Was Corpas wetting the ball, in effect throwing a spit ball?  
Tom wanted me to look into it so I downloaded the 13 pitches Corpas threw and ran them through my algorithms correcting for everything but &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/corrections-to-corrections.html"&gt;home stand calibration corrections&lt;/a&gt;.  I didn't use those corrections because only one game has been played in the series and it is hard to tell from that what the PITCHf/x calibration really was.  These corrections are very small anyway so they most likely wouldn't make a difference.  Here is Corpas' &lt;a href="http://baseball.bornbybits.com/plots/Manny_Corpas.html"&gt;player card&lt;/a&gt; for full details.&lt;br /&gt;&lt;br /&gt;Anyway, I decided to overlay Corpas' regular season data as well to see if his pitches yesterday were doing anything different from the regular season.  Because it is only 13 pitches be warned, we have an incredibly tiny sample here.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/corpas-758693.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/corpas-758689.gif" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;It does appear that Corpas was throwing his Slider a little more often than he did in the regular season but because we saw Francis do a similar thing that appears to be the Rockies game plan.  As for whether the ball was breaking any differently, it doesn't appear to be.  If you look closely you might see that Corpas was getting less vertical movement than normal but overall it appears well within the range that he can pitch.  We saw that Francis was trying to keep the ball down as well and it could be that those were Corpas' instructions too.  His speed was pretty much right in line with his norms as well.  
His Fastball was a tick above 93 MPH and his Slider was a tick above 78 MPH.  So if Corpas was doctoring the ball, it doesn't look like it had any effect.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/10/was-manny-corpas-doctoring-ball.html' title='Was Manny Corpas Doctoring The Ball?'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=1269653013269414783' title='2 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/1269653013269414783'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/1269653013269414783'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-4260759137751860268</id><published>2007-10-02T12:19:00.000-04:00</published><updated>2007-10-02T12:27:39.958-04:00</updated><title type='text'>Season Ending Data</title><content type='html'>Except for the one game playoff yesterday the player cards and the defensive metrics should be completely up to date.  I have added some splits now to the player cards including right/left split and a breakdown by count.  I also have added an explanation page for the player cards that should be linked off of every card.  Lastly, I fixed the strike zone size as pointed out by Ike.  Thanks Ike for spotting that.  
If you have any questions or corrections feel free to comment below.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/10/season-ending-data.html' title='Season Ending Data'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=4260759137751860268' title='3 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/4260759137751860268'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/4260759137751860268'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-4439185191816141380</id><published>2007-09-27T16:40:00.000-04:00</published><updated>2007-09-27T16:50:49.730-04:00</updated><title type='text'>Player Cards for Batters Now Available</title><content type='html'>I know I promised a post with a full explanation of my code but things just keep changing and I am going to hold off until I am really happy with everything to make that post.  So again I have messed with my clustering algorithm and added a couple of dozen hand edits on top of that mostly changing classifications of sliders/cutters and splitters/sinkers.  I think things are getting close as probably about 95% of pitches are correctly being classified.&lt;br /&gt;&lt;br /&gt;This has allowed me to then run the data through another plot maker to make player cards for batters as well.  Again, I require 100 pitches seen by PITCHf/x to qualify.  I have added the most recent team a player has played for and what hand he throws with.  
Sadly some players, like &lt;a href="http://baseball.bornbybits.com/plots/bat/Adam_Dunn.html"&gt;Adam Dunn&lt;/a&gt;, bat with the opposite hand than they throw with.  I'd love to add if they bat left handed, right handed, or are a switch hitter but that data doesn't seem to be easily available in the files I am grabbing from MLB.  So I am going to have to grab some other files and cross reference to get that.  We will see when I get around to that.&lt;br /&gt;&lt;br /&gt;There are some more things I am planning on adding to the player cards like contact rate and how often they swing at balls but to make those numbers meaningful I need to find a league average first.  I also am going to be comparing pitchers this way.  Hopefully an update with that will come this weekend.&lt;br /&gt;&lt;br /&gt;As always, if you see something you think is wrong or something you would like to see added, or a design thing you would like to see changed leave a comment below.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/player-cards-for-batter-now-available.html' title='Player Cards for Batters Now Available'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=4439185191816141380' title='4 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/4439185191816141380'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/4439185191816141380'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-7338327437101114028</id><published>2007-09-20T18:29:00.001-04:00</published><updated>2007-09-21T10:22:40.279-04:00</updated><title type='text'>Brand New Player Cards</title><content type='html'>Well I was able to get things finished a bit early so here are some brand new &lt;a href="http://baseball.bornbybits.com/plots/players.html"&gt;player cards&lt;/a&gt;.  I am planning on making a big post about exactly how these plots are produced this weekend but for now let me just say that I am correcting the initial position/velocity of the ball and the accelerations (and a big thanks to Dr. Nathan for finding a mistake in my code that corrects the acceleration).   This way I can properly combine home and road data.  Once that correction is done I am pretending each ball was thrown at sea level at standard temperature.  This is done so my classification code can look at each pitch on the same footing and then determine what type of pitch it was.&lt;br /&gt;&lt;br /&gt;The classification code still needs some work.  It is getting better but will still misclassify some pitches as belonging to the wrong group, or mislabel an entire group.  To spell out the two possible errors: the first is when a pitcher throws both a slider and a curve and a particular pitch gets labeled a slider when it should have been labeled a curve.  The second is when a group of pitches is labeled as sliders when really every one of them is a curve.  I am still trying to adjust this code but it is getting better.  The only real big hole right now is that it is not calling any group of pitches split fingered fastballs.  
It also is having some issues with side armed pitchers and if they are &lt;a href="http://fastballs.wordpress.com/2007/09/14/in-the-land-of-submariners/"&gt;throwing two or four seamed fastballs&lt;/a&gt;.  I am working to correct that but I thought it would still be useful to show what I currently have.  The last time I did this I got some excellent reader response on what pitches the clustering algorithm was getting wrong and I'd like to ask for your assistance again.  Take a look through the player cards and either comment below what is incorrect or email me (my address is under my profile to the right).&lt;br /&gt;&lt;br /&gt;Even if you don't find any pitches that are being classified incorrectly if there is something you don't like about the presentation of the player cards, or something you would like to see added, again please let me know.  This really is my first attempt at something like this and I am not incredibly handy with html so feel free to suggest an alternate way of doing something.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/brand-new-player-cards.html' title='Brand New Player Cards'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=7338327437101114028' title='4 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7338327437101114028'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7338327437101114028'/><author><name>Josh 
Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-3386393643456662739</id><published>2007-09-19T13:52:00.000-04:00</published><updated>2007-09-19T17:37:23.532-04:00</updated><title type='text'>Corrections to the Corrections?</title><content type='html'>Mike Fast made an interesting comment in my last post.&lt;br /&gt;&lt;blockquote&gt;One other thing I had been wanting to ask you about...when you calculated the corrections to the x0, z0 initial point, did you assume that each park had a single correction factor that did not vary with time? I noticed with Papelbon that his data was quite different between two different Boston homestands, and Dr. Nathan mentioned to me that the PITCHf/x system is typically recalibrated between homestands.&lt;/blockquote&gt;&lt;br /&gt;I haven't known Mike for long (and really how much do you ever know someone from reading blogs?) but I do know that when he says something, it is worth looking into.  I had assumed that things in each home park stayed pretty much the same.  Pretty much everywhere I look people have been making plots combining all home data.  This is something I probably should have looked at earlier but better late than never.  So the question is, do we need to add a daily (or home stand) correction to home parks?&lt;br /&gt;&lt;br /&gt;To start, let's look at a few pitchers' initial release points from game to game and see how things look.  Even though Mike mentioned Papelbon I am going to start by looking at Jake Peavy.  The PITCHf/x system was installed from day one in San Diego and Peavy has been a workhorse for them.  
Here is Peavy's vertical release point by date.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/peavy_y-789573.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/peavy_y-789571.gif" alt="" border="0" /&gt;&lt;/a&gt;I have added Peavy's road starts in just to give an idea of what kind of error you can expect from park to park.  It looks like there is some variation in Peavy's release point as time goes on in his home starts.  That variation is less than the variation you see in the road parks but it is there.  That said, you can see the wide spread of his release point in game and that spread is larger than what the difference game to game is.  By the way, I have removed pitches with speed less than 60 as Joe P. Sheehan suggested but I still see some pathological points.  I am not quite sure what to do to remove these right now.  The plot looked much worse before I made the cut on speed though so I do believe that is at least helping.   What about his horizontal release?&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/peavy_x-789669.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/peavy_x-789667.gif" alt="" border="0" /&gt;&lt;/a&gt;This looks maybe a little worse than the vertical release.  Even though Peavy is a right hander I changed the sign to report positive numbers here.  Maybe there is some trend towards bringing his release point in closer to his body?  Is that an adjustment, or is that from PITCHf/x getting recalibrated or is that just some random noise?  
With Peavy not really providing the answers let's turn our attention to Papelbon.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/pap_y-711848.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/pap_y-711846.gif" alt="" border="0" /&gt;&lt;/a&gt;Papelbon, being a reliever, has stretches of getting into multiple games back to back.  The last two games on the far right are Sept 12th and Sept 14th and the blob just left of that was a three game stretch from Sept 2nd to the 4th.  PITCHf/x wasn't installed in Fenway or many AL East stadiums until recently so we don't have a ton of data to work with.  Fenway is also noted as having one of the worst calibrated PITCHf/x systems, which is kind of strange because it was installed relatively late.  You can see Fenway tends to be lower than his road appearances, which the correction factor finds, and maybe the four home days on the left are lower than the four on the right.  Could this be Sportvision realizing Fenway was messed up and recalibrating?  Let's take a look at his horizontal release point.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/pap_x-711959.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/pap_x-711957.gif" alt="" border="0" /&gt;&lt;/a&gt;Ugh this is all over the place.  
That nice three game stretch we noted seemed to have a very consistent vertical release point, but on the last day here it appears Papelbon's release was much closer to his body (mechanics breakdown from pitching three straight days?), or maybe he was a step left on the mound from where he normally was, or maybe the system was recalibrated mid series.  If that was the case maybe we would need to do a daily correction to the release point like we have done with the acceleration.  Well, from looking at these plots it doesn't appear we have any definitive answers.  One pitcher just isn't enough; we need to look at the whole staff.  We can't just plot every release point from the home team every day though because some pitchers have very different release points.  We need to find each player's average release point and then subtract that from each pitch.  This will show us the actual difference in release point from average for each pitcher, which will put them all on the same level and make them easy to compare.  Everything up until now I have been measuring in feet because these release points are far away from the origin.  These differences are going to be much smaller though so I am going to move to inches to make these difference plots.  Also, because the horizontal direction seems to be worse I will be using that to compare.  Let's start with Fenway.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Fenway-722817.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Fenway-722809.gif" alt="" border="0" /&gt;&lt;/a&gt;You now can clearly see the Red Sox home stands and what the differences were for each pitch thrown on each day.  Again notice how large the in game spread is.  
It appears that at least the Boston pitchers are varying their release point by almost a foot during each game.  I've added a grid to make it easier to see how each of these home stands compare with each other and with zero.  If the system was getting recalibrated and that was changing the horizontal release points being measured you would expect to find some home stands higher than zero and some lower than zero.  If you look very closely you can see that maybe the first few home stands are high by an inch and maybe the last two home stands are low by an inch but it is hard to tell.  Maybe looking at a park like Petco which was around from the start would show some more variation.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Petco-723660.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Petco-723658.gif" alt="" border="0" /&gt;&lt;/a&gt;The Petco data looks pretty consistent to me.  Again, maybe the home stands in the middle and the one on the far right show a slight increase and the others a slight decrease but that appears to be very small. Interestingly, their second home stand, which was very short, seemed to have a few pitchers throwing with an increased difference.  That is countered by a single pitcher who was almost a foot below average though.  A few parks had the system installed for one day while ESPN was in town only to have their camera removed and then added again at a later date.  
Coors Field is one of those, and we have seen that its system seems to be pretty bad as well, so let's look at that data next.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/coors-713285.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/coors-713282.gif" alt="" border="0" /&gt;&lt;/a&gt;Even that first day, almost 80 days before their camera was installed full time, shows remarkable agreement with the rest of the data.  Maybe that day is a little high and maybe the last home stand is as well but that again isn't anything larger than two inches at most.   I have looked through every stadium and through several variables and have seen the same story in each one.  The only stadium that really shows a recalibration changing the data is the horizontal release point at Chase Field.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/chase-713390.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/chase-713383.gif" alt="" border="0" /&gt;&lt;/a&gt;This is what I would have expected to see from the other parks if the recalibration was really changing the data.  The first two home stands appear to be about four inches above zero.  The next home stand maybe about two inches above zero.  The last three home stands appear to be two or three inches below zero.  
Interestingly, the vertical change appears to be much smaller than the horizontal.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/chase_vert-706808.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/chase_vert-706803.gif" alt="" border="0" /&gt;&lt;/a&gt;So what can we conclude?  Well it does appear that Sportvision is recalibrating their PITCHf/x systems between home stands but, in general, those corrections are relatively small.  Chase Field does appear to be an exception though.  My correction factor seems to think that, overall, Chase Field is moving the horizontal release point about four inches to the left (as the catcher sees it).  But it appears that the difference in Diamondback home games alone is about four inches because of recalibration.&lt;br /&gt;&lt;br /&gt;So what should we do about this?  I probably could just adjust Chase Field "by hand" and be done with it but what if a recalibration in another park messes up the data in these last few weeks or even next year?  It sure would be nice to have that automated.  So what I am planning on doing is writing a first correction algorithm that will sit in between my code that parses the data and the code that currently does the corrections.  This code will do a home stand by home stand intra-park correction and then feed the results to the regular code that will handle the inter-park corrections.  Unfortunately, this will push back the player cards to probably this weekend.  I know I am such a tease, but hopefully making this last correction will really nail things down.  I'd like to thank Mike again for pointing this out.  
If you have any comments or concerns with the PITCHf/x data please comment below or email me.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/corrections-to-corrections.html' title='Corrections to the Corrections?'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=3386393643456662739' title='2 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/3386393643456662739'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/3386393643456662739'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-2639862500254829827</id><published>2007-09-18T14:07:00.000-04:00</published><updated>2007-09-18T17:40:50.383-04:00</updated><title type='text'>Breakthrough</title><content type='html'>Well basically nothing got done this weekend but I did have a little time Monday while watching baseball to muck with my code and I think I have finally solved the riddle of correcting the acceleration data.  As I noted at the end of &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/response-to-reader-question.html"&gt;this post&lt;/a&gt; the solution to the problem would not be a nice linear solution like the initial position or the initial velocities were.  As Dr. Alan Nathan points out in his &lt;a href="http://webusers.npl.uiuc.edu/%7Ea-nathan/pob/Analysis.pdf"&gt;analysis&lt;/a&gt; (p. 
2), the forces (and, as such, the acceleration) on the ball in flight are definitely not linear.&lt;br /&gt;&lt;br /&gt;This means a non-linear solution will be needed and in particular a solution is going to be needed for every park for every day that a game has been played.  For example, the air density is needed for calculating both the drag and Magnus forces on the ball.  The most widely known example of air density causing a problem is the thin air at Coors Field in Colorado.  What isn't as widely known is that the air temperature plays just as big of a role in &lt;a href="http://en.wikipedia.org/wiki/Air_density"&gt;calculating the air density&lt;/a&gt; as the distance above sea level.  This year the Reds played a game against the Pirates at home where the game time temperature was 30 degrees Fahrenheit.  They played another game against the Braves where the game time temperature was 99 degrees Fahrenheit.  Obviously, the ball is going to behave quite differently in these two situations.  Again, the air density equation is non-linear so this is going to be a mess.&lt;br /&gt;&lt;br /&gt;I am not going to go into great detail on everything that went into my 700 line C++ program that calculated these corrections because, while I think I have everything correct, there still might need to be a tweak or two to the code.  Also, I really don't want to bore the readers with four paragraphs of hard core explanation.  If this is something that you the readers want to see, please add a comment at the bottom and I will consider it.  The short version is I modified my code that was used for the linear corrections to make it non-linear, added in some physics equations to find the corrections, and used only fastballs for making this comparison.  This is also going to make it very difficult if not impossible for me to properly disseminate the corrections.  
I will be thinking about this and hopefully will come up with a solution.&lt;br /&gt;&lt;br /&gt;What I do want to do is show you the results.  For that I am going to use Jeff Francis since he has been in the Rockies rotation the whole year and Colorado does indeed need the largest corrections even after the atmospheric values were taken into consideration.  Instead of showing you break of the ball, like I have before, in these plots I am going to show you the actual accelerations.  Break can be calculated from these values (along with the initial positions/velocities) and again Dr. Nathan has a good explanation of how to do that &lt;a href="http://webusers.npl.uiuc.edu/%7Ea-nathan/pob/tracking.htm"&gt;here&lt;/a&gt;.  The reason I want to show acceleration here is because that is the value that is going to be corrected.  So starting with the uncorrected data here is the x and z accelerations for home and away games while PITCHf/x was on for Jeff Francis.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/accel-749246.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/accel-749243.gif" alt="" border="0" /&gt;&lt;/a&gt;Now maybe I don't know a lot but one thing I do know is the ball should break less (have a smaller acceleration) at Coors than at other parks around the league.  Yet this data appears to be exactly opposite of that.  Obviously, something is messed up with the data.  The blob of data around (-5,-35) is Francis' curve ball and the huge mess to the upper right is a concoction of his fastball and change.  
Let's apply the correction factors to both the home and road pitches and see if we can't clear things up a bit.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/accel_corr-749340.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/accel_corr-749338.gif" alt="" border="0" /&gt;&lt;/a&gt;You can see some of the non-linearity if you look very closely between these two plots.  Some good things happened here.  First, the away game data changed a bit but very slightly.  This is exactly what we would expect to see as his road starts should be much closer to league average than his home starts.  Second, the home game data now shows less acceleration both vertically and horizontally which is good.  Francis' curve at home actually appears to have about the same horizontal acceleration as his curve on the road.  There does appear to be a slight tail pointing towards zero in the road data though.  I wonder if this is because I used only fastballs in my comparison or if maybe Francis is compensating and slightly overthrowing his curve at home?  This is one of the loose ends I am still trying to track down.  Lastly, the huge fastball/change blob in the uncorrected data has become much more distinguished and now it definitely appears to be two blobs with the fastball in the upper right and the change down and toward zero horizontally.  That is exactly what we would expect from a change and the reason this blob is so close to the fastball blob is because Francis has a very good one.  Looking back you can kind of make out the distinction in the previous plot but it is much more defined here which again is a sign that the corrections are working.&lt;br /&gt;&lt;br /&gt;There still is one problem though.  How do we know this correction is moving the data to the right spots?  
We know that the Coors data should show less acceleration than the road data but how can we tell if it is overcompensating or under compensating?  The only way I know how to check this is to transform each pitch like it was thrown at sea level at standard temperature.  You can check the &lt;a href="http://en.wikipedia.org/wiki/Air_density"&gt;air density link&lt;/a&gt; again if you want to look that up.  In this frame everything should be equal and all of the accelerations now should match up.  So does it?&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/accel_nor-720459.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/accel_nor-720457.gif" alt="" border="0" /&gt;&lt;/a&gt;Yes it matches up very well.  A careful eye will notice that not only did the Coors data get an increase in acceleration (or decrease since these numbers are mostly negative) but the road data did as well (though much smaller).  The reason for this is standard temperature is about 59 degrees Fahrenheit and most baseball games are played at temperatures above that.  Again, Francis' curve looks slightly different at home and away here.  Maybe that is from the extra tail I mentioned before in the road data but maybe not.  The strange thing is the vertical acceleration seems to be spot on but the horizontal acceleration is slightly off.  I don't really have a good explanation for that right now.  Maybe we should ask Jeff Francis himself who &lt;a href="http://www.physicscentral.com/people/2006/francis.html"&gt;studied physics while at college at the University of British Columbia&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;So where is all of this going?  Well except for a little fine tuning I think I am ready to move to the full data set.  
You might remember me saying that I have stopped adding data so I would have a consistent set to work with.  I am almost two weeks behind but I have started grabbing the new data now.  Once that is done I have to run it through my parser to get the data in a usable form, then my correction code to get the new correction values, and lastly my player card generator which I still need to adjust a bit to output more plots than just the break.  Hopefully, I will be able to have at least four plots like &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/progress-errrr-sort-of.html"&gt;I showed for Jose Contreras&lt;/a&gt; in my previous post.  If there is an extra pitching plot you would like to see added let me know in the comments section below.  I also still need to remove those pathological points I showed in those Contreras plots.  Joe P. Sheehan has suggested that a cut on initial speed might solve the problem.  I had kind of thought that might be right but somehow that got lost in my memory so thanks to Joe for telling/reminding me of that.  If things go smoothly I should have player cards with corrected home and road data by Thursday night.  If things don't go smoothly or if I don't get a chance to work on this then I will have them up by the weekend.  
Once that is completed I can move on to other fun topics I wanted to look at with this data.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/breakthrough.html' title='Breakthrough'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=2639862500254829827' title='3 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2639862500254829827'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2639862500254829827'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-2694894279770859209</id><published>2007-09-13T18:24:00.000-04:00</published><updated>2007-09-14T07:56:49.638-04:00</updated><title type='text'>Progress. Errrr, Sort of.</title><content type='html'>First I want to encourage everyone here to read Mike Fast's recent &lt;a href="http://fastballs.wordpress.com/2007/09/12/mad-dog-mishmash/"&gt;post about Greg Maddux&lt;/a&gt;.  The analysis he has done on Maddux is what I am hoping my clustering algorithm can do on every pitcher.  Things are moving along with the algorithm and I want to share some progress.  The algorithm is still messing up Maddux's two and four seam fastballs but it now correctly identifies his cutter so that is some progress.  Instead of showing you worse plots than what Mike had put together I decided to show a similar type of pitcher to Maddux in Jose Contreras.  
Now Contreras isn't having nearly as good of a year as Maddux but both are similar pitchers featuring several types of fastballs and a pretty small variation between pitches.  Here is Contreras' horizontal and vertical movement.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras-789305.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras-789303.gif" alt="" border="0" /&gt;&lt;/a&gt;What a mess we have here.  Contreras is throwing a two seam fastball and what looks to be a cut fastball but also a change, a slider, and a curve.  All of the pitches seem to blur together in this plot but if we add in the pitch speed they start to separate.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras2-789398.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras2-789396.gif" alt="" border="0" /&gt;&lt;/a&gt;Here is a breakdown of horizontal break and pitch speed.  I thought about adding vertical break to this plot as well but things were very messy as is.  Here you can see Contreras' change break away from his fastballs and some separation between his sinker and his cutter.  I was pretty impressed that the algorithm would pick up these differences.  Also, even though we have much less statistics, we can see a clear speed difference between his slider and curve ball.  
Next the vertical movement.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras3-702965.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras3-702961.gif" alt="" border="0" /&gt;&lt;/a&gt;Now you can see that his sinker really is sinking more than his cutter and the increased drop in his curve from his slider.  What about his release point though?  Contreras is known as someone who will drop down to 3/4 arm slot from time to time.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras4-703055.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Jose_Contreras4-703052.gif" alt="" border="0" /&gt;&lt;/a&gt;Perfect.  We can see his regular arm slot and the 3/4 arm slot and it appears most of his cutters come from that 3/4 position.  But hang on a second.  What is with those stray points off to the right?  This must be where PITCHf/x just screwed up and misread the pitch.  As crafty as Contreras is I doubt he actually threw a pitch left handed.  Every time I look up it seems there is something else to the data that needs correcting.  Clearly that unknown point way to the right needs to go and the change that is off by more than a foot also needs to be removed.  What about that cluster of five pitches in the upper right though?  Is that a crafty vet showing a different arm angle for an important pitch or just a mistake from PITCHf/x?  I don't have the answer right now but hopefully will soon.&lt;br /&gt;&lt;br /&gt;This weekend looks very busy for me but hopefully I will have some time to work on this.  
The order of what I am planning on doing is fixing this release point issue first.  Then hopefully going back to the clustering algorithm and getting that ready to go.  I feel like that is close.  Seeing what a good job it did with Contreras gives me hope.  The biggest thing right now is probably getting it to merge more of those unknown points into established pitches.  Lastly, tackling the acceleration correction which I will almost certainly not have time for.  I actually had a decent idea for a work around with it but it is going to take a long time to code up and then test.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/progress-errrr-sort-of.html' title='Progress. Errrr, Sort of.'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=2694894279770859209' title='0 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2694894279770859209'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/2694894279770859209'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-6415900213506143107</id><published>2007-09-11T17:48:00.000-04:00</published><updated>2007-09-11T23:34:55.192-04:00</updated><title type='text'>Player Cards</title><content type='html'>After a weekend of banging my head against the wall trying to figure out how to &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/response-to-reader-question.html"&gt;properly normalize the acceleration&lt;/a&gt; I needed a break.   So back to just looking at home pitches for pitchers. 
I decided to skip ahead to the next thing I wanted to do which is upload some &lt;a href="http://baseball.bornbybits.com/plots/players.html"&gt;player cards&lt;/a&gt;.  Basically, the plan was to use the PITCHf/x data to create a plot of the types of pitches each pitcher throws and then start to expand from there.  What I needed was a clustering algorithm that could look at all the pitches thrown by a pitcher and classify them.  I am not going to go into details about the algorithm as it still needs some fine tuning (as you will see below) but basically it examines every pitch and groups pitches with similar speed and movement into clusters.  Once it has those clusters for each pitcher it finds the pitcher's fastball and then calculates what his other pitches do in comparison to the fastball.  It then compares his other offerings to those of all other pitchers and tries to guess what the other pitches are.   Sometimes this algorithm performs well.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Jonathan_Broxton-701860.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Jonathan_Broxton-701857.gif" alt="" border="0" /&gt;&lt;/a&gt;First, these plots show the movement of the pitch not the location.  For a great description of what exactly this means read this excellent &lt;a href="http://www.hardballtimes.com/main/article/in-search-of-the-sinker/"&gt;article&lt;/a&gt; by John Walsh.  This is exactly what you would expect from the hard throwing Broxton.  He has a great four seam fastball and what can be a devastating slider.  His change, though, is a work in progress.  It doesn't have nearly the same movement as his fastball which helps tip the pitch to opposing batters. 
Because of this, you can see he doesn't throw it very often.&lt;br /&gt;&lt;br /&gt;Sometimes though the algorithm can get messed up.  This mostly happens in two ways.  First, the clustering gets overactive and combines two pitches that really are different.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Takashi_Saito-703616.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Takashi_Saito-703614.gif" alt="" border="0" /&gt;&lt;/a&gt;Saito appears to be throwing two varieties of fastballs (two seamer? cutter?) but the clustering algorithm combines them into one type.  This mostly occurs when the speed of the two pitches is very close.  You can see that Saito's splitter and his curve are about as far apart as the two fastballs but the algorithm correctly separated them. The other failure is that sometimes the algorithm will misidentify a pitch.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Roy_Oswalt-731102.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Roy_Oswalt-731099.gif" alt="" border="0" /&gt;&lt;/a&gt;It is my understanding that Oswalt throws a slider not a split finger fastball but to the algorithm the pitch seems to move more like Saito's split finger fastball than Broxton's slider.  Also, one pitch that Oswalt threw didn't seem to match up to anything and just got left out.  Judging from the movement on the pitch it probably is a fastball but it could be a change.  Missing one pitch from Oswalt really isn't a problem but a few pitchers have clusters of pitches that aren't combined.   
Rich Hill is an example of this.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Rich_Hill-708245.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Rich_Hill-708242.gif" alt="" border="0" /&gt;&lt;/a&gt;Again, Rich Hill throws a slider not a splitter but the horizontal movement gets the pitch classified as a splitter.  If the group of unidentified pitches were added in maybe the pitch would be correctly identified.  Sometimes all hell breaks loose and the algorithm falls apart.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Greg_Maddux-731293.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Greg_Maddux-731291.gif" alt="" border="0" /&gt;&lt;/a&gt;The great Greg Maddux who throws nothing but fastballs.  So what is going on here?  Well the clustering algorithm really needs some space between the types of pitches and Maddux doesn't really provide any.  What I mean by that is Maddux will throw his fastball at a wide range of speeds.  The low end of that range is very close to the high end of the velocity on his change.  This provides a bridge for the clustering algorithm to lump them all together.  The unknown points in the bottom right are some type of off speed pitch but it is unclear what.  
Lastly, we can look at the worst case scenario, the knuckleball.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/Tim_Wakefield-791443.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/Tim_Wakefield-791441.gif" alt="" border="0" /&gt;&lt;/a&gt;Here the algorithm really doesn't have a chance.  It does a good job of separating the knuckleball from the fastballs and most of the knuckleballs are grouped together with a few wrongly grouped at the edges.  The problem comes in comparing Wakefield to other pitchers.  Without any other knuckleballers in the league to compare him to the algorithm is lost and just throws out a guess and calls the pitch a slider.&lt;br /&gt;&lt;br /&gt;Anyway, here is where you come in.  I have uploaded a plot for every pitcher who has thrown more than 100 pitches in their home park while PITCHf/x was on.  If your favorite pitcher is missing don't worry, hopefully I will soon have a good league correction and can add in the away stats.  What I need is for you to look over the plots and tell me where the algorithm has messed up.  If the algorithm has combined two lumps of pitches that you think should be separated let me know in the comments below.  If the algorithm has incorrectly identified a group let me know.  If there is something aesthetically unpleasing about the graphs or if there is something you would like to see me add to them let me know.  If you would rather email me than add a comment my email can be found under my profile to the right.&lt;br /&gt;&lt;br /&gt;The whole process of going from downloading the data to producing the plots takes nearly half a day.  The clustering algorithm itself takes over three hours on my super fast desktop.  
The moral of the story is I am going to stick with this data set for at least a few more days as I try to hammer out the kinks in the algorithm.  Maybe this weekend I will do a full update.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/player-cards.html' title='Player Cards'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=6415900213506143107' title='7 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6415900213506143107'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/6415900213506143107'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-7709068254743936274</id><published>2007-09-06T11:47:00.000-04:00</published><updated>2007-09-06T19:42:13.410-04:00</updated><title type='text'>Response to a Reader Question</title><content type='html'>A &lt;a href="https://www.blogger.com/comment.g?blogID=6337312839698763116&amp;amp;postID=7024382379064404258"&gt;couple of posts back&lt;/a&gt; reader Alan brought up some interesting ideas for checking the data.  Here is part of his comment.&lt;br /&gt;&lt;br /&gt;&lt;blockquote&gt;Consider only fastballs, which we can take to be pitches&gt;90 mph. First thing to look at is the initial z-component of the velocity. A negative z velocity means the pitch is thrown slightly downward. Do you see a correlation between the release point and the initial z velocity? 
Does the pitcher compensate for the higher release point with a larger downward component of velocity?&lt;/blockquote&gt;&lt;br /&gt;&lt;br /&gt;I want to examine these correlations and add in a few more variables to help complete the picture.  At the time Alan had wanted me to use the two parks that had the largest separation in vertical release point, z0, which were Fenway and AT&amp;amp;T.  Since then I found a &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/preliminary-correction-to-pitchfx-data_05.html"&gt;bug in my code&lt;/a&gt; and now the two parks that are furthest apart are Fenway and the Metrodome.  The problem is both of those parks are on the lower end in &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/somewhat-pretty-pictures.html"&gt;number of pitches tracked&lt;/a&gt; by PITCHf/x.  So instead I am going to start by looking at Petco Park in San Diego and AT&amp;amp;T Park in San Francisco.  AT&amp;amp;T doesn't have a whole lot more statistics than the Metrodome but it has a lower variance in my correction factor and the Giants and the Padres play each other very regularly so hopefully the overlap of pitchers in the data will be larger.&lt;br /&gt;&lt;br /&gt;I also should note that I am a little bit concerned about defining every pitch with an initial speed above 90 MPH as a fastball.  While I am not too concerned about actual fastballs that are below 90 MPH being missed with this definition, I am concerned that some breaking balls will enter the sample.  Not too many pitchers throw a 90 MPH breaking ball but my initial correction factor for the error on the pitch speed is about 5 MPH and there are plenty of pitchers who can throw an 85 MPH breaking ball.  
Nevertheless, I haven't come up with a better definition at this time and this definition will work for our purposes today.&lt;br /&gt;&lt;br /&gt;To start with I am going to check the correlation between the initial vertical release point and the vertical height when the ball crosses home plate.  The reason I want to check this first is something else that Alan said in his comment: that the calibration should be better near home plate.  That got me remembering this tidbit from Joe P. Sheehan when he was writing about differences in the parks &lt;a href="http://baseballanalysts.com/archives//001538-print.html"&gt;here&lt;/a&gt;:&lt;br /&gt;&lt;blockquote&gt;Almost all of the pitchers also get a smaller pfx_z [movement of the pitch vertically] value at home, which would seem to indicate that their pitches have more sink at Fenway, but is actually a result of the lower release height combined with the fact that, overall, the average height when a pitch crosses the plate at Fenway is similar to the height at other parks.&lt;/blockquote&gt;So he was seeing a very large variation in the release point but a small variation when the pitch crossed home plate.  This doesn't seem to make sense and I want to look at this first.  So finally, here is a plot comparing the initial and final height of the ball at Petco and AT&amp;amp;T parks.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/height-752637.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/height-752635.gif" alt="" border="0" /&gt;&lt;/a&gt;Pitchers tend to release the ball about 6 feet above ground level though obviously this will vary from pitcher to pitcher.  We can see in this data though that the Petco data tends to be below 6 feet and AT&amp;amp;T data tends to be above 6 feet.  
Also, we can see a bunch of points near 3 feet in the San Diego data.  These are from sidearmer Cla Meredith of the Padres.  You would expect to see a few points from him show up in the San Francisco data but they appear to be missing.  So I went back and checked and Meredith has yet to pitch at AT&amp;amp;T park while PITCHf/x was activated.  There appears to be another grouping of pitches just above 4 feet at Petco.  This almost certainly is another Padre pitcher but I haven't yet tracked him down.  If there are any Padre fans who know who this is please let me know.&lt;br /&gt;&lt;br /&gt;Anyway, besides the disparity in initial height, the height as the ball crosses home plate appears very consistent across both parks.  If the initial position is off by as much as we think then why is the final position so stable?  It must be as Alan suggested that the PITCHf/x system is more stable near home plate.  I have a theory as to why this is but I am going to save that for my next post when I go in depth as to what I think is actually happening with the data.  What this is showing is the initial and final heights of the baseball aren't correlated at all.  This means we should be free to correct the initial position without worrying about changing the final position (as funny as that sounds).  Here then is the same plot with the &lt;a href="http://www.baseball.bornbybits.com/blog/2007/09/preliminary-correction-to-pitchfx-data_05.html"&gt;vertical correction&lt;/a&gt; applied.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/height_corr-752737.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/height_corr-752734.gif" alt="" border="0" /&gt;&lt;/a&gt;What an improvement that makes.  
Again because this correction is based on a pitcher by pitcher comparison of each park, this shift isn't moving the center of the Petco data on to the center of the AT&amp;amp;T data.  Because the Padres have a few pitchers who throw at a very low height that difference still remains in the data.  The "average pitcher" who releases his ball just above 6 feet though will come together and that is exactly what the corrected plot shows.  Now we are ready to look at the initial height and the initial vertical velocity to see if we see a correlation there.  Because we aren't seeing a correlation between the two heights something must be causing that and it pretty much has to be either the initial velocity or the acceleration or both.  Starting again with the uncorrected data.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_vz0-767684.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_vz0-767682.gif" alt="" border="0" /&gt;&lt;/a&gt;Here we can see a clear correlation and it is exactly what we would expect.  The higher the pitch is released from, the more negative (or downward) its initial velocity.  This makes perfect sense; the only problem is the data looks terrible.  Again we see a difference in the initial height but there appears to be more here.  
Let's start out by correcting for the initial height and see what that gives us.&lt;br /&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_vz0_corr-768721.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_vz0_corr-768716.gif" alt="" border="0" /&gt;&lt;/a&gt;Now the heights seem to match up well (except again for the two blobs now at 3 feet and near 5 feet) but the velocities seem off.  The AT&amp;amp;T data appears to have more downward initial velocity than the Petco data.  So I am going to apply a correction to the initial velocities that I calculated the same way I calculated the initial height correction.  As I pointed out in previous posts the errors that I am seeing on these corrections are huge.  For instance, Petco checks in as being high by 0.5 ft/s with an error of 146 ft/s (AT&amp;amp;T is nearly 1 ft/s low).  Obviously that doesn't seem to make any sense and either something is still wrong with my code or we just need more data or I need to correctly identify the fastballs or I need to carry the calculation out further.  Because of this I am not yet going to publish these corrections.  I don't really trust these numbers and I don't want people using them until I feel confident that they are correct.  Once I get them fixed though I will be putting the numbers out for people to use.  
Just for fun let's put in the numbers and see what happens.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_vz0_full-729276.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_vz0_full-729274.gif" alt="" border="0" /&gt;&lt;/a&gt;Wow that looks pretty good.  I just don't understand why I am seeing such a huge error when I look at plots like this showing things matching up well. Another interesting thing can be seen in this plot.  Remember back when I said I was concerned about making a hard cut at 90 MPH for the pitch speed?  The reason was that cut wouldn't be uniform over the parks.  Here, AT&amp;amp;T was increasing the initial pitch speed by having a more negative initial vertical velocity.  Petco was doing exactly the opposite.  That means we are actually seeing some 87ish MPH pitches in the Petco data and we are only seeing 93ish MPH pitches in AT&amp;amp;T.  I believe that is why the AT&amp;amp;T data fits snugly inside the Petco data.  The slower the ball is moving presumably the more potential for break (acceleration) there is and the wider the variation in position and velocity.&lt;br /&gt;&lt;br /&gt;That was interesting but while Petco and AT&amp;amp;T were at the extremes for variation in initial height they were closer to the middle of the pack for variation in initial downward velocity.  What if we look at two parks that are very extreme in both categories?  The two best (worst?) 
parks here are Fenway and Angel Stadium.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/h_v_II-729367.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/h_v_II-729365.gif" alt="" border="0" /&gt;&lt;/a&gt;Wow that plot looks ugly.  Hopefully after our corrections things will get better.  Again we will start by correcting just the initial height.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/h_v_II_corr-766895.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/h_v_II_corr-766891.gif" alt="" border="0" /&gt;&lt;/a&gt;Not quite the nice fit we saw before (in the initial height match).  Part of this could be due to the Boston staff being shorter than usual but part of it might be due to error on these numbers.  Fenway is checking in at an error of nearly 0.2 feet and while that might not seem like a lot, if you moved the purple points right 0.2 feet it sure would look better to me.  Now on to the initial velocity adjustment.  These two parks are over 4 ft/s (over 2 MPH) different in just their initial downward velocity according to my numbers.  
Again, the errors on these numbers are huge but let's put them in and see what we get.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/h_v_II_full-766986.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/h_v_II_full-766984.gif" alt="" border="0" /&gt;&lt;/a&gt;While not as perfect as the AT&amp;amp;T/Petco match this is a huge improvement for two parks that were radically different to start with.  This basically is the worst case scenario for having to correct the data and the results seem very good to me.  If this were as close as we could get with these corrections I would still be happy.&lt;br /&gt;&lt;br /&gt;Ok so I have shared the good news with you.  Looking at these plots it really seems like not only can we understand what is going on with the data but we can fix it as well.  Now the fly in the ointment.  The other parameters that are vital to these calculations are the accelerations (in x, y, and z).  For this data Sportvision is assuming that the acceleration is constant over time, meaning the rate of change in velocity when the pitch is thrown is the same as the rate of change in velocity as the pitch goes over home plate.  Now, obviously this isn't a perfect assumption as the ball could be slowing down more the closer it gets to home plate.  The problem is if you allow for a changing acceleration then the nice equations of motion that they use fall apart and things become even more messy.  In reality, it probably isn't bad at all to make the assumption that the acceleration isn't changing (though I can't say for sure; if you are looking for a topic to tackle using this data this would be an idea) but the problem for us is the method that we are using for creating corrections for the initial distances and velocities won't work for the accelerations.  
This means if we find that the accelerations need fixing, along with the positions and the velocities, then we are going to have to come up with a different method than the one I have outlined for fixing them.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Close your eyes (or turn off your monitor) if you don't want to see the bad news.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_az-783560.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_az-783558.gif" alt="" border="0" /&gt;&lt;/a&gt;Going back to Petco and AT&amp;amp;T here is the vertical acceleration compared to the initial height.  Again we can see the problems in the initial height because this is uncorrected data, but the accelerations don't seem to be matching up well either.  Correcting for the initial height we can fully see the problem.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_az_corr-783659.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/z0_az_corr-783656.gif" alt="" border="0" /&gt;&lt;/a&gt;Ick.  Again we can see Meredith and his fastballs that appear to be breaking down very hard (sinkers).  Also, pitcher X's data has come out from hiding a bit and we can see his contribution near 5 feet in initial height and -40 ft/s^2.  His fastball must be a sinker as well.  The bad news though is it appears that an acceleration correction is going to have to be made for this data to match up.  It is close, but just not close enough.  
This really sucks because what appears to be happening is the acceleration is being spread out in Petco and this correction won't be a nice linear one like the position and velocity corrections have been.  Just for more proof here is the Fenway/Angel Stadium plot, uncorrected first.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/h_a_II-792018.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/h_a_II-792013.gif" alt="" border="0" /&gt;&lt;/a&gt;Again, these two parks are just about as bad as the data is going to get unless one of the last two parks to come online really sucks.  Correcting for the initial height things get better but still look pretty poor.&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://www.baseball.bornbybits.com/blog/uploaded_images/h_a_II_corr-792120.gif"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://www.baseball.bornbybits.com/blog/uploaded_images/h_a_II_corr-792117.gif" alt="" border="0" /&gt;&lt;/a&gt;Again we are seeing a spreading out of the accelerations.  Instead of being able to match these two distributions by moving one or the other left/right or up/down the distributions will have to be shrunk or spread out.  It is possible that my artificial cut at 90 MPH is doing some of this (like we saw in the position/velocity graphs) but I don't think it is responsible for all of it.&lt;br /&gt;&lt;br /&gt;So where do we stand?  Even without a great way of teasing the fastballs out of the data it appears that we will eventually be able to get some good correction factors for the initial positions and velocities.  The accelerations are another story and something that will have to be thought about.  
If anyone has a good way of cutting the data to produce fastballs and is interested in sharing it please let me know.  Also, if anyone thinks they have a good method for correcting the accelerations, even if they don't know exactly how to implement it, let me know.&lt;br /&gt;&lt;br /&gt;You may have noticed that I started calling it PITCHf/x instead of pitchFX like in my previous posts.  I had seen it written both ways a lot and thought pitchFX was correct but after reading through the Sportvision website again it definitely should be PITCHf/x.  My apologies to the creators.&lt;br /&gt;&lt;br /&gt;ps. If reader Alan happens to be Dr. Alan Nathan who published this excellent &lt;a href="http://webusers.npl.uiuc.edu/%7Ea-nathan/pob/Analysis.pdf"&gt;paper&lt;/a&gt; examining Jon Lester's start against the Mariners please email me.  You can find my email address under my profile on the right.  I'd really like to chat about possibly using the spin magnitude and axis to classify pitches and why his theoretical fit to the data matched up so well when I am seeing such terrible agreement.  
Actually, anyone who wants to discuss any of that or anything else can email me with the link provided under my profile.</content><link rel='alternate' type='text/html' href='http://www.baseball.bornbybits.com/blog/2007/09/response-to-reader-question.html' title='Response to a Reader Question'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=6337312839698763116&amp;postID=7709068254743936274' title='6 Comments'/><link rel='replies' type='application/atom+xml' href='http://www.baseball.bornbybits.com/blog/atom.xml' title='Post Comments'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7709068254743936274'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/6337312839698763116/posts/default/7709068254743936274'/><author><name>Josh Kalk</name><uri>http://www.blogger.com/profile/03137640990432404781</uri><email>noreply@blogger.com</email></author></entry><entry><id>tag:blogger.com,1999:blog-6337312839698763116.post-6112263917359102542</id><published>2007-09-05T17:37:00.000-04:00</published><updated>2007-09-05T18:33:35.774-04:00</updated><title type='text'>Somewhat Pretty Pictures</title><content type='html'>Ok so the last couple of posts probably have been pretty boring to most people.  So I am going to interject with some plots from p