(Original title: Some Thoughts . . . Later title: Some Bold, Original Thoughts. . . But heck, if you want attention, you've got to ask for it!)
(For newbies -- $H is the percentage of balls in play that fall for hits. This varies surprisingly widely from year to year for any given pitcher, leading some to question whether the ability to get guys to hit easy grounders and fly balls is a real ability or not. Voros McCracken has explored this notion in detail as Defense Independent Pitching Statistics (at least I think that's what the acronym stands for <g>).)
1) Much of the reason why $H varies so much from year to year, where K% and BB% don't, is *because the sample size is smaller*. Too small. Note that HR%, which has a very similar sample size, also has a large variance from year to year. The additional lack of correlation in $H can be attributed to larger team defense and park effects.
It's trivial to simulate multiple seasons of $H in Microsoft Excel for a pitcher with any theorized innate level of $H. The results are eye-opening.
Put this formula in cell A1:
=IF(RAND()>0.271,0,1)
where the 0.271 is whatever $H rate you want to simulate (that happens to be an estimate for Maddux).
Now copy the formula down through cell A650 (650 being a typical number of balls in play for a pitcher with 150+ IP).
In cell B1, put:
=SUM(A1:A650)/650
Now hit F9 repeatedly to regenerate the random numbers. Cell B1 will report each season's $H number. You will see quite a bit of variation. I'm doing this in real time:
Obviously this CyberGreg learned to pitch after those first three mediocre seasons! And he must have come up with some adjustment after the .314 to go back to .257, which started another string of vintage years.
(In case you miss the point, the previous paragraph is total nonsense, although total nonsense of the sort our brain is hard-wired to come up with.)
(Lowering the sample size just a bit will increase the variance noticeably.)
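For anyone without Excel handy, the same experiment can be sketched in Python. This is only an illustration under the assumptions already stated above (a fixed true $H of .271 and 650 balls in play per season); the function names are made up for this example. It also reports the season-to-season standard deviation that a pure coin-flip process would produce, sqrt(p(1-p)/n), which is about .017 for these inputs and grows as the number of balls in play shrinks.

```python
import random
import statistics


def simulate_season(true_h=0.271, balls_in_play=650, rng=random):
    """Return one simulated season's $H: hits divided by balls in play."""
    # Each ball in play is a hit with probability true_h, like the RAND() cells.
    hits = sum(1 for _ in range(balls_in_play) if rng.random() < true_h)
    return hits / balls_in_play


def simulate_career(seasons=15, true_h=0.271, balls_in_play=650, seed=None):
    """Return a list of simulated season $H values for one pitcher."""
    rng = random.Random(seed)
    return [simulate_season(true_h, balls_in_play, rng) for _ in range(seasons)]


def theoretical_sd(true_h=0.271, balls_in_play=650):
    """Binomial standard deviation of a single season's $H."""
    return (true_h * (1 - true_h) / balls_in_play) ** 0.5


if __name__ == "__main__":
    career = simulate_career(seed=1)
    print("seasons:", [round(h, 3) for h in career])
    print("observed sd:", round(statistics.pstdev(career), 4))
    print("expected sd, 650 BIP:", round(theoretical_sd(), 4))
    print("expected sd, 400 BIP:", round(theoretical_sd(balls_in_play=400), 4))
```

Changing the seed plays the role of hitting F9: the true ability never moves, only the luck does.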
He was "better" than CyberGreg in years 1, 2, 3, 5, and 10.
Any trained scientist would dismiss the notion that there was any difference between these two pitchers. Over 15 seasons, CyberGreg averaged .277 +/- .018, CyberTom .286 +/- .017. That has a 16% chance of happening at random. And yet we *know* the pitchers *are* different!
(OTOH, the scientist would also dismiss the *obvious* fact -- fact! -- look, it's obvious in the numbers! -- that CyberTom was the better pitcher for the first 5 years, until CyberGreg developed his mastery. And the scientist would be correct: CyberGreg was just *unlucky* for 5 years.)
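A rough check of that 16% figure can be sketched as follows. The post does not say which test was used, so this assumes the +/- values are season-to-season standard deviations over 15 seasons each and uses a plain normal approximation for the difference of two means; the function name is hypothetical.

```python
import math


def two_sided_p(mean_a, sd_a, mean_b, sd_b, n_a=15, n_b=15):
    """Two-sided p-value for a difference in means, normal approximation."""
    # Standard error of the difference between the two sample means.
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    z = abs(mean_a - mean_b) / se
    # erfc(z / sqrt(2)) is the two-tailed area of a standard normal beyond z.
    return math.erfc(z / math.sqrt(2))


if __name__ == "__main__":
    p = two_sided_p(0.277, 0.018, 0.286, 0.017)
    print(round(p, 3))
```

Under those assumptions the result lands close to the 16% quoted above, which is nowhere near the usual 5% threshold even though the two simulated pitchers really do differ.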
In reality, of course, the different levels of team defense behind the pitcher cause an even greater fluctuation from year-to-year. Two years ago the real Greg and Tom had their $H's go up hugely, to .331 and .318. If you include '99 and ask, what are the odds that this string of $H numbers could happen by chance, you basically get No for an answer (2 and 4% chance, respectively). Removing '99 returns the odds on Maddux to 72%, and Glavine to 20%. IOW, with the exception of one season, you don't have to invoke *anything* beyond random variation to explain the fluctuation in their $H numbers in their career together, and in Maddux's case the variation is particularly trivial.
So: because of small sample sizes, differences in pitcher $H are *very* hard to detect statistically. But that doesn't mean they're non-existent or even trivial in size. The apparent difference between Maddux and Glavine over their joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial. Give me that every day and I'll finish 3 games ahead of you in the standings. My rough guess / gut feeling is that there are pitchers who shave up to 0.25 off their ERA and others who lose about 0.25, because of their innate $H ability. Maybe even more.
2) I believe that my study of pitchers who changed teams may be the first to show conclusively that there *is* an effect. I originally compared a pitcher's new $H to his old $H, and to his old and new teams' $H with him included in the team totals. That was messy. Subtracting him out from the team totals (this spreadsheet is 68 columns wide and growing!) gives us this formula for a traded player's $H, once everything has been converted to z-scores (minimum 450 balls in play, which gives us 443 pairs of seasons in the study):
New $H = (.201 * old $H) + (.290 * new team's $H) - .076
The significance of the old $H factor is immense -- 1 chance in 684,050 of happening by chance!
Combined, the two factors account for just 22.4% of his new $H.
So, this says his new $H is based 13% on his new team $H, and 9% on his old $H.
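As a quick sketch, the formula above can be written as a small function. Everything is in z-scores, as in the study; the helper for converting a raw $H to a z-score takes a mean and standard deviation supplied by the caller, since the post does not list them. The function names are invented for this illustration.

```python
def to_z(value, mean, sd):
    """Convert a raw $H to a z-score, given a caller-supplied mean and sd."""
    return (value - mean) / sd


def predict_new_h_z(old_h_z, new_team_h_z):
    """Predicted new $H (z-score) for a pitcher who changed teams.

    Coefficients are the ones quoted in the post: .201 on the pitcher's
    old $H, .290 on the new team's $H (pitcher removed), and a -.076 offset.
    """
    return 0.201 * old_h_z + 0.290 * new_team_h_z - 0.076


if __name__ == "__main__":
    # A pitcher one sd above average, joining a team at the league average.
    print(round(predict_new_h_z(1.0, 0.0), 3))
    # An average pitcher joining an average team still gets the -.076 offset.
    print(round(predict_new_h_z(0.0, 0.0), 3))
```

Note how strongly the prediction is pulled back toward the middle: a full standard deviation in old $H moves the forecast by only about a fifth of one.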
What determines his new team's $H? It's the average of the innate $H of all the other pitchers -- which will *tend* to be average -- plus the effect of the defense per se, plus the manager, plus the park effect. Now, the quality of the other pitchers on the team doesn't actually affect him, while the other factors do. The fact that there must be good and bad staffs for $H will thus add noise to the relationship. This additional noise will thus weaken the observable relationship of new $H to new team $H. Thus, the true factor for team $H contribution is larger than 13% -- it's something that reduces to 13% when you add the noise effect. It may be possible later to estimate the size of this effect. I'm guessing 15 or 16%.
The last term above, the apparent -.076 fudge factor (that's .076 of a standard deviation, about .0017 in raw $H terms), is very interesting. Pitchers who change teams *unquestionably* have a reduction in $H that is independent of everything else, from an average of .280 to .276. It will be very interesting to see if this varies by era. I have only one weak idea why this happens: change-of-scene psychology? I'm not convinced.
3) So how should DIPS work? We shouldn't throw out $H entirely. We can do two things: we can compare $H to team $H without that pitcher, and we can look at lifetime $H. Combining both approaches, we can look at lifetime $H relative to team. This can then be modified by park and manager effects (when known) to get a true $H estimate. This might be very valuable for predicting future pitcher performance.
Until we do this, we won't really know how large the range of innate $H skill is, and we won't know how much trouble to go to in order to figure it out!
(And I can think of a heaping mess of trouble. Ask me about it if you dare.)
4) None of this abrogates the importance of Voros' work with DIPS. The realization that there are very large utterly random fluctuations in ERA, and the understanding of where exactly that comes from, so that we can say whether a change in ERA was luck or skill -- that remains revelatory.
5) Pedro Martinez went from a true $H of .325 in 1999 to one of .237 in 2000. His HR rate, however, nearly doubled. The odds of this all happening by chance are about 1 in 286. Conclusion: if anyone could refine their pitching approach in a way that reduced how often batters hit the ball hard, at the expense of some extra homers, it's Pedro. I think his true $H actually did go down last year.
--
Eric M. Van em...@post.harvard.edu
". . . from that day forward she lived happily ever after. Except for the dying at the end. And the heartbreak in between." - Lucius Shepard.
Eric M.Van <em...@post.harvard.edu> wrote: > 3) So how should DIPS work? We shouldn't throw out $H entirely.
Yes we should, actually. It stands for defense independent pitching stats. If we include hits in play, it's no longer defense independent, no matter how we choose to estimate the defense's contribution. It goes from DIPS to PS.
The point I'm making is that there's a price to be paid by multiplying things by a series of weights and balances. The numbers cease to mean anything other than a best guess as to how good the pitcher was.
IOW, even if pitchers had a meaningful ability to affect hits per balls in play (which they don't), DIPS would be a solid concept (in fact I started putting them together before I knew the first part), if only for use in evaluating teams.
Eric M.Van <em...@post.harvard.edu> wrote in article <3A524E87.E7267...@post.harvard.edu>...
> So: because of small sample sizes, differences in pitcher $H are *very* hard to > detect statistically. But that doesn't mean they're non-existent or even > trivial in size.
Do you concur, Voros? You said that any ability was below the noise floor, IIRC. He says this, then says the effect isn't trivial.
> The apparent difference between Maddux and Glavine over their > joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial.
So their BB, K, and HR rates are equivalent, and only the $H makes up that difference?
> Eric M.Van <em...@post.harvard.edu> wrote in article <3A524E87.E7267...@post.harvard.edu>...
> > So: because of small sample sizes, differences in pitcher $H are *very* hard to > > detect statistically. But that doesn't mean they're non-existent or even > > trivial in size.
> Do you concur, Voros? You said that any ability was below > the noise floor, IIRC. He says this, then says the effect > isn't trivial.
> > The apparent difference between Maddux and Glavine over their > > joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial.
> So their BB, K, and HR rates are equivalent, and only the $H > makes up that difference?
No, I'm estimating that, of the difference in ERA (2.43 vs. 3.25) since they've been teammates, about 0.24 (revised estimation) comes from $H. That's a good 30% of the difference.
That Voros claimed, in trying to explain the correlation, that lack of correlation implies small variance is the key to understanding the error of his claim. There is no signal, no matter how humongous, that can't be rendered difficult or impossible to detect by adding enough noise (often in the form of the noise inherent in small sample sizes). *That doesn't make the signal smaller.*
$H has so much noise added to it that it's very hard to nail down. That absolutely, positively does not mean that differences between $H talents must be tiny.
Here it is in a nutshell:
Maddux with Atlanta (excluding '99) is .271, Glavine is .282. Draw, for each one, a bell curve whose peak is centered at that number. The bell curve is a probability graph for their *true* number.
Stats like K% and BB% have very narrow bell curves. Thus, we can say for certain that Pedro's K ability is truly better than Tomo Ohka's. $H has a fairly broad bell curve, so broad that they overlap quite a bit for Maddux and Glavine, or for almost any two pitchers. That makes it impossible to say *for certain* that one is better than the other at $H.
HOWEVER, broadening the bell curves DOES NOT change the position of the peaks, the *most likely* number. For all our uncertainty, the *most likely* case for Maddux remains .271, and for Glavine, .282.
It's just a fact of the universe that there can be a PROFOUND difference between A and B, and yet we can be unsure of that difference because of insufficient sample size. Our being unsure does not mean we should not try to make a best estimate of the difference, however.
> Eric M.Van <em...@post.harvard.edu> wrote: > > 3) So how should DIPS work? We shouldn't throw out $H entirely.
> Yes we should, actually. It stands for defense independent pitching > stats. If we include hits in play, it's no longer defense independent, no > matter how we choose to estimate the defenses contribution. It goes from > DIPS to PS.
Of course you're right. What I meant to say, was, what's the role of DIPS? It's a starting place. But you don't get the complete picture until you look at $H, and look at it as meaningful.
> The point I'm making is that there's a price to be paid by multiplying > things by a series of weights and balances. The numbers cease to mean > anything other than a best guess as to how good the pitcher was.
Right. DIPS separates the part we're fairly certain about from a part that we have difficulty measuring. But what the hell is wrong about a best guess of something we can't know for certain? Especially if it can add or subtract 0.25 or 0.40 or whatever to an ERA?
> IOW, even if pitchers had a meaningful ability to affect hits per balls > in play (which they don't),
Ahh, Voros . . I just proved that they do, to a rigor 34,202 times as great as I would need to get it published in a scientific journal. When a pitcher changes teams, his new $H depends on his old $H. Period. End of story. No possibility of other explanation. It's as cut and dried as possible.
It has *never* followed that because $H correlates poorly, differences between it must be small.
Think about this: looking at 443 pitchers who changed teams, I was able to determine, with 95% certainty, that their old $H accounts for between 5.7 and 12.4 % of their new $H (with 9.2% likeliest). Could I have found such a small effect if the differences between pitchers were very small? Unlikely.
> DIPS would be a solid concept (in fact I > started putting them together before I knew the first part), if only for > use in evaluating teams.
DIPS is a great concept. The notion of making a separation between that which we're sure about and that which is vague is wonderful. But vagueness does not and never has and never will equal smallness. We've only got a rough idea of the size of the Andromeda Galaxy.
> HOWEVER, broadening the bell curves DOES NOT change the position of the peaks, > the *most likely* number. For all our uncertainty, the *most likely* case for > Maddux remains .271, and for Glavine, .282.
Ugh... Actually the probability of Maddux achieving a $H of .271 is 0, or very close to 0 if you take into account your discretization of the statistic to 3 decimal places. I suggest you find a good statistics book and read carefully the sections on hypothesis testing and confidence intervals. Your observed samples come from a distribution that is probably peaked, and MAY have come from somewhere near the peak, but just as likely came from somewhere outside of the peak. IOW, there's a 50% chance that your sample mean came from outside the middle 50% of the distribution of that mean.
> It's just a fact of the universe that there can be a PROFOUND difference between > A and B, and yet we can be unsure of that difference because of insufficient > sample size. Our being unsure does not mean we should not try to make a best > estimate of the difference, however.
Very true, but one must be honest with oneself concerning the accuracy of your estimate.
Dale Hicks <dgh1...@bellspamlesssouth.net> wrote: > Eric M.Van <em...@post.harvard.edu> wrote in article <3A524E87.E7267...@post.harvard.edu>...
>> So: because of small sample sizes, differences in pitcher $H are *very* hard to >> detect statistically. But that doesn't mean they're non-existent or even >> trivial in size. > Do you concur, Voros? You said that any ability was below > the noise floor, IIRC.
Basically. Essentially I said that any observed differences in the stat between two pitchers have thus far been difficult to ascribe to ability.
> He says this, then says the effect isn't trivial.
I suppose it depends on your definition of "trivial". It's kind of like building a compass from cork, needle, water and magnet and then once built arguing whether it's dead on north or 0.4 degrees off.
>> The apparent difference between Maddux and Glavine over their >> joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial. > So their BB, K, and HR rates are equivalent, and only the $H > makes up that difference?
No. I think he was talking about $H's effect on it, which is a tough argument. Since joining the Braves Maddux's rate is .275 and Glavine's is .282. If you count that whole difference as completely attributable to each's ability (tough since over the whole career of each the numbers are .278 for Maddux and .279 for Glavine), you get a little over 5 hits a year. To make a .20 difference, you'd need a little over 5 runs.
If the difference between the two in $H giving full credit for $H doesn't account for 0.20 in ERA, I don't see where you can make that statement.
> > HOWEVER, broadening the bell curves DOES NOT change the position of the peaks, > > the *most likely* number. For all our uncertainty, the *most likely* case for > > Maddux remains .271, and for Glavine, .282.
> Ugh... Actually the probability of Maddux achieving a $H of .271 is 0, or > very close to 0 if you take into account your discretization of the > statistic to 3 decimal places.
Well, yes, from a philosophical point of view you could put a zero on it. But (assuming my estimation of .271 is correct) the odds of it being .271 are slightly higher than it being .270 or .272. And so on.
> I suggest you find a good statistics book > and read carefully the sections on hypothesis testing and confidence > intervals. Your observed samples come from a distribution that is probably > peaked, and MAY have come from somewhere near the peak, but just as likely > came from somewhere outside of the peak. IOW, there's a 50% chance that > your sample mean came from outside the middle 50% of the distribution of > that mean.
That does not change the fact that the sample mean is the *likeliest* mean for the population. Which is all I asserted.
> > It's just a fact of the universe that there can be a PROFOUND difference between > > A and B, and yet we can be unsure of that difference because of insufficient > > sample size. Our being unsure does not mean we should not try to make a best > > estimate of the difference, however.
> Very true, but one must be honest with oneself concerning the accuracy of > your estimate.
I've said all along that it's quite prone to error.
Let's say you're shopping for a car, and you've narrowed it down to two cars which are *exactly* as desirable except for breakdown & repair rates. That's gonna break the tie. The Saab has a breakdown rate of 5% +/- 4% and the Volvo is 10% +/- 6%. We're not sure *at all* that the Saab is more reliable than the Volvo. Not at all. But which car do you buy? All things being equal?
(Actually, the answer is, if you had a friend who owned a Saab and it was a lemon, you buy the Volvo even if you read that the rates are 5% +/- 1% and 10 % +/- 1%. But that's another issue entirely.)
Eric M.Van <em...@post.harvard.edu> wrote: > Voros wrote:
>> Eric M.Van <em...@post.harvard.edu> wrote: >> > 3) So how should DIPS work? We shouldn't throw out $H entirely.
>> Yes we should, actually. It stands for defense independent pitching >> stats. If we include hits in play, it's no longer defense independent, no >> matter how we choose to estimate the defenses contribution. It goes from >> DIPS to PS. > Of course you're right. What I meant to say, was, what's the role of DIPS? > It's a starting place. But you don't get the complete picture until you look at > $H, and look at it as meaningful.
>> The point I'm making is that there's a price to be paid by multiplying >> things by a series of weights and balances. The numbers cease to mean >> anything other than a best guess as to how good the pitcher was. > Right. DIPS separates the part we're fairly certain about from a part that we > have difficulty measuring. But what the hell is wrong about a best guess of > something we can't know for certain? Especially if it can add or subtract 0.25 > or 0.40 or whatever to an ERA?
For starters, they don't mean anything. What would any ERA derived from this mean?
Also I'm very much disputing that the difference for ERA is that high, and if it is, it's in extremely rare cases.
Finally I think the error of such estimates is much larger than the predicted differences.
>> IOW, even if pitchers had a meaningful ability to affect hits per balls >> in play (which they don't), > Ahh, Voros . . I just proved that they do, to a rigor 34,202 times as great as > I would need to get it published in a scientific journal.
You really haven't. You've shown only and exactly that there's a small correlation that is less likely to be explained by park and defensive biases than one from a study including all pitchers. You haven't ruled out other explanations, and when the correlation in seasons is around .10 I think maybe you should. Hell, as Russell pointed out, I get a correlation of .07 with 1000 samples between the ASCII code of the first letter of the city the pitcher pitched in and his $H rate that year. That's certainly not likely to happen from chance, but it sure as hell is likely to have happened from something that is effectively chance.
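To put numbers on the ASCII-code example, here is a small sketch of how a correlation's significance depends on sample size. It uses the standard t statistic for a correlation coefficient with a normal approximation to the t distribution (reasonable at n = 1000, and an assumption of this sketch rather than anything stated in the thread); the function name is hypothetical.

```python
import math


def correlation_p_value(r, n):
    """Approximate two-sided p-value for a sample correlation r over n pairs."""
    # t statistic for testing whether the true correlation is zero.
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
    # Normal approximation to the t distribution (fine for large n).
    return math.erfc(abs(t) / math.sqrt(2))


if __name__ == "__main__":
    r, n = 0.07, 1000
    print("p-value:", round(correlation_p_value(r, n), 3))
    print("share of variance explained:", round(r * r, 4))
```

With 1000 samples a correlation of .07 clears the usual 5% bar while explaining less than 1% of the variance, which is the gap between "statistically significant" and "large" that both sides of this argument keep circling.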
The fact is you have a correlation where the difference between the highest predicted value and the lowest one is a good bit smaller than the standard error of all of those predictions.
> When a pitcher changes > teams, his new $H depends on his old $H. Period. End of story. No possibility > of other explanation.
For starters, this is _never_ true. There's always a possibility of other explanations.
> It's as cut and dried as possible.
Cut and dried as possible would be a correlation of say .98, not one of .10.
In my $H figures for pitchers who changed teams, the correlation between the $H rate the second year and the calendar year of the first season was .142.
So for the 1000+ pitchers who changed teams, the year they pitched in was a better indicator of $H the next year than $H was.
> It has *never* followed that because $H correlates poorly, differences between > it must be small.
It's a data point. The fact that the difference in active career $H rates among active pitchers resembles the same range as you'd expect from chance is another. The fact that a study of similar pitchers (done three separate times with different groups) for whom all the stats are similar except hits produced a result in which the high $H group gave up the same number of hits the next year as the low $H group is another. The fact that forecasts from $H totals yield higher errors than the predicted difference between the worst and best prediction is another.
Eric M.Van <em...@post.harvard.edu> wrote: > Dale Hicks wrote:
>> Eric M.Van <em...@post.harvard.edu> wrote in article <3A524E87.E7267...@post.harvard.edu>...
>> > So: because of small sample sizes, differences in pitcher $H are *very* hard to >> > detect statistically. But that doesn't mean they're non-existent or even >> > trivial in size.
>> Do you concur, Voros? You said that any ability was below >> the noise floor, IIRC. He says this, then says the effect >> isn't trivial.
>> > The apparent difference between Maddux and Glavine over their >> > joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial.
>> So their BB, K, and HR rates are equivalent, and only the $H >> makes up that difference?
>> - > No, I'm estimating that, of the difference in ERA (2.43 vs. 3.25) since they've > been teammates, about 0.24 (revised estimation) comes from $H. That's a good > 30% of the difference. > That Voros claimed, in trying to explain correlation, that lack of correlation > implies small variance, is the key to misunderstanding the error of his claim. > There is no signal, no matter how humongous, that can't be rendered difficult or > impossible to detect by adding enough noise (often in the form of the noise > inherent in small sample sizes). *That doesn't make the signal smaller.* > $H has so much noise added to it that it's very hard to nail down. That > absolutely, positively does not mean that differences between $H talents must be > tiny. > Here it is in a nutshell: > Maddux with Atlanta (excluding '79) is .271, Glavine is .282.
Eric M.Van <em...@post.harvard.edu> wrote: > That does not change the fact that the sample mean is the *likeliest* mean for > the population. Which is all I asserted.
Well then I think you're begging the question. If researchers over the years have concluded that everyone in the world is equally talented at tossing widgets, and they peg that talent at an average score of 65, and one guy tosses five widgets for an average of 75, the most likely level of his talents is _still_ 65, since his results could easily be explained by chance and there is as of yet no information indicating that anyone has an ability of 75.
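The widget example can be sketched as a simple shrinkage estimate: the best guess sits between the population average and the observed average, weighted by how much real talent spread exists versus how noisy the sample is. This is a textbook normal-prior calculation, not anything either poster wrote down; the talent and noise standard deviations below are hypothetical inputs chosen only to show the two extremes.

```python
def shrunk_estimate(observed_mean, n_trials, population_mean, talent_sd, noise_sd):
    """Best estimate of true talent under a normal prior and normal noise.

    talent_sd: assumed spread of true talent across the population.
    noise_sd:  assumed per-trial random variation (must be > 0).
    """
    prior_var = talent_sd ** 2
    sample_var = noise_sd ** 2 / n_trials
    # Weight on the observation: 0 when nobody differs, near 1 when noise is tiny.
    weight = prior_var / (prior_var + sample_var)
    return population_mean + weight * (observed_mean - population_mean)


if __name__ == "__main__":
    # No talent spread at all: the estimate stays at the population mean of 65.
    print(shrunk_estimate(75, 5, 65, talent_sd=0.0, noise_sd=10.0))
    # Wide talent spread: the estimate moves most of the way toward 75.
    print(round(shrunk_estimate(75, 5, 65, talent_sd=10.0, noise_sd=10.0), 2))
```

Seen this way, the disagreement in the thread is about the size of `talent_sd` for $H: if it is effectively zero the sample mean tells you nothing, and if it is not, the sample mean still has to be pulled toward the league average before it is used.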
Eric M.Van <em...@post.harvard.edu> wrote: >1) Much of the reason why $H varies so much from year to year, where K% and BB% >don't, is *because the sample size is smaller*. Too small. Note that HR%, >which has a very similar sample size, also has a large variance from year to >year. The additional lack of correlation in $H can be attributed to larger >team defense and park effects.
The statement should probably be qualified. The sample size is not really small in absolute terms, but small because the weak effect we're trying to detect is intermingled with other more significant effects.
>It's trivial to simulate multiple seasons of $H in Microsoft Excel for a pitcher >with any theorized innate level of $H. The results are eye-opening.
>Put this formula in cell A1:
>=IF(RAND()>0.271,0,1)
>where the 0.271 is whatever $H rate you want to simulate (that happens to be an >estimate for Maddux).
>Now copy the formula down to the next 650 cells (650 being a typical number of >balls in play for a pitcher with 150+ IP).
>In cell B1, put:
>=SUM(A1:A650)/650
>Now hit F9 repeatedly to regenerate the random numbers. Cell B1 will report >each season's $H number. You will see quite a bit of variation. I'm doing >this in real time:
>Obviously this CyberGreg learned to pitch after those first three mediocre >seasons! And he must have come up with some adjustment after the .314 to go >back to .257, which started another string of vintage years.
>(In case you miss the point, the previous paragraph is total nonsense, although >total nonsense of the sort our brain is hard-wired to come up with.)
>(Lowering the sample size just a bit will increase the variance noticeably.)
>He was "better" than CyberGreg in years 1, 2, 3, 5, and 10.
>Any trained scientist would dismiss the notion that there was any difference >between these two pitchers. Over 15 seasons, CyberGreg averaged .277 +/- .018, >CyberTom .286 +/- .017. That has a 16% chance of happening at random. And yet >we *know* the pitchers *are* different!
>(OTOH, the scientist would also dismiss the *obvious* fact -- fact! -- look, >it's obvious in the numbers! -- that CyberTom was the better pitcher for the >first 5 years, until CyberGreg developed his mastery. And the scientist would >be correct: CyberGreg was just *unlucky* for 5 years.)
Right. You can go from a theorized notion of their actual ability to actual numbers, but you need a lot more data to go the other way.
>In reality, of course, the different levels of team defense behind the pitcher >cause an even greater fluctuation from year-to-year. Two years ago the real >Greg and Tom had their $H's go up hugely, to .331 and .318. If you include '99 >and ask, what are the odds that this string of $H numbers could happen by >chance, you basically get No for an answer (2 and 4% chance, respectively). >Removing '99 returns the odds on Maddux to 72%, and Glavine to 20%. IOW, with >the exception of one season, you don't have to invoke *anything* beyond random >variation to explain the fluctuation in their $H numbers in their career >together, and in Maddux's case the variation is particularly trivial.
On top of that, one's true ability probably doesn't stay constant throughout one's career.
>So: because of small sample sizes, differences in pitcher $H are *very* hard to >detect statistically. But that doesn't mean they're non-existent or even >trivial in size. The apparent difference between Maddux and Glavine over their >joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial. >Give me that every day and I'll finish 3 games ahead of you in the standings. >My rough guess / gut feeling is that there are pitchers who shave up to 0.25 off >their ERA and others who lose about 0.25, because of their innate $H ability. >Maybe even more.
I'm pretty sure that there are a lot of pitchers in the NL whose hitting ability makes a far greater difference (which is also hard to quantify because of the small sample size). Not many seem to pay much attention though. And the kind of difference that isn't significant over a 10-year span is virtually undetectable. A very smart projection system can probably incorporate this to some level though.
>The last term above, the apparent -.076 fudge factor (that's .076 of a standard >deviation, about .0017 in raw $H terms) is very interesting. Pitchers who >change teams *unquestionably* have a reduction in $H, that is independent of >everything else, from an average of .280 to .276. It will be very interesting >to see if this varies by era. I have only one weak idea why this happens: >change-of-scene psychology? I'm not convinced.
I'm guessing that those who underperform in $H *tend* to be traded.
>3) So how should DIPS work? We shouldn't throw out $H entirely. We can do two >things: we can compare $H to team $H without that pitcher, and we can look at >lifetime $H. Combining both approaches, we can look at lifetime $H relative to >team. This can then be modified by park and manager effects (when known) to get >a true $H estimate. This might be very valuable for predicting future pitcher >performance.
For a projection system, how much difference from the mean can be *expected* for any individual pitcher? It's one thing to say that some pitchers are much better at this than others. It's another to find such pitchers with reasonable confidence. I'm sure it's possible to incorporate this, but I doubt this will change anyone's projected ERA by more than .05.
>5) Pedro Martinez went from a true $H of .325 in 1999 to one of .237 in 2000. >His HR rate, however, nearly doubled. The odds of this all happening by chance >are about 1 in 286. Conclusion: if anyone could refine their pitching approach >in a way that reduced how often batters hit the ball hard, at the expense of >some extra homers, it's Pedro. I think his true $H actually did go down last >year.
Well, it's more of a conjecture than a conclusion.
I said this before in another NG but note that this ability to hit balls hard differs significantly from one batter to another. That means if two pitchers allow balls in play at different rates to different hitters (or different types of hitters), then you will see a difference in their $H ability level.
-- Capitalism is the extraordinary belief that the nastiest of men, for the nastiest of reasons, will somehow work for the benefit of us all. -- John Maynard Keynes
> Dale Hicks <dgh1...@bellspamlesssouth.net> wrote: > > Eric M.Van <em...@post.harvard.edu> wrote in article <3A524E87.E7267...@post.harvard.edu>...
> >> So: because of small sample sizes, differences in pitcher $H are *very* hard to > >> detect statistically. But that doesn't mean they're non-existent or even > >> trivial in size.
> > Do you concur, Voros? You said that any ability was below > > the noise floor, IIRC.
> Basically. Essentially I said that any observed differences in the stat > between two pitchers have thus far been difficult to ascribe to ability.
> > He says this, then says the effect isn't trivial.
> I suppose it depends on your definition of "trivial". It's kind of like > building a compass from cork, needle, water and magnet and then once built > arguing whether it's dead on north or 0.4 degrees off.
> >> The apparent difference between Maddux and Glavine over their > >> joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial.
> > So their BB, K, and HR rates are equivalent, and only the $H > > makes up that difference?
> No. I think he was talking about $H affect on it, which is a tough > argument. Since joining the Braves Maddux' rate is .275 and Glavine's is > .282. If you count that whole difference as completely attributable to > each's ability (tough since over the whole career of each the numbers are > .278 for Maddux and .279 for Glavine), you get a little over 5 hits a > year. To make a .20 difference, you'd need a little over 5 runs.
It's about 7.4 hits a year, and 7.4 hits turned into outs will turn into 6.17 runs.
> If the difference between the two in $H giving full credit for $H doesn't > account for 0.20 in ERA, I don't see where you can make that statement.
> Eric M.Van <em...@post.harvard.edu> wrote: > > Voros wrote:
> >> Eric M.Van <em...@post.harvard.edu> wrote: > >> > 3) So how should DIPS work? We shouldn't throw out $H entirely.
> >> Yes we should, actually. It stands for defense independent pitching > >> stats. If we include hits in play, it's no longer defense independent, no > >> matter how we choose to estimate the defense's contribution. It goes from > >> DIPS to PS.
> > Of course you're right. What I meant to say, was, what's the role of DIPS? > > It's a starting place. But you don't get the complete picture until you look at > > $H, and look at it as meaningful.
> >> The point I'm making is that there's a price to be paid by multiplying > >> things by a series of weights and balances. The numbers cease to mean > >> anything other than a best guess as to how good the pitcher was.
> > Right. DIPS separates the part we're fairly certain about from a part that we > > have difficulty measuring. But what the hell is wrong about a best guess of > > something we can't know for certain? Especially if it can add or subtract 0.25 > > or 0.40 or whatever to an ERA?
> For starters, they don't mean anything. What would any ERA derived from > this mean?
> Also I'm very much disputing that the difference for ERA is that high, and > if it is, it's in extremely rare cases.
> Finally I think the error of such estimates is much larger than the > predicted differences.
> >> IOW, even if pitchers had a meaningful ability to affect hits per balls > >> in play (which they don't),
> > Ahh, Voros . . I just proved that they do, to a rigor 34,202 times as great as > > I would need to get it published in a scientific journal.
> You really haven't. You've shown only and exactly that there's a small > correlation that is less likely to be explained by park and defensive > biases than a study including all pitchers.
The likelihood of a correlation being real is *not the same* as its size. Yes, they tend to track together. But when you look at a correlation to judge its validity, you ignore the strength. You look at the odds of it happening by chance. If they are very large, it's absurd to start looking for alternate explanations.
To your confusion between the size of a correlation and the variance in the measure being correlated, you are now adding an equally deadly confusion between a correlation's size and its significance.
> you haven't ruled out other > explanations, and when the correlation in seasons is around .10 I think > maybe you should.
The size of the correlation is meaningless here.
> Hell as Russell pointed out, I get a correlation of .07 > with 1000 samples between the ASCII code of the first letter of the city > the pitcher pitched in and his $H rate that year. That's certainly not > likely to happen from chance, but it sure as hell is likely to have > happened from something that is effectively chance.
I found a relationship among new $H, new teammates' $H, and old $H that has 1 chance in 1,549,029,624,821,830,000,000,000 of happening by chance. That's 1.55 septillion, for you folks scoring at home.
The correlation with teammates $H has 1 chance in 739,236,817,630,109,000 of happening at random. 739 Quadrillion.
The correlation with old $H has one chance in 684,050 of happening by chance.
I'm not going to lose any sleep wondering what else might have caused that to happen in *this* universe but not in the 684,049 alternate ones (in which not one, mind you, did the ball go through Buckner's legs.)
> The fact is you have a correlation where the difference between the > highest predicted value and the lowest one is a good bit smaller than the > standard error of all of those predictions.
> > When a pitcher changes teams, his new $H depends on his old $H. Period. End of story. No possibility > > of other explanation.
> For starters, this is _never_ true. There's always a possibility of other > explanations.
> > It's as cut and dried as possible.
> Cut and dried as possible would be a correlation of say .98, not one of > .10.
Wrong, wrong, wrong, wrong, wrong. A correlation of .98 can be totally meaningless if the sample size is small enough (say, 2). A correlation of .02 can be the word of God if the sample size is large enough. And for some things (contribution of a gene to a complex disease, for instance), it could be crucial information.
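To make the size-versus-significance distinction concrete, here is a minimal sketch, not the test actually used for the figures in this thread. It assumes a plain Pearson correlation and uses the Fisher z-transform as a normal approximation; the function name `correlation_p_value` is made up for illustration, and the approximation is rough for very small samples (it needs more than 3 pairs).

```python
import math

def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a Pearson correlation r from n pairs.

    Uses the Fisher z-transform: under the null hypothesis of no correlation,
    atanh(r) * sqrt(n - 3) is treated as roughly standard normal.
    """
    if n <= 3:
        raise ValueError("need more than 3 pairs for this approximation")
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) is the two-sided tail area of a standard normal.
    return math.erfc(abs(z) / math.sqrt(2))

# The same small correlation at different sample sizes, plus a tiny
# correlation measured on a very large sample.
for r, n in [(0.10, 20), (0.10, 1000), (0.02, 1_000_000)]:
    print(f"r={r:.2f}  n={n:>9,}  p ~ {correlation_p_value(r, n):.3g}")
```

The point of the sketch is only that the same r can be unremarkable with a handful of pairs and very hard to attribute to chance with a thousand. It says nothing about whether the thing producing the correlation is ability, park, defense, or something else.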
> In my $H figures for pitchers who changed teams, the correlation between > the $H rate the second year and the year of the first year was .142.
And did you test that correlation for significance?
> So for the 1000+ pitchers who changed teams, the year they pitched in was > a better indicator of $H the next year than $H was.
Who said that determining $H was a winner-take-all proposition?
> > It has *never* followed that because $H correlates poorly, differences between > > it must be small.
> It's a data point. The fact that the difference in active career $H rates > among active pitchers resembles the same range as you'd expect from chance > is another. The fact that a study of similar pitchers (done three separate > times with different groups) for whom all the stats are similar except > hits produced a result in which the high $H group gave up the same number > of hits the next year as the low $H group is another. The fact that > forecasts from $H totals yield higher errors than the predicted difference > between the worst and best prediction is another.
Well, those are all thought-provoking (although I'm not sure I can parse the last one). I'll be working on trying to put a number on the inter-pitcher variation later in the month.
> >The last term above, the apparent -.076 fudge factor (that's .076 of a standard > >deviation, about .0017 in raw $H terms) is very interesting. Pitchers who > >change teams *unquestionably* have a reduction in $H, that is independent of > >everything else, from an average of .280 to .276. It will be very interesting > >to see if this varies by era. I have only one weak idea why this happens: > >change-of-scene psychology? I'm not convinced.
> I'm guessing that those who underperform in $H *tend* > to be traded.
Of course. Duh. Have an unlucky year, you're slightly more likely to change teams.
> >3) So how should DIPS work? We shouldn't throw out $H entirely. We can do two > >things: we can compare $H to team $H without that pitcher, and we can look at > >lifetime $H. Combining both approaches, we can look at lifetime $H relative to > >team. This can then be modified by park and manager effects (when known) to get > >a true $H estimate. This might be very valuable for predicting future pitcher > >performance.
> For a projection system, how much difference from the > mean can be *expected* for any individual pitcher? It's > one thing to say that some pitchers are much better at > this than others. It's another to find such pitchers > with reasonable confidence. I'm sure it's possible to > incorporate this, but I doubt this will change anyone's > projected ERA by more than .05.
The impact on projections will come from understanding park, manager, and teammate effects. Right now I'm agnostic about the maximum size of the effect. But my simulation showed that we *couldn't tell the difference*, even after 15 years, with a true effect that would be worth 0.25.
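For anyone without Excel handy, the spreadsheet experiment can be sketched in Python. This is a rough equivalent under stated assumptions, not the original spreadsheet: 650 balls in play per season and the .271 rate come from the original post, while the .282 rate for the second pitcher is borrowed from the Glavine figure quoted elsewhere in the thread and is only a stand-in. The function name `simulate_seasons` and the seed are arbitrary.

```python
import random
import statistics

def simulate_seasons(true_rate, seasons=15, balls_in_play=650, rng=None):
    """Return one simulated hit rate on balls in play per season."""
    rng = rng or random.Random()
    rates = []
    for _ in range(seasons):
        # Each ball in play falls for a hit with probability true_rate.
        hits = sum(rng.random() < true_rate for _ in range(balls_in_play))
        rates.append(hits / balls_in_play)
    return rates

rng = random.Random(2001)  # fixed seed so the run is repeatable
greg = simulate_seasons(0.271, rng=rng)  # hypothetical "CyberGreg"
tom = simulate_seasons(0.282, rng=rng)   # hypothetical "CyberTom"

for name, rates in (("CyberGreg", greg), ("CyberTom", tom)):
    print(f"{name}: mean {statistics.mean(rates):.3f} "
          f"+/- {statistics.stdev(rates):.3f}")

# Seasons in which the pitcher with the worse true rate looked better.
upsets = sum(t < g for g, t in zip(greg, tom))
print(f"CyberTom had the lower rate in {upsets} of {len(greg)} seasons")
```

Changing the seed or the two rates and rerunning gives a feel for how often two pitchers with genuinely different rates are indistinguishable over 15 seasons.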
> >5) Pedro Martinez went from a true $H of .325 in 1999 to one of .237 in 2000. > >His HR rate, however, nearly doubled. The odds of this all happening by chance > >are about 1 in 286. Conclusion: if anyone could refine their pitching approach > >in a way that reduced how often batters hit the ball hard, at the expense of > >some extra homers, it's Pedro. I think his true $H actually did go down last > >year.
> Well, it's more of a conjecture than a conclusion
I *always* present my conjectures as conclusions <g>. Otherwise they think you're a wuss.
> I said this before in another NG but note that this > ability to hit balls hard differs significantly from > one batter to another. That means if two pitchers > allow balls in play at different rates to different > hitters (or different types of hitters), then you > will see a difference in their $H ability level.
Eric M.Van <em...@post.harvard.edu> wrote: >Voros wrote: >> It's a data point. The fact that the difference in active career $H rates >> among active pitchers resembles the same range as you'd expect from chance >> is another. The fact that a study of similar pitchers (done three separate >> times with different groups) for whom all the stats are similar except >> hits produced a result in which the high $H group gave up the same number >> of hits the next year as the low $H group is another. The fact that >> forecasts from $H totals yield higher errors than the predicted difference >> between the worst and best prediction is another. >Well, those are all thought-provoking (although I'm not sure I can parse the >last one). I'll be working on trying to put a number on the inter-pitcher >variation later in the month.
I think we are reaching two conclusions here:
#1. It appears that there's *some* variance in pitchers' ability to prevent hits on balls in play.
#2. It also appears that it's practically impossible to tell a pitcher's $H ability-level from his $H alone.
The next step, then, is to look for other factors that could tell us the pitcher's true $H better than using his actual $H alone. Batters faced, Batters to whom the pitcher allowed contact but not a homerun, GB/FB, HR/FB, Linedrives/GB+FB, Rate of hits on LD, on GB, on FB, etc.
-- VMS, n.: The world's foremost multi-user adventure game.
Eric M.Van <em...@post.harvard.edu> wrote: > Voros wrote:
>> Dale Hicks <dgh1...@bellspamlesssouth.net> wrote: >> > Eric M.Van <em...@post.harvard.edu> wrote in article <3A524E87.E7267...@post.harvard.edu>...
>> >> So: because of small sample sizes, differences in pitcher $H are *very* hard to >> >> detect statistically. But that doesn't mean they're non-existent or even >> >> trivial in size.
>> > Do you concur, Voros? You said that any ability was below >> > the noise floor, IIRC.
>> Basically. Essentially I said that any observed differences in the stat >> between two pitchers have thus far been difficult to ascribe to ability.
>> > He says this, then says the effect isn't trivial.
>> I suppose it depends on your definition of "trivial". It's kind of like >> building a compass from cork, needle, water and magnet and then once built >> arguing whether it's dead on north or 0.4 degrees off.
>> >> The apparent difference between Maddux and Glavine over their >> >> joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial.
>> > So their BB, K, and HR rates are equivalent, and only the $H >> > makes up that difference?
>> No. I think he was talking about $H's effect on it, which is a tough >> argument. Since joining the Braves Maddux' rate is .275 and Glavine's is >> .282. If you count that whole difference as completely attributable to >> each's ability (tough since over the whole career of each the numbers are >> .278 for Maddux and .279 for Glavine), you get a little over 5 hits a >> year. To make a .20 difference, you'd need a little over 5 runs. > It's about 7.4 hits a year, and 7.4 hits turned into outs will turn into 6.17 > runs.
I get about 5 hits. Maddux gets about 650 balls in play in an average season. 650*.007 = 4.55. The difference due to rounding was probably a little higher than .007 so it was around five hits.
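The arithmetic behind the two hit estimates is easy to check. The sketch below assumes 650 balls in play per season, as above, and compares the rounded .275/.282 rates with the .271/.282 pair quoted elsewhere in the thread for the seasons excluding 1999. The helper name `extra_hits_per_season` is just for illustration.

```python
def extra_hits_per_season(rate_a, rate_b, balls_in_play=650):
    """Extra hits pitcher B allows per season relative to pitcher A."""
    return (rate_b - rate_a) * balls_in_play

# Rounded rates for all Atlanta seasons.
print(round(extra_hits_per_season(0.275, 0.282), 2))
# Rates with 1999 excluded, as quoted elsewhere in the thread.
print(round(extra_hits_per_season(0.271, 0.282), 2))
```

Most of the gap between "about 5" and "about 7.4" seems to come down to whether 1999 is included; the remainder presumably reflects unrounded rates or a different balls-in-play count, which is a guess rather than something stated in the thread.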
>> I'm guessing that those who underperform in $H *tend* >> to be traded.
>Of course. Duh. Have an unlucky year, you're slightly more likely to change >teams.
Is it obvious? It's not obvious to me. What analytics *are* available for propensity-to-be-traded? Quick speculations: players for whom there is a divergence of beliefs about their abilities; players with specialized abilities; mediocre players ("deal-fillers"); ...
>I *always* present my conjectures as conclusions <g>. Otherwise they think >you're a wuss.
Yeah, I know about them. . . . Enormous thanks for launching this discussion. This is fascinating stuff.
Incidentally, I believe you should be more severe in trimming follow-ups. --
> Eric M.Van <em...@post.harvard.edu> wrote: > > Voros wrote:
> >> > Here it is in a nutshell:
> >> > Maddux with Atlanta (excluding '79) is .271, Glavine is .282.
> >> Excluding '79?
> > Oops, '99. I guess I wish I were 20 years younger!
> What's your justification for excluding the highest $H level Maddux has > had in 10 years?
I explained that already. First, the odds of it happening by random chance, along with all of his other seasons for the Braves, were 2%. Take it out, the odds of his remaining seasons happening by chance rise to 77%. Second, the exact same thing happens to Glavine, though less dramatically -- 4% chance with 1999, 22% without it. I may do a closer look at the whole Braves' staff in these years, but at the guesstimate level (which is all I was doing with the computer simulations) I thought it made sense to toss out the year as an egregious team defense effect.
> In article <3A52BDDE.218AE...@post.harvard.edu>, > Eric M.Van <em...@post.harvard.edu> wrote: > . > . > . > >> I'm guessing that those who underperform in $H *tend* > >> to be traded.
> >Of course. Duh. Have an unlucky year, you're slightly more likely to change > >teams. > Is it obvious? It's not obvious to me.
Only obvious as the explanation for this effect, as one of many reasons players are dealt.
> What analytics *are* available for > propensity-to-be-traded? Quick speculations: players for > whom there is a divergence of beliefs about > their abilities; players with specialized > abilities; mediocre players ("deal-fillers"); > ... > >I *always* present my conjectures as conclusions <g>. Otherwise they think > >you're a wuss. > Yeah, I know about them. > . > . > . > Enormous thanks for launching this discussion. > This is fascinating stuff.
> Incidentally, I believe you should be more > severe in trimming follow-ups. > --
This is pseudo stat-headedness gone wild. Get a clue! Give up the calculator and get some real life on-the-field baseball experience. It will get you much closer to the baseball knowledge you seek than this garbage ever will.
>In article <3A524E87.E7267...@post.harvard.edu>, > "Eric M.Van" <em...@post.harvard.edu> wrote: > (Original title: Some Thoughts . . . > Later title: Some Bold, Original Thoughts. . . > But heck, if you want attention, you've got to ask for it!)
> (For newbies -- $H is the percentage of balls in play that are turned into > outs. This varies surprisingly widely from year to year for any given > pitcher, leading some to question whether the ability to get guys to hit easy > grounders and fly balls is a real ability or not. Voros McCracken has explored > this notion in detail as Defense Independent Pitching Statistics (at least I > think that's what the acronym stands for <g>)).
> 1) Much of the reason why $H varies so much from year to year, where K% and BB% > don't, is *because the sample size is smaller*. Too small. Note that HR%, > which has a very similar sample size, also has a large variance from year to > year. The additional lack of correlation in $H can be attributed to larger > team defense and park effects.
> It's trivial to simulate multiple seasons of $H in Microsoft Excel for a pitcher > with any theorized innate level of $H. The results are eye-opening.
> Put this formula in cell A1:
> =IF(RAND()>0.271,0,1)
> where the 0.271 is whatever $H rate you want to simulate (that happens to be an > estimate for Maddux).
> Now copy the formula down to the next 650 cells (650 being a typical number of > balls in play for a pitcher with 150+ IP).
> In cell B1, put:
> =SUM(A1:A650)/650
> Now hit F9 repeatedly to regenerate the random numbers. Cell B1 will report > each season's $H number. You will see quite a bit of variation. I'm doing > this in real time:
> Obviously this CyberGreg learned to pitch after those first three mediocre > seasons! And he must have come up with some adjustment after the .314 to go > back to .257, which started another string of vintage years.
> (In case you miss the point, the previous paragraph is total nonsense, although > total nonsense of the sort our brain is hard-wired to come up with.)
> (Lowering the sample size just a bit will increase the variance noticeably.)
> He was "better" than CyberGreg in years 1, 2, 3, 5, and 10.
> Any trained scientist would dismiss the notion that there were any difference > between these two pitchers. Over 15 seasons, CyberGreg averaged .277 +/- .018, > CyberTom .286 +/- .017. That has a 16% chance of happening at random. And yet > we *know* the pitchers *are* different!
> (OTOH, the scientist would also dismiss the *obvious* fact -- fact! -- look, > it's obvious in the numbers! -- that CyberTom was the better pitcher for the > first 5 years, until CyberGreg developed his mastery. And the scientist would > be correct: CyberGreg was just *unlucky* for 5 years.)
> In reality, of course, the different levels of team defense behind the pitcher > cause an even greater fluctuation from year-to-year. Two years ago the real > Greg and Tom had their $H's go up hugely, to .331 and .318. If you include '99 > and ask, what are the odds that this string of $H numbers could happen by > chance, you basically get No for an answer (2 and 4% chance, respectively). > Removing '99 returns the odds on Maddux to 72%, and Glavine to 20%. IOW, with > the exception of one season, you don't have to invoke *anything* beyond random > variation to explain the fluctuation in their $H numbers in their career > together, and in Maddux's case the variation is particularly trivial.
> So: because of small sample sizes, differences in pitcher $H are *very* hard to > detect statistically. But that doesn't mean they're non-existent or even > trivial in size. The apparent difference between Maddux and Glavine over their > joint career in Atlanta, in terms of ERA, is about 0.20. That's not trivial. > Give me that every day and I'll finish 3 games ahead of you in the standings. > My rough guess / gut feeling is that there are pitchers who shave up to 0.25 off > their ERA and others who lose about 0.25, because of their innate $H ability. > Maybe even more.
> 2) I believe that my study of pitchers who changed teams may be the first to > show conclusively that there *is* an effect. I originally compared pitcher's > new $H to his old $H, and his old and new team's $H with him included in the > team totals. That was messy. Subtracting him out from the team totals (this > spreadsheet is 68 columns wide and growing!) gives us this formula for a traded > player's $H, once everything has been converted to z-scores (minimum 450 balls > in play, which gives us 443 pairs of seasons in the study):
> New $H = (.201 * old $H) + (.290 * new team's $H) - .076
> The significance of the old $H factor is immense -- 1 chance in 684,050 of > happening by chance!
> Combined, the two factors account for just 22.4% of his new $H.
> So, this says his new $H is based 13% on his new team $H, and 9% on his old $H.
> What determines his new team's $H? It's the average of the innate $H of all the > other pitchers -- which will *tend* to be average -- plus the effect of the > defense per se, plus the manager, plus the park effect. Now, the quality of the > other pitchers on the team doesn't actually affect him, while the other factors > do. The fact that there must be good and bad staffs for $H will thus add noise > to the relationship. This additional noise will thus weaken the observable > relationship of new $H to new team $H. Thus, the true factor for team $H > contribution is larger than 13% -- it's something that reduces to 13% when you > add the noise effect. It may be possible later to estimate the size of this > effect. I'm guessing 15 or 16%.
> The last term above, the apparent -.076 fudge factor (that's .076 of a standard > deviation, about .0017 in raw $H terms) is very interesting. Pitchers who > change teams *unquestionably* have a reduction in $H, that is independent of > everything else, from an average of .280 to .276. It will be very interesting > to see if this varies by era. I have only one weak idea why this happens: > change-of-scene psychology? I'm not convinced.
> 3) So how should DIPS work? We shouldn't throw out $H entirely. We can do two > things: we can compare $H to team $H without that pitcher, and we can look at > lifetime $H. Combining both approaches, we can look at lifetime $H relative to > team. This can then be modified by park and manager effects (when known) to get > a true $H estimate. This might be very valuable for predicting future pitcher > performance.
> Until we do this, we won't really know how large the range of innate $H skill > is, and we won't know how much trouble to go to in order to figure it out!
> (And I can think of a heaping mess of trouble. Ask me about it if you dare.)
> 4) None of this abrogates the importance of Voros' work with DIPS. The > realization that there are very large utterly random fluctuations in ERA, and > the understanding of where exactly that comes from, so that we can say whether a > change in ERA was luck or skill -- that remains revelatory.
> 5) Pedro Martinez went from a true $H of .325 in 1999 to one of .237 in 2000. > His HR rate, however, nearly doubled. The odds of this all happening by chance > are about 1 in 286. Conclusion: if anyone could refine their pitching approach > in a way that reduced how often batters hit the ball hard, at the expense of > some extra homers, it's Pedro. I think his true $H actually did go down last > year. > -- > ---- > Eric M. Van > em...@post.harvard.edu
> ". . . from that day forward she lived happily ever after. Except for the dying > at the end. And the heartbreak in between." - Lucius Shepard.
Eric M.Van <em...@post.harvard.edu> wrote: >Cameron Laird wrote:
. . .
>> >Of course. Duh. Have an unlucky year, you're slightly more likely to change >> >teams. >> Is it obvious? It's not obvious to me.
>Only obvious as the explanation for this effect, as one of many reasons players >are dealt.
. . . Perhaps I'm taking this too seriously. It *does* interest me, though. I quite agree we can quantify "have an unlucky year". I don't trust my own impressions of quantitative propensity-to-be-traded, though; I suspect it's likely to be as imperfect as other subjective evaluations. Moreover, records exist so that we can, at least in principle, measure it.
As to "explanations" and "reasons", are we modelling the public-relations level ("He knows how to win") or what we speculate are more consequential factors (he was a filler tossed in for contractual reasons)?
I admire your work with $H. I suspect there's at least equal scope to improve our understanding of management behavior. --
> In article <3A53C4F4.B2E62...@post.harvard.edu>, > Eric M.Van <em...@post.harvard.edu> wrote: > >Cameron Laird wrote: > . > . > . > >> >Of course. Duh. Have an unlucky year, you're slightly more likely to change > >> >teams. > >> Is it obvious? It's not obvious to me.
> >Only obvious as the explanation for this effect, as one of many reasons players > >are dealt. > . > . > . > Perhaps I'm taking this too seriously. It *does* interest > me, though. I quite agree we can quantify "have an unlucky > year". I don't trust my own impressions of quantitative > propensity-to-be-traded, though; I suspect it's likely to > be as imperfect as other subjective evaluations. Moreover, > records exist so that we can, at least in principle, measure > it.
If you do it after the fact of the trade, you have a perfect system.
> As to "explanations" and "reasons", are we modelling the > public-relations level ("He knows how to win") or what we > speculate are more consequential factors (he was a filler > tossed in for contractual reasons)?
Exactly. And the database I have is of *all* the pitchers who changed teams from one year to the next in MLB history (excluding seasons split between two teams in either year). Even with a real low cut-off of a minimum 150 balls in play with each team, you find the effect of $H going down. So we know that $H goes down for all these pitchers. Therefore, as $H varies randomly from year to year, when it's on the high side you're more likely to change teams. That just follows as a fact. You could strengthen the argument by coding each transaction -- I bet the effect disappears for FA in the modern era (which means it's probably even larger for players who are actually dealt.)
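The selection argument can be illustrated with a toy simulation. Everything here is an assumption made for the sake of the sketch: every pitcher has the same true rate of .280, seasons are 450 balls in play, and the chance of changing teams starts at 20% and rises with how far the observed rate sits above the true one. The function name and the move rule are hypothetical and not fitted to the real database.

```python
import random
import statistics

def simulate_selection_effect(n_pitchers=8000, true_rate=0.280,
                              balls_in_play=450, seed=0):
    """Return (mean old rate, mean new rate) for pitchers who changed teams."""
    rng = random.Random(seed)

    def season():
        hits = sum(rng.random() < true_rate for _ in range(balls_in_play))
        return hits / balls_in_play

    old_rates, new_rates = [], []
    for _ in range(n_pitchers):
        old = season()
        # Hypothetical rule: an unlucky (high) rate nudges the move chance up.
        p_move = min(1.0, max(0.0, 0.20 + 2.0 * (old - true_rate)))
        if rng.random() < p_move:
            old_rates.append(old)
            new_rates.append(season())  # next season is pure luck again
    return statistics.mean(old_rates), statistics.mean(new_rates)

old_avg, new_avg = simulate_selection_effect()
print(f"movers, year before: {old_avg:.4f}")
print(f"movers, year after:  {new_avg:.4f}")
```

Nobody's ability changes in this toy world, yet the movers' average rate drops the following year, simply because the group was selected partly on bad luck. Coding each real transaction, as suggested above, would be the way to find out how much of the observed drop this mechanism actually explains.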
> I admire your work with $H. I suspect there's at least > equal scope to improve our understanding of management > behavior. > --