Introducing KBO Estimated xwOBA and Estimated xwOBACON

One of the main features of the KBO Wizard is the ability to view batted ball data that I have charted from Korea Baseball Organization (KBO) games. There’s a ton of results-based data there for you to dive into, whether you’re interested in ground ball% (GB%) and line drive% (LD%) or you’re looking for batted ball direction data.

I’ve charted about 5,000 batted balls from the KBO and, as of October 21st, a new batted ball feature is available on the KBO Wizard: Estimated xwOBA (ExwOBA) and Estimated xwOBACON (ExwOBACON).

For those unfamiliar with wOBA and xwOBA in MLB terms, here are a few good resources from FanGraphs and Baseball Savant. In essence, wOBA (weighted on-base average) is trying to measure how much a hitter contributes to their team’s offensive performance by weighting each possible outcome. xwOBA takes that a step further to look at the quality of contact and how similar batted balls performed. Baseball Savant’s formula for xwOBA is not public, but we know it is some combination of Exit Velocity (EV), Launch Angle (LA), and Sprint Speed.

Since I’ve been measuring contact quality from the KBO, I wanted to incorporate Estimated xwOBA into the KBO Wizard as another means of evaluation. This article will break down how I arrived at my calculation for Estimated xwOBA and how to apply it through the KBO Wizard.

STEP 1: wOBA Weights

[Figure: The formula for wOBA from FanGraphs]

The calculation for wOBA is relatively straightforward but requires weights for each outcome. You can find MLB weights here. For KBO weights, I calculated backward from the KBO wOBA values that FanGraphs has calculated on their leaderboards.

After reading in the basic KBO stats and advanced KBO stats CSVs, I needed to rearrange the wOBA formula into something that R could solve, given the 6 unknown weights. With some basic algebra, I multiplied each player's wOBA value by its denominator, which leaves the weighted sum of outcomes on the other side of the equation. From there, I created a matrix of the wOBA components as the inputs and the denom*wOBA as the result and let R work its magic with the Solve function from the limSolve package.
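The rearrangement above can be sketched in code. The snippet below is a Python stand-in, not the R script used for this article: it swaps limSolve's Solve for a least-squares fit through the normal equations, and every player row is synthetic (built by varying one column at a time so the system stays well-conditioned). The helper names solve_linear and fit_weights are hypothetical. It only shows that, once each row pairs the outcome counts with denom*wOBA, the six weights fall out of a linear solve.

```python
# Sketch: recover six wOBA weights from per-player counts and denom*wOBA.
# All rows are synthetic; this illustrates the method, not the real data.

def solve_linear(a, b):
    """Solve a square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augment the matrix so row operations update both sides together.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def fit_weights(components, targets):
    """Least-squares fit via the normal equations (A^T A) w = A^T y."""
    k = len(components[0])
    ata = [[sum(row[i] * row[j] for row in components) for j in range(k)]
           for i in range(k)]
    aty = [sum(row[i] * y for row, y in zip(components, targets))
           for i in range(k)]
    return solve_linear(ata, aty)

# Columns: uBB, HBP, 1B, 2B, 3B, HR (one synthetic season line per row).
components = [
    [60, 5, 100, 25, 3, 15],
    [40, 10, 100, 25, 3, 15],
    [40, 5, 130, 25, 3, 15],
    [40, 5, 100, 35, 3, 15],
    [40, 5, 100, 25, 6, 15],
    [40, 5, 100, 25, 3, 25],
    [50, 8, 110, 30, 4, 20],
]
true_weights = [0.747, 0.823, 0.928, 1.242, 1.666, 1.942]
# With real data each target would be (published wOBA) * (wOBA denominator).
targets = [sum(w * c for w, c in zip(true_weights, row)) for row in components]

weights = fit_weights(components, targets)
print([round(w, 3) for w in weights])
```

With real leaderboard data the published wOBA values are rounded, so the recovered weights would only be approximate, and more player rows help average that rounding out.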

It worked perfectly and resulted in these wOBA weights:

League  wBB    wHBP   w1B    w2B    w3B    wHR
KBO     0.747  0.823  0.928  1.242  1.666  1.942
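To show how these weights are used, here is a minimal Python sketch of the wOBA calculation. It assumes the standard FanGraphs form, where the walk weight applies to unintentional walks (BB minus IBB) and the denominator is AB + BB - IBB + SF + HBP. The stat line is hypothetical and the function name woba is an illustration only.

```python
# KBO weights from the table above; keys are outcome labels.
KBO_WEIGHTS = {"BB": 0.747, "HBP": 0.823, "1B": 0.928,
               "2B": 1.242, "3B": 1.666, "HR": 1.942}

def woba(line, weights=KBO_WEIGHTS):
    """Compute wOBA for one stat line (assumed FanGraphs-style formula)."""
    unintentional_bb = line["BB"] - line["IBB"]
    numerator = (
        weights["BB"] * unintentional_bb
        + weights["HBP"] * line["HBP"]
        + weights["1B"] * line["1B"]
        + weights["2B"] * line["2B"]
        + weights["3B"] * line["3B"]
        + weights["HR"] * line["HR"]
    )
    denominator = line["AB"] + line["BB"] - line["IBB"] + line["SF"] + line["HBP"]
    return numerator / denominator

# Hypothetical season line, not a real KBO player.
example_line = {"AB": 500, "BB": 50, "IBB": 5, "SF": 5, "HBP": 5,
                "1B": 100, "2B": 30, "3B": 3, "HR": 20}
result = woba(example_line)
print(round(result, 3))
```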

STEP 2: Calculating Estimated xwOBA by BBE type

Next up is figuring out the Estimated xwOBA for each combination of batted ball type and contact strength. Here’s where the constraints of my data come into play. As a person, not a robot tracking system like Trackman, I only keep track of three batted ball variables: contact (hit hard, medium, or soft), batted ball type (ground ball, fly ball, line drive, or pop up), and direction (pull, straight, opposite). All of these are a tad subjective (again, not a robot), but I believe they remain consistent across every player.

For xwOBA, I decided to use just contact strength and batted ball type in my calculation.

First, I figured out how often every single batted ball result (1B, 2B, 3B, HR, FO, GO) occurred for each contact-batted ball combo. I then multiplied each result’s share of occurrences by the wOBA weights above (outs count as 0) and summed the products, which returned a weighted average wOBA value for each combo.

Now, there are some interesting takeaways from these values. Hard-hit and medium line drives top the list, with hard-hit fly balls in third. But the 4th-highest Estimated xwOBA value belongs to soft line drives, which seems counterintuitive. However, it does make sense. By my count, there have been 23 soft-hit line drives in the KBO, with 14 dropping for singles (61%). Another has gone for a triple and the other eight have been outs. With that context, it makes sense: soft line drives have just enough power to get to the outfield, but aren’t hit hard enough to carry to an outfielder for an easy catch. With so few instances compared to the total number of BBEs, the value of softly hit line drives won’t impact our Estimated xwOBA values very much.
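As a worked example of the weighting described in Step 2, the sketch below applies it to the soft line drive counts quoted above (14 singles, 1 triple, 8 outs). The function name combo_value is hypothetical, and the eight outs are lumped together because outs carry a weight of 0 either way. The value it prints is what the described method yields from these counts; the figure published on the KBO Wizard may differ slightly if it was computed from a different sample.

```python
# Hit weights from Step 1; any outcome not listed (outs) counts as 0.
HIT_WEIGHTS = {"1B": 0.928, "2B": 1.242, "3B": 1.666, "HR": 1.942}

def combo_value(outcome_counts, weights=HIT_WEIGHTS):
    """Weighted average wOBA value for one contact/batted-ball combo."""
    total = sum(outcome_counts.values())
    weighted = sum(weights.get(outcome, 0.0) * n
                   for outcome, n in outcome_counts.items())
    return weighted / total

# Soft line drives as counted in the article: 23 batted balls in total.
soft_line_drives = {"1B": 14, "3B": 1, "OUT": 8}
soft_ld_value = combo_value(soft_line_drives)
print(round(soft_ld_value, 3))  # (14 * 0.928 + 1 * 1.666) / 23
```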

Going down the rest of the list, the order seems pretty accurate, at least until you get to medium-hit fly balls in second-to-last place, with an Estimated xwOBA value of 0.095, below softly hit pop ups. Again, that seems slightly counterintuitive, but 90.3% of those fly balls have gone for fly outs, whereas line drives and hard-hit fly balls stand more of a chance of landing for hits. With that context, the low value makes sense, as those are usually easy plays for outfielders.

Walks are assigned a 0.747 Estimated xwOBA, hit-by-pitches are assigned 0.823, and strikeouts are assigned a 0.000 value.
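Putting the pieces together, here is a hedged Python sketch of how a player-level number could be built from these assigned values. It assumes that Estimated xwOBACON is the average assigned value over batted balls only, and that Estimated xwOBA is the average over batted balls plus walks, hit-by-pitches, and strikeouts; the article does not spell out the exact denominators, so treat that as an assumption. Only two combo values are included: 0.095 for medium fly balls (quoted above) and roughly 0.637 for soft line drives (derived from the counts quoted above as (14 * 0.928 + 1.666) / 23). The event list is hypothetical.

```python
# Assigned values per contact/batted-ball combo (only the two known ones).
COMBO_VALUES = {
    ("medium", "FB"): 0.095,  # quoted in the article
    ("soft", "LD"): 0.637,    # derived from the article's soft line drive counts
}
# Assigned values for plate appearances that end without a batted ball.
NON_CONTACT_VALUES = {"BB": 0.747, "HBP": 0.823, "K": 0.000}

def estimated_xwobacon(batted_balls):
    """Average assigned value over batted ball events only (assumed definition)."""
    values = [COMBO_VALUES[combo] for combo in batted_balls]
    return sum(values) / len(values)

def estimated_xwoba(batted_balls, non_contact):
    """Average assigned value over batted balls plus BB, HBP, and K (assumed)."""
    values = [COMBO_VALUES[combo] for combo in batted_balls]
    values += [NON_CONTACT_VALUES[event] for event in non_contact]
    return sum(values) / len(values)

# Hypothetical hitter: 8 batted balls, 3 strikeouts, 1 walk.
batted_balls = [("soft", "LD")] * 2 + [("medium", "FB")] * 6
non_contact = ["K", "K", "K", "BB"]

xwobacon = estimated_xwobacon(batted_balls)
xwoba = estimated_xwoba(batted_balls, non_contact)
print(round(xwobacon, 3), round(xwoba, 3))
```

In this made-up example the strikeouts pull the all-events number below the on-contact number, which is the kind of gap a high-strikeout hitter would show between the two metrics.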

STEP 3: Applying xwOBA

The first thing that we are going to look at is how wOBA (from FanGraphs) compares to my Estimated xwOBA. The answer? Not great!

While there’s a slight upward trend between the two, the r-squared value of 0.2844 shows that there’s not much correlation there, which, frankly, I expected due to the differences in the data used here. The FanGraphs data is for the entire season of the KBO, whereas my data is from June onwards and does not cover every game, outing, or at-bat. As such, there were only 58 players that I could use for this comparison. The incomplete data becomes a problem, as you can see with two notable outliers in Lee Jung-hoo and Roberto Ramos. That large difference is due to their BBE profiles and how those interact with the values I talked about earlier for each contact-batted ball combo.
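For readers who want to reproduce this kind of comparison, the sketch below computes r-squared for a simple linear fit, which equals the squared Pearson correlation between the two columns. It is a Python illustration with hypothetical numbers, not the 58-player sample behind the 0.2844 figure, and the function name r_squared is an assumption.

```python
def r_squared(xs, ys):
    """Squared Pearson correlation, equal to R^2 of a simple linear fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance ** 2 / (var_x * var_y)

# Hypothetical values: season wOBA vs. Estimated xwOBA for six hitters.
season_woba = [0.310, 0.345, 0.372, 0.398, 0.420, 0.455]
estimated = [0.330, 0.290, 0.410, 0.350, 0.380, 0.440]

r2 = r_squared(season_woba, estimated)
print(round(r2, 4))
```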

Player         xwOBA  xwOBACON  SwStr%  GB%   FB%   LD%   PU%   Soft%  Med%  Hard%
Lee Jung-hoo   0.506  0.521     2.4     24.6  32.8  36.1  6.6   13.1   31.1  55.7
Roberto Ramos  0.263  0.433     16.3    40.7  25.9  22.2  11.1  18.5   37    44.4

I decided to run a correlation between Estimated xwOBA and the frequency of each batted ball type and contact strength, and the results were more in line with expectations. (It’s good to hit the ball HARD.)

From here we see that Hard% and LD% are the two biggest indicators of Estimated xwOBA, while Soft%, GB%, and SwStr% (as a K% proxy) drag Estimated xwOBA down the most (and are the best for pitchers). Using our Lee Jung-hoo and Roberto Ramos examples, this explains why they have such huge differences between Estimated xwOBA and wOBA.

Lee hits a lot of hard line drives and is one of the best contact hitters in the KBO, with a SwStr% of just 2.4%. Ramos, on the other hand, hits the ball hard 44.4% of the time (see his BBE profile above) but has one of the highest SwStr% in the KBO at 16.3%. The inclusion of strikeouts is partly why Aaron Brooks’ changeup, despite a 90% GB% and 49% soft contact%, has just the 10th-best Estimated xwOBA by pitch type; switch to Estimated xwOBACON (xwOBA on contact) and it grades out as the 4th-best pitch (min. 35 BBEs).

On the hitting side, Lee leads by a wide margin in Estimated xwOBA due to his low strikeout%, but barely leads in Estimated xwOBACON, just a few points ahead of Na Sung-bum and Mel Rojas Jr. Park Min-woo, from NC, is a surprise in 4th place, but he rarely hits the ball softly and is another low-SwStr% hitter.

For pitchers, Aaron Brooks leads in both Estimated xwOBA, at just 0.253, and Estimated xwOBACON, at 0.288, and has 3 of the top 10 pitches in both metrics. While imperfect, Estimated xwOBA and Estimated xwOBACON in this context mostly tell us what we expected, with some surprises, which is a good indicator.

In essence, Estimated xwOBA encompasses every result, while Estimated xwOBACON will help you measure the quality of contact.