# Foreword

This case is split into two parts. The first section, Analysis, discusses the high-level implications of my findings. The second section, Technical Process, goes through the step-by-step code that I used to generate these insights.

I've tried to keep this case understandable even if you don't know anything about fantasy basketball. So don't worry if you're not a big basketball fan!

# Analysis

Welcome to Moneyball for the NBA. Today I'll be your Jonah Hill. We're going to analyze how we can build a fantasy basketball team that's better and cheaper than one drafted without analytics.

But wait... Let me explain a few basics first. The premise of Fantasy Basketball is simple; each person participating in the league will select real NBA players to compose their 'fantasy team'. Fantasy teams will then compete against each other - winning and losing based on the real life performances of the NBA players on those fantasy teams. This is the story of how I broke our 2017-18 league using data analytics.

The future of every fantasy basketball team is made or broken on draft day. If you draft your team well you will be set up for a season of success, and if you draft poorly, it is nearly impossible to recover.

Typically, in fantasy sports, league members will take turns drafting players sequentially until a given number of required players is reached. But in 2017, my league decided to move away from this setup, opting instead to use a live auction for player selection. This set the foundation for the strategy that I'll lay out in this analysis.

Here's how the live auction works:

• Each league member is allotted \$200 (not real money) to draft players to create their team
• Each team consists of 13 NBA players
• On draft day, NBA players are nominated one by one and league members are able to bid from their \$200 pool of money to purchase that player for their team. Whoever is willing to bid the most for a player will purchase that player for their team
• The auction is live, meaning that if I bid \$50 on a player, everyone else will see my bid and will have 10 seconds to bid a higher amount before my bid purchases that player.

Now that we understand how the draft works, we need to decide who we are targeting and how much we should be willing to pay for them. Luckily, Yahoo, the largest fantasy sports platform, provides projected player values that they believe players should be bought for. They also provide the average value that players are actually being drafted for, as seen below.

Yahoo's projected values are treated as the gold standard among fantasy basketball players. These values are used as a baseline for decision making during the draft. For instance, if we think a player is underrated, we may be willing to bid \$5 - \$10 higher than Yahoo's projected value. But don't take my word for that - take a look at the graph below showing the difference between Yahoo's projected values and actual draft values for each player. The average draft values follow Yahoo's projected values extremely closely.

This raises the question though - if everyone is blindly basing their decisions on Yahoo's projected values, how do we know if those values are accurate? Well, we don't. We have absolutely no idea where those numbers are coming from or how they are generated.

OK, so since we don't know where the Yahoo numbers come from, I thought it might be helpful to develop a metric for scoring players and assigning them dollar values. This would allow us to compare our values against Yahoo's projections to determine if the Yahoo projections are accurate.

But if I'm going to come up with my own method for valuing NBA players, I need to make sure that it is mathematically sound. Luckily, we can use z-scores to evaluate NBA players from their previous season's statistics, and then we can use those z-scores to derive a dollar value that we can compare against Yahoo's projections.
If you're interested in the details of these calculations, you can see them in the Technical Process section below. Here's a quick and dirty overview of why z-scores work - skip this if you don't like math.

Z-scores take the NBA average from the previous season across each relevant stat (points, rebounds, assists, etc.) and then calculate how many standard deviations from the mean each player falls in each category. Then, we average each player's z-scores across all categories to come up with a single average score. We can then add up the average scores for the top 150 or so players, depending on how many players will be in the league, and then apply each player's fraction of that total to the total number of dollars available for spending in the league. Boom - mathematically grounded player values calculated.

Math, calculations, and z-scores sound pretty complicated though. And they never mention anything like that in Moneyball, so why do we even need them? Well, here's what you need to know - the red line in the graph below is a mathematical assessment of what players are worth. At times, that red line is far away from the green line, which is what players actually cost. If the red line is above the green line, then players are mathematically worth more than what they are being sold for. We can buy those players at a discount from what they are actually worth.

I know, I know, you are probably thinking that previous season stats aren't forward looking and therefore don't account for factors such as player injuries or trades. That is definitely true. As shown above, Isaiah Thomas is calculated to be worth \$55.1 based on his previous season stats, but after injuring his hip between seasons and being traded to another team, his projected value decreased by over 50% to \$21.0. That seems justified and goes to show that we still need to examine individual player situations before assuming this data is completely accurate.
But in the majority of cases there is seemingly no reason for Yahoo's projected values to be so different from the previous year. Take Toronto's two star players during the 2016-17 season, Kyle Lowry and DeMar DeRozan. Both will be playing in nearly the same situation in 2017-18 and injuries are not a major factor. Is it justified to discount them by 30% and 50% respectively on their previous year's values? I would argue not.

Here's what we can infer from that. If we are able to identify players such as Lowry and DeRozan that can be bought at a 30% - 50% discount to the actual value they generate, then we can create a team that is 30% - 50% better than the average team. With just this info alone we should be able to place highly in our league. But that's only the tip of the iceberg. We can juice these discounts much, much further by taking advantage of the league scoring system.

Here's how the league scoring system works:

• Each fantasy team will compete against one other fantasy team per week
• Teams compete across nine statistical categories: points, rebounds, assists, steals, blocks, three point shots, field goal percentage, free throw percentage, and turnovers.
• Let's use assists as an example: if your team collectively puts up more assists than the opposing team, then you win that category for the week.
• At the end of the week, the team that wins more categories wins that week. Draws are also possible.
• At the end of the season, the team with the most weeks won wins the league.

This system means that you only need to be strong in five out of the nine categories to win each week and therefore win the league. This means that we can draft our team with the intention of only winning some categories while disregarding other categories altogether. This is called punting categories and it allows us to completely alter player values.
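The weekly matchup rules above can be sketched in a few lines of Python. This is only an illustration: the `weekly_result` helper, the category names, and the weekly totals are all made up for the example and are not taken from my league's data. It assumes each team's totals (and percentages) for the week have already been added up, and that fewer turnovers wins that category.

```python
# Hypothetical helper: decide one weekly head-to-head matchup by categories.
# Category names and the sample numbers are illustrative, not league data.
LOWER_IS_BETTER = {"turnovers"}

def weekly_result(team_a, team_b):
    """Return (wins_a, wins_b, ties) across the categories in team_a."""
    wins_a = wins_b = ties = 0
    for cat, a_val in team_a.items():
        b_val = team_b[cat]
        if cat in LOWER_IS_BETTER:
            # Fewer turnovers wins the category, so flip the comparison.
            a_val, b_val = -a_val, -b_val
        if a_val > b_val:
            wins_a += 1
        elif b_val > a_val:
            wins_b += 1
        else:
            ties += 1
    return wins_a, wins_b, ties

# Team A ignores assists and threes; Team B is a more balanced guard-heavy team.
team_a = {"points": 610, "rebounds": 250, "assists": 90, "steals": 45, "blocks": 40,
          "threes": 30, "fg_pct": 0.49, "ft_pct": 0.71, "turnovers": 70}
team_b = {"points": 640, "rebounds": 210, "assists": 150, "steals": 40, "blocks": 22,
          "threes": 65, "fg_pct": 0.45, "ft_pct": 0.80, "turnovers": 85}

result = weekly_result(team_a, team_b)
print(result)
```

In this made-up week, Team A loses assists and threes badly but still takes five of the nine categories, which is all it needs to win the week.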
Let me give you an example. Let's say that we decide to disregard three pointers and assists to instead focus on the remaining seven categories. Suddenly, 'big men', who traditionally don't shoot three point shots or generate many assists, will gain significant value. This is because their lack of three point shots and assists no longer acts as a counterweight to their high rebounds and blocks. Below we can see altered player values in red if we now disregard three point shots and assists.

If that was hard to follow, just take a look at this red line compared to the previous red line. The distance between the red line (the calculated values) and the blue/green lines (the projected/average values) has increased. That means we can buy certain players at even larger discounts for the value they provide our team.

We can see in the above graph that many of the top players lose much of their value when discounting three pointers and assists. James Harden, for instance, is one of the best playmakers and three point scorers in the league. His value absolutely plummets from \$67 to \$16 when we disregard his most important stats (3pts and assists). Players like Myles Turner and Jusuf Nurkic, however, become bargains when discounting those same stats. Jusuf Nurkic now has a value of almost \$30 and we can add him to our team for an average of only \$2.6. That type of value is absolutely insane to think about. With purchases like that we are able to receive over 10x the value for our dollar. If we can target players to create a full team of players like Jusuf Nurkic, our team can become significantly more likely to win than the average fantasy team.

To aid in my draft I took the above data and compiled it into an Excel spreadsheet for ease of use during draft day. Below you can see a screenshot of the spreadsheet including a few notes I made on players. I also added a live team tracker where I can see while drafting which categories I'm doing well in and which need to be improved.
Lastly, I've added the option to select categories to punt which will impact auction values and average z-scores in the first table.

Ultimately, in sports nothing is a guarantee. There are always injuries and trades that can change player value mid-season, or destroy a team's chances, but by using z-scores we increase our chances of winning substantially. Much like the Oakland A's in Moneyball, you don't need to spend the most money on players in order to have the most competitive team.

# Technical Process

First let's bring in our raw data from a CSV file downloaded from BasketballReference.com. Let's do that using the Python Pandas module.

In [1]:
#Importing all the needed modules for this analysis
import pandas as pd
import re
from IPython.display import display_html

#This function allows for displaying dataframes side by side
def display_side_by_side(*args):
    html_str=''
    for df in args:
        html_str+=df.to_html()
    display_html(html_str.replace('table','table style="display:inline"'),raw=True)

#Here we are reading in our raw data and displaying it
df_nba = pd.read_csv("NBA_raw_data.csv", index_col=None)
df_nba.drop("Rk", axis=1, inplace=True)

#Formatting table displays
pd.options.display.max_columns = None
pd.options.display.max_rows = 10

df_nba

Out[1]:
Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PS/G
0 Alex Abrines\abrinal01 SG 23 OKC 68 6 15.5 2.0 5.0 0.393 1.4 3.6 0.381 0.6 1.4 0.426 0.531 0.6 0.7 0.898 0.3 1.0 1.3 0.6 0.5 0.1 0.5 1.7 6.0
1 Quincy Acy\acyqu01 PF 26 TOT 38 1 14.7 1.8 4.5 0.412 1.0 2.4 0.411 0.9 2.1 0.413 0.521 1.2 1.6 0.750 0.5 2.5 3.0 0.5 0.4 0.4 0.6 1.8 5.8
2 Quincy Acy\acyqu01 PF 26 DAL 6 0 8.0 0.8 2.8 0.294 0.2 1.2 0.143 0.7 1.7 0.400 0.324 0.3 0.5 0.667 0.3 1.0 1.3 0.0 0.0 0.0 0.3 1.5 2.2
3 Quincy Acy\acyqu01 PF 26 BRK 32 1 15.9 2.0 4.8 0.425 1.1 2.6 0.434 0.9 2.2 0.414 0.542 1.3 1.8 0.754 0.6 2.8 3.3 0.6 0.4 0.5 0.6 1.8 6.5
4 Steven Adams\adamsst01 C 23 OKC 80 80 29.9 4.7 8.2 0.571 0.0 0.0 0.000 4.7 8.2 0.572 0.571 2.0 3.2 0.611 3.5 4.2 7.7 1.1 1.1 1.0 1.8 2.4 11.3
...
590 Cody Zeller\zelleco01 C 24 CHO 62 58 27.8 4.1 7.1 0.571 0.0 0.0 0.000 4.1 7.1 0.572 0.571 2.1 3.2 0.679 2.2 4.4 6.5 1.6 1.0 0.9 1.0 3.0 10.3
591 Tyler Zeller\zellety01 C 27 BOS 51 5 10.3 1.5 3.1 0.494 0.0 0.0 0.000 1.5 3.1 0.497 0.494 0.4 0.8 0.564 0.8 1.6 2.4 0.8 0.1 0.4 0.4 1.2 3.5
592 Stephen Zimmerman\zimmest01 C 20 ORL 19 0 5.7 0.5 1.6 0.323 0.0 0.0 NaN 0.5 1.6 0.323 0.323 0.2 0.3 0.600 0.6 1.3 1.8 0.2 0.1 0.3 0.2 0.9 1.2
593 Paul Zipser\zipsepa01 SF 22 CHI 44 18 19.2 2.0 5.0 0.398 0.8 2.3 0.333 1.3 2.8 0.451 0.473 0.7 0.9 0.775 0.3 2.5 2.8 0.8 0.3 0.4 0.9 1.8 5.5
594 Ivica Zubac\zubaciv01 C 19 LAL 38 11 16.0 3.3 6.3 0.529 0.0 0.1 0.000 3.3 6.2 0.536 0.529 0.8 1.3 0.653 1.1 3.1 4.2 0.8 0.4 0.9 0.8 1.7 7.5

595 rows × 29 columns

OK, we've imported our data, but notice that a few things must be cleaned up before this data is usable.

• Players that were traded mid-season currently appear as multiple rows
• Player names appear with an identifier following the name, which we don't want

Let's remove the duplicate rows for players who changed teams. We will keep only the stats for the current team that players play for, rather than taking a weighted average between teams. This is because we only want a picture of the current situation of each player so that we can use those stats to make future decisions.
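To see which row survives that kind of deduplication, here is a toy version using a few values copied from the table above. The `toy` frame is only for illustration. The approach assumes the file lists a traded player's season total (`TOT`) first and his most recent team last, which is the order Quincy Acy's rows appear in here.

```python
import pandas as pd

# Toy rows copied from the table above: Quincy Acy appears three times
# (TOT = season total, then DAL, then BRK).
toy = pd.DataFrame({
    "Player": ["Alex Abrines\\abrinal01", "Quincy Acy\\acyqu01",
               "Quincy Acy\\acyqu01", "Quincy Acy\\acyqu01"],
    "Tm": ["OKC", "TOT", "DAL", "BRK"],
    "G": [68, 38, 6, 32],
})

# keep="last" retains only the final row for each player name.
deduped = toy.drop_duplicates(subset="Player", keep="last")
print(deduped)
```

Only the Brooklyn row is kept for Acy, so his 32 games there stand in for his whole season.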
In [2]:
#Dropping duplicate values
df_nba.drop_duplicates(subset="Player", keep="last", inplace=True)
df_nba

Out[2]:
Player Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PS/G
0 Alex Abrines\abrinal01 SG 23 OKC 68 6 15.5 2.0 5.0 0.393 1.4 3.6 0.381 0.6 1.4 0.426 0.531 0.6 0.7 0.898 0.3 1.0 1.3 0.6 0.5 0.1 0.5 1.7 6.0
3 Quincy Acy\acyqu01 PF 26 BRK 32 1 15.9 2.0 4.8 0.425 1.1 2.6 0.434 0.9 2.2 0.414 0.542 1.3 1.8 0.754 0.6 2.8 3.3 0.6 0.4 0.5 0.6 1.8 6.5
4 Steven Adams\adamsst01 C 23 OKC 80 80 29.9 4.7 8.2 0.571 0.0 0.0 0.000 4.7 8.2 0.572 0.571 2.0 3.2 0.611 3.5 4.2 7.7 1.1 1.1 1.0 1.8 2.4 11.3
5 Arron Afflalo\afflaar01 SG 31 SAC 61 45 25.9 3.0 6.9 0.440 1.0 2.5 0.411 2.0 4.4 0.457 0.514 1.4 1.5 0.892 0.1 1.9 2.0 1.3 0.3 0.1 0.7 1.7 8.4
6 Alexis Ajinca\ajincal01 C 28 NOP 39 15 15.0 2.3 4.6 0.500 0.0 0.1 0.000 2.3 4.5 0.511 0.500 0.7 1.0 0.725 1.2 3.4 4.5 0.3 0.5 0.6 0.8 2.0 5.3
...
590 Cody Zeller\zelleco01 C 24 CHO 62 58 27.8 4.1 7.1 0.571 0.0 0.0 0.000 4.1 7.1 0.572 0.571 2.1 3.2 0.679 2.2 4.4 6.5 1.6 1.0 0.9 1.0 3.0 10.3
591 Tyler Zeller\zellety01 C 27 BOS 51 5 10.3 1.5 3.1 0.494 0.0 0.0 0.000 1.5 3.1 0.497 0.494 0.4 0.8 0.564 0.8 1.6 2.4 0.8 0.1 0.4 0.4 1.2 3.5
592 Stephen Zimmerman\zimmest01 C 20 ORL 19 0 5.7 0.5 1.6 0.323 0.0 0.0 NaN 0.5 1.6 0.323 0.323 0.2 0.3 0.600 0.6 1.3 1.8 0.2 0.1 0.3 0.2 0.9 1.2
593 Paul Zipser\zipsepa01 SF 22 CHI 44 18 19.2 2.0 5.0 0.398 0.8 2.3 0.333 1.3 2.8 0.451 0.473 0.7 0.9 0.775 0.3 2.5 2.8 0.8 0.3 0.4 0.9 1.8 5.5
594 Ivica Zubac\zubaciv01 C 19 LAL 38 11 16.0 3.3 6.3 0.529 0.0 0.1 0.000 3.3 6.2 0.536 0.529 0.8 1.3 0.653 1.1 3.1 4.2 0.8 0.4 0.9 0.8 1.7 7.5

486 rows × 29 columns

In [3]:
#Next let's clean up the player names. We're going to use a regular expression to do this.
#The pattern matches the backslash and the alphanumeric identifier that follows each name.
pattern_to_search = r"\\[A-Za-z0-9]*"
df_nba["Player"] = df_nba["Player"].replace(regex=pattern_to_search, value="")
df_nba.set_index("Player", inplace=True)
df_nba.head()

Out[3]:
Pos Age Tm G GS MP FG FGA FG% 3P 3PA 3P% 2P 2PA 2P% eFG% FT FTA FT% ORB DRB TRB AST STL BLK TOV PF PS/G
Player
Alex Abrines SG 23 OKC 68 6 15.5 2.0 5.0 0.393 1.4 3.6 0.381 0.6 1.4 0.426 0.531 0.6 0.7 0.898 0.3 1.0 1.3 0.6 0.5 0.1 0.5 1.7 6.0
Quincy Acy PF 26 BRK 32 1 15.9 2.0 4.8 0.425 1.1 2.6 0.434 0.9 2.2 0.414 0.542 1.3 1.8 0.754 0.6 2.8 3.3 0.6 0.4 0.5 0.6 1.8 6.5
Steven Adams C 23 OKC 80 80 29.9 4.7 8.2 0.571 0.0 0.0 0.000 4.7 8.2 0.572 0.571 2.0 3.2 0.611 3.5 4.2 7.7 1.1 1.1 1.0 1.8 2.4 11.3
Arron Afflalo SG 31 SAC 61 45 25.9 3.0 6.9 0.440 1.0 2.5 0.411 2.0 4.4 0.457 0.514 1.4 1.5 0.892 0.1 1.9 2.0 1.3 0.3 0.1 0.7 1.7 8.4
Alexis Ajinca C 28 NOP 39 15 15.0 2.3 4.6 0.500 0.0 0.1 0.000 2.3 4.5 0.511 0.500 0.7 1.0 0.725 1.2 3.4 4.5 0.3 0.5 0.6 0.8 2.0 5.3

I'm curious, so let's check how our players did with the old "Points" scoring system that we used last year. This won't be totally accurate for this year's scoring system but it should give us a general idea of where all the players stand.

Last year we didn't use category scoring. Instead stats were assigned the following number of fantasy points (FP). Each NBA...

• Point is worth 0.5 FP
• Assist is worth 2 FP
• Rebound is worth 1.5 FP
• Block is worth 3 FP
• Steal is worth 3 FP
• Field goal attempt (FGA) is worth -0.45 FP
• Field goal make (FG) is worth 1 FP
• Three point make (3P) is worth 3 FP
• Turnover is worth -2 FP
• Free throw attempt (FTA) is worth -0.75 FP
• Free throw make (FT) is worth 1 FP

In [4]:
#Creating a new column to display fantasy points generated with our old scoring system. Viewing top 10 players.
pd.set_option('display.float_format', lambda x: '%.3f' % x)

df_nba["Fantasy_pts"] = (df_nba["PS/G"] * .5) + (df_nba["AST"] * 2) + (df_nba["TRB"] * 1.5) + (df_nba["BLK"] * 3) + \
                        (df_nba["STL"] * 3) + (df_nba["3P"] * 3) + (df_nba["TOV"] * -2) + (df_nba["FGA"] * -.45) + \
                        (df_nba["FG"] * 1) + (df_nba["FTA"] * -.75) + (df_nba["FT"] * 1)

df_nba.sort_values("Fantasy_pts", ascending=False, inplace=True)
df_nba_top = df_nba.loc[df_nba.index[0:155], ["PS/G","AST", "TRB", "STL", "BLK", "3P", "TOV", "FG", "FGA", "FT","FTA", "Fantasy_pts"]]
df_nba_top.head(10)

Out[4]:
PS/G AST TRB STL BLK 3P TOV FG FGA FT FTA Fantasy_pts
Player
Russell Westbrook 31.600 10.400 10.700 1.600 0.400 2.500 5.400 10.200 24.000 8.800 10.400 55.750
James Harden 29.100 11.200 8.100 1.500 0.500 3.200 5.700 8.300 18.900 9.200 10.900 54.120
LeBron James 26.400 8.700 8.600 1.200 0.600 1.700 4.100 9.900 18.200 4.800 7.200 46.910
Kevin Durant 25.100 4.800 8.300 1.100 1.600 1.900 2.200 8.900 16.500 5.400 6.200 46.225
Stephen Curry 25.300 6.600 4.500 1.800 0.200 4.100 3.000 8.500 18.300 4.100 4.600 45.815
DeMarcus Cousins 24.400 3.900 12.400 1.500 1.100 2.100 3.600 8.400 18.500 5.500 7.100 45.750
Anthony Davis 28.000 2.100 11.800 1.300 2.200 0.500 2.400 10.300 20.300 6.900 8.600 44.715
Chris Paul 18.100 9.200 5.000 2.000 0.100 2.000 2.400 6.100 12.900 3.800 4.300 43.320
Giannis Antetokounmpo 22.900 5.400 8.800 1.600 1.900 0.600 2.900 8.200 15.700 5.900 7.700 43.210
Karl-Anthony Towns 25.100 2.700 12.300 0.700 1.300 1.200 2.600 9.800 18.000 4.300 5.200 42.900

Now we can see who the top 10 players were in our league last year! That's nice but this year we're using a much more complex scoring system, so we need a new metric to evaluate players on. Luckily the perfect metric exists to compare player value across different categories. Z-scores - that sounds complicated, I know! But let's walk through it step by step.

There are 12 participants in our league that must each select 13 NBA players for their team.
That means a total of 156 NBA players must be selected out of a pool of 486 players. We can take the average value and the standard deviation of the top 156 players in our league to see what the average value and standard deviation in each category should be. We can then compare each individual player's output in each category to see how many standard deviations away from the average they are in each category. This is a z-score.

In other words, if the league average is 4.5 assists, a z-score of 0.0 would mean you average exactly 4.5 assists: you are 0 standard deviations away from the mean. A z-score of 1.0 would mean you are one standard deviation above the mean, and -1.0 would mean one standard deviation below the mean assist number.

This allows us to compare categories against one another. If we wanted to see if 5 assists or 10 rebounds is more valuable, we can use z-scores to compare the two categories.

In [5]:
#Creating formulas to calculate z-scores for each category
def calculate_simple_zscore(series, col_name, invert_values=False):
    zscore = (series - df_nba_top[col_name].mean()) / df_nba_top[col_name].std()
    if invert_values:
        #For stats where lower is better (turnovers), flip the sign
        zscore = zscore * -1
    return zscore

def calculate_complex_zscore(series_makes, series_attempts, makes_col_name, attempts_col_name):
    #Pooled shooting percentage of the top players (total makes / total attempts)
    mean_percent_made_top = df_nba_top[makes_col_name].sum() / df_nba_top[attempts_col_name].sum()
    std_percent_made_top = (df_nba_top[makes_col_name] / df_nba_top[attempts_col_name]).std()
    mean_attempts_top = df_nba_top[attempts_col_name].mean()

    series_percent_made = series_makes / series_attempts
    series_delta_from_average = series_percent_made - mean_percent_made_top
    series_zscore = series_delta_from_average / std_percent_made_top

    #Scale by shot volume so that high-volume shooters move the category more
    series_volume_multiplier = series_attempts / mean_attempts_top
    series_adjusted_zscore = series_zscore * series_volume_multiplier
    series_adjusted_zscore.rename(f"{makes_col_name}%", inplace=True)
    return series_adjusted_zscore

In [6]:
#Calculating z-scores
simple_stats = ["PS/G", "AST", "TRB", "STL", "BLK", "3P"]
inverted_stats = ["TOV"]
complex_stats = [("FG", "FGA"), ("FT","FTA")]
zscores_bucket = []

for col_name in simple_stats:
    result = calculate_simple_zscore(df_nba[col_name], col_name)
    zscores_bucket.append(result)

for col_name in inverted_stats:
    result = calculate_simple_zscore(df_nba[col_name], col_name, invert_values=True)
    zscores_bucket.append(result)

for pair in complex_stats:
    result = calculate_complex_zscore(df_nba[pair[0]], df_nba[pair[1]], pair[0], pair[1])
    zscores_bucket.append(result)

df_zscores = pd.concat(zscores_bucket, axis=1)
df_zscores.head(10)

Out[6]:
PS/G AST TRB STL BLK 3P TOV FG% FT%
Player
Russell Westbrook 2.936 3.254 1.799 1.454 -0.370 1.277 -4.189 -1.264 1.363
James Harden 2.490 3.619 0.855 1.203 -0.222 2.043 -4.543 -0.648 1.379
LeBron James 2.008 2.478 1.036 0.449 -0.075 0.402 -2.656 1.850 -1.823
Kevin Durant 1.776 0.698 0.928 0.198 1.397 0.621 -0.416 1.580 1.142
Stephen Curry 1.812 1.520 -0.453 1.957 -0.664 3.028 -1.359 -0.026 1.048
DeMarcus Cousins 1.651 0.287 2.417 1.203 0.661 0.839 -2.067 -0.277 -0.156
Anthony Davis 2.294 -0.534 2.199 0.700 2.281 -0.911 -0.652 1.101 0.320
Chris Paul 0.527 2.706 -0.271 2.460 -0.811 0.730 -0.652 0.122 0.910
Giannis Antetokounmpo 1.383 0.972 1.109 1.454 1.839 -0.802 -1.241 1.155 -0.308
Karl-Anthony Towns 1.776 -0.260 2.380 -0.808 0.956 -0.145 -0.888 1.841 0.467

Great, now we can see the z-scores of each player across each category. Next we want to see overall player value for each player so we can rank them against each other. Let's simply average each of the categories to get an average z-score for each player.

In [7]:
#Averaging z-scores and sorting best to worst.
df_zscores["avg_zscore"] = df_zscores.mean(axis=1)
df_zscores.sort_values("avg_zscore", ascending=False)

Out[7]:
PS/G AST TRB STL BLK 3P TOV FG% FT% avg_zscore
Player
Kevin Durant 1.776 0.698 0.928 0.198 1.397 0.621 -0.416 1.580 1.142 0.880
Stephen Curry 1.812 1.520 -0.453 1.957 -0.664 3.028 -1.359 -0.026 1.048 0.762
Anthony Davis 2.294 -0.534 2.199 0.700 2.281 -0.911 -0.652 1.101 0.320 0.755
Kawhi Leonard 1.848 0.105 0.019 1.957 0.072 0.730 -0.298 0.466 1.388 0.699
Russell Westbrook 2.936 3.254 1.799 1.454 -0.370 1.277 -4.189 -1.264 1.363 0.696
...
Elijah Millsap -2.437 -1.264 -0.998 -2.567 -0.959 -1.459 0.999 -1.466 -0.610 -1.195
Ben Bentil -2.704 -1.493 -1.833 -2.567 -0.959 -1.459 1.824 -0.604 nan -1.224
Danuel House -2.704 -1.493 -1.724 -2.567 -0.959 -1.459 2.178 nan nan -1.247
Patricio Garino -2.704 -1.493 -1.579 -2.567 -0.959 -1.459 1.470 -0.846 nan -1.267
Andrew Bogut -2.704 -1.493 -2.087 -2.567 -0.959 -1.459 2.178 nan nan -1.299

486 rows × 10 columns

Kevin Durant was the best player in 2016-17! We now need to convert these z-scores so that we know how much they are worth in a league using \$2400 across 12 teams drafting 156 players total.

We can calculate our auction values by assigning each z-score a percentage of the sum of the top 156 z-scores. Before doing that though, we must account for the fact that some z-scores will be negative numbers. Let's first adjust up all our z-scores by the size of the lowest z-score among the top 156 players. This sounds confusing, but it keeps the gaps between players' z-scores intact while making every drafted player's score zero or higher, which is what we need for calculating auction values. We can then drop these adjusted z-scores since they don't tell us anything useful on their own.
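Here is a small worked sketch of that shift-and-scale step. Everything in it is hypothetical: a mini-league of 2 teams with \$200 each and 2 players per team, and five made-up average z-scores, chosen only so the arithmetic is easy to follow.

```python
import pandas as pd

# Hypothetical mini-league: 2 teams x $200, 2 players per team,
# so the top 4 players share a $400 pool. Scores are made up and
# already sorted from best to worst.
avg_z = pd.Series({"A": 0.9, "B": 0.5, "C": 0.1, "D": -0.3, "E": -0.8})
n_drafted = 4
total_cash = 2 * 200

# Shift every score up so the last drafted player (D) sits at zero.
shift = abs(avg_z.iloc[n_drafted - 1])
adjusted = avg_z + shift

# Each player's share of the drafted pool, scaled to the cash available.
values = adjusted / adjusted.iloc[:n_drafted].sum() * total_cash
print(values.round(2))
```

The four drafted players' values add up to the full \$400, the last drafted player is priced at roughly \$0, and anyone below the cutoff (player E here) comes out negative, which simply means they are not worth a bid in this toy league.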

If you didn't follow that just know that below we calculate our auction values.

In [8]:
#Calculating auction values based on z-scores
league_members = 12
players_per_member = 13
cash_per_member = 200

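#Shift every score up by the size of the 156th row's average z-score.
#df_zscores is still in Fantasy_pts order here, because the sort in In [7] was not saved.
df_zscores["avg_zscore_adj"] = df_zscores["avg_zscore"] + abs(df_zscores["avg_zscore"].iloc[155])
#Each player's share of the first 155 rows' adjusted total, multiplied by the league's total cash.
df_zscores["auction_value"] = (df_zscores["avg_zscore_adj"] / df_zscores["avg_zscore_adj"].iloc[0:155].sum()) * league_members * cash_per_member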
df_zscores.drop(["avg_zscore_adj"], axis=1, inplace=True)
df_zscores = df_zscores.round(2)

pd.options.display.max_rows = None
df_zscores.sort_values(["auction_value"], ascending=False).head(10)

Out[8]:
PS/G AST TRB STL BLK 3P TOV FG% FT% avg_zscore auction_value
Player
Kevin Durant 1.780 0.700 0.930 0.200 1.400 0.620 -0.420 1.580 1.140 0.880 79.850
Stephen Curry 1.810 1.520 -0.450 1.960 -0.660 3.030 -1.360 -0.030 1.050 0.760 71.230
Anthony Davis 2.290 -0.530 2.200 0.700 2.280 -0.910 -0.650 1.100 0.320 0.760 70.700
Kawhi Leonard 1.850 0.100 0.020 1.960 0.070 0.730 -0.300 0.470 1.390 0.700 66.560
Russell Westbrook 2.940 3.250 1.800 1.450 -0.370 1.280 -4.190 -1.260 1.360 0.700 66.350
James Harden 2.490 3.620 0.850 1.200 -0.220 2.040 -4.540 -0.650 1.380 0.690 65.660
Chris Paul 0.530 2.710 -0.270 2.460 -0.810 0.730 -0.650 0.120 0.910 0.640 61.950
Giannis Antetokounmpo 1.380 0.970 1.110 1.450 1.840 -0.800 -1.240 1.150 -0.310 0.620 60.660
Jimmy Butler 1.560 1.020 0.160 2.210 -0.370 -0.150 -0.300 -0.240 1.530 0.600 59.610
Karl-Anthony Towns 1.780 -0.260 2.380 -0.810 0.960 -0.150 -0.890 1.840 0.470 0.590 58.700

Now let's try punting categories and see how that impacts our ending values.

Let's try punting assists and three pointers to see if big men jump in the rankings as predicted. Pay particular attention to big men such as Anthony Davis and Karl-Anthony Towns, who were ranked 3rd and 10th respectively.
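Before building the full function, here is a quick sketch of what punting does to a single player's average. It uses the rounded z-scores for Anthony Davis and James Harden copied from the Out[8] table above. The small `z` frame is only for illustration.

```python
import pandas as pd

# Rounded z-scores copied from the Out[8] table above.
cats = ["PS/G", "AST", "TRB", "STL", "BLK", "3P", "TOV", "FG%", "FT%"]
z = pd.DataFrame(
    [[2.29, -0.53, 2.20, 0.70, 2.28, -0.91, -0.65, 1.10, 0.32],
     [2.49, 3.62, 0.85, 1.20, -0.22, 2.04, -4.54, -0.65, 1.38]],
    index=["Anthony Davis", "James Harden"],
    columns=cats,
)

# Average over all nine categories, then over the seven that remain
# after dropping (punting) assists and three pointers.
all_cats = z.mean(axis=1)
punted = z.drop(columns=["AST", "3P"]).mean(axis=1)

print(pd.DataFrame({"all nine": all_cats, "punt AST + 3P": punted}).round(3))
```

Davis's average rises from about 0.76 to about 1.18 once his weak assists and threes stop counting against him, while Harden's falls from about 0.69 to about 0.07 because his two best categories are exactly the ones being ignored.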

In [9]:
#Creating a function that allows us to select categories to punt
def punt_cats(df, pts=False, ast=False, trb=False, stl=False, blk=False, threes=False, tov=False, fg=False, ft=False):
    punt_category = {
        "PS/G" : pts,
        "AST" : ast,
        "TRB" : trb,
        "STL" : stl,
        "BLK" : blk,
        "3P" : threes,
        "TOV" : tov,
        "FG%" : fg,
        "FT%" : ft
    }

    df.drop(["avg_zscore", "auction_value"], axis=1, inplace=True)
    num_cats = sum(value == False for value in punt_category.values())
    df["avg_zscore"] = 0.0

    #Add up only the categories that are not being punted
    for cat in punt_category:
        if punt_category[cat] is False:
            df["avg_zscore"] += df[cat]

    df["avg_zscore"] = df["avg_zscore"] / num_cats
    df.sort_values("avg_zscore", ascending=False, inplace=True)

    #Same shift-and-scale step as In [8], now on the re-sorted table
    df["avg_zscore_adj"] = df["avg_zscore"] + abs(df["avg_zscore"].iloc[155])
    df["auction_value"] = (df["avg_zscore_adj"] / df["avg_zscore_adj"].iloc[0:155].sum()) * league_members * cash_per_member
    df.drop(["avg_zscore_adj"], axis=1, inplace=True)
    df["auction_value"] = df["auction_value"].round(2)

punt_cats(df_zscores, ast=True, threes=True)

df_zscores.head(10)

# If you're a basketball fan you are probably wondering who Edy Tavares is at this point.
# He played 1 game last year and happened to do well in a few categories.
# We could filter out players with few games but I chose not to because then we would be screening out
# good injured players.

Out[9]:
PS/G AST TRB STL BLK 3P TOV FG% FT% avg_zscore auction_value
Player
Anthony Davis 2.290 -0.530 2.200 0.700 2.280 -0.910 -0.650 1.100 0.320 1.177 71.450
Kevin Durant 1.780 0.700 0.930 0.200 1.400 0.620 -0.420 1.580 1.140 0.944 59.930
Karl-Anthony Towns 1.780 -0.260 2.380 -0.810 0.960 -0.150 -0.890 1.840 0.470 0.819 53.710
Kawhi Leonard 1.850 0.100 0.020 1.960 0.070 0.730 -0.300 0.470 1.390 0.780 51.810
Giannis Antetokounmpo 1.380 0.970 1.110 1.450 1.840 -0.800 -1.240 1.150 -0.310 0.769 51.240
Edy Tavares -1.630 -1.040 1.550 -2.570 7.880 -1.460 -0.180 1.480 -1.680 0.693 47.490
Jimmy Butler 1.560 1.020 0.160 2.210 -0.370 -0.150 -0.300 -0.240 1.530 0.650 45.370
Hassan Whiteside 0.330 -1.170 3.030 -0.810 2.130 -1.460 -0.180 1.470 -1.520 0.636 44.670
Rudy Gobert -0.210 -0.940 2.560 -1.060 2.870 -1.460 0.060 1.970 -1.780 0.630 44.380
Myles Turner -0.120 -0.900 0.560 -0.300 2.130 -0.910 0.650 0.670 0.200 0.541 40.000

As expected Anthony Davis is now the best player and Karl-Anthony Towns jumped from 10th place to 3rd.

Awesome! We've reached our final values. But why not make this data set more usable for my friends? Let's turn this dataset into an easy-to-use interface in Excel that we can use during draft day.

In [10]:
#Sending data into Excel
df_final = pd.concat([df_nba[["Pos", "Age", "Tm", "G", "MP"]], df_zscores], axis=1, sort=True)
df_final.to_excel("nba_stats.xlsx", sheet_name="z-scores")


Here's the end product, a table where you can easily see who the best players are and quickly sort by whichever category you are looking to improve.

I've added a team tracker where you can see while you draft which categories you are doing well in and which you should be focusing on improving.
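My tracker lives in the Excel sheet, but the idea can be sketched in pandas. This is an assumption-laden illustration rather than the spreadsheet's actual formulas: it takes the rounded z-scores for three players from the Out[9] table above, treats them as a hypothetical set of picks, and assumes that summing z-scores per category is a reasonable proxy for how strong the team is in each one.

```python
import pandas as pd

# Rounded z-scores copied from the Out[9] table above.
cats = ["PS/G", "AST", "TRB", "STL", "BLK", "3P", "TOV", "FG%", "FT%"]
z = pd.DataFrame(
    [[2.29, -0.53, 2.20, 0.70, 2.28, -0.91, -0.65, 1.10, 0.32],
     [-0.21, -0.94, 2.56, -1.06, 2.87, -1.46, 0.06, 1.97, -1.78],
     [-0.12, -0.90, 0.56, -0.30, 2.13, -0.91, 0.65, 0.67, 0.20]],
    index=["Anthony Davis", "Rudy Gobert", "Myles Turner"],
    columns=cats,
)

# Hypothetical picks made so far in the draft.
my_team = ["Anthony Davis", "Rudy Gobert", "Myles Turner"]

# Sum z-scores per category and list the weakest categories first.
tracker = z.loc[my_team].sum().sort_values()
print(tracker)
```

With these three big men the tracker shows blocks and rebounds as clear strengths and three pointers and assists as the weakest categories, which is exactly the profile we want when punting those two.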

Lastly, I've added the option to select categories to punt which will impact the auction values and average z-scores in the first table.

To repeat the conclusion in the analysis section...

Ultimately, in sports nothing is a guarantee. There are always injuries and trades that can change player value mid-season, or destroy a team's chances, but by using z-scores we increase our chances of winning substantially. Much like the Oakland A's in Moneyball, you don't need to spend the most money on players in order to have the most competitive team.