Basketball Data Part II – Length of Career by Position

In a previous post, I showed how easy it is to use R to scrape HTML tables from websites; I used the XML package to scrape some basic basketball data. In this post, I’ll explore whether NBA career length varies by position. Before reviewing this data, I assumed that centers (and big men in general) would have the shortest NBA careers. My theory was that these guys were just too big to stay healthy long enough to string together a career. Let’s see what the data says:

[Figure: Length of Career by Position (boxplot; x-axis: Position, y-axis: Years)]

It seems like the median career length is two years for centers, guards and forwards. Beyond the median, centers and guards tend to have longer careers than forwards in general. If we look at C-F and G-F, we can see that these players tend to have noticeably longer careers than single-position players. I don’t know a lot about basketball, so it’s difficult for me to speculate on why these players have longer careers. Maybe they’re so athletic that they can easily play either position, and more athletic players tend to have longer careers? Maybe these players have been in the league so long that they get moved around and thus earn the “C-F” or “G-F” designation? Any theories from people who know more about basketball?
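
Reading medians off a boxplot is approximate. Here is a minimal sketch for getting the numbers directly, assuming a data frame shaped like tbl from the code at the end of this post (a numeric LEN column and a Position factor). The helper names median_len_by_position and compare_dual_vs_single are hypothetical, and the Wilcoxon rank-sum test is one reasonable choice for skewed career lengths, not something the original analysis ran.

```r
# Hypothetical helpers; `df` is assumed to look like `tbl` from the post:
# a numeric LEN column (EndYear - StartYear) and a Position factor with
# levels C, G, F, C-F, G-F.

# Median career length for each position (NA lengths are ignored).
median_len_by_position <- function(df) {
  tapply(df$LEN, df$Position, median, na.rm = TRUE)
}

# Wilcoxon rank-sum test of career length: dual-position (C-F, G-F)
# versus single-position (C, G, F) players. Rows with a missing
# position are dropped so they are not counted as single-position.
compare_dual_vs_single <- function(df) {
  df <- df[!is.na(df$Position), ]
  dual <- df$Position %in% c("C-F", "G-F")
  wilcox.test(df$LEN[dual], df$LEN[!dual])
}
```

Calling median_len_by_position(tbl) returns one median per position, and compare_dual_vs_single(tbl)$p.value gives a p-value for the dual-versus-single difference. With real data, ties in career length are common, so R will fall back to the normal approximation and may warn about it.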

I also looked briefly at retirement age:

[Figure: Retirement Age by Position (boxplot; x-axis: Position, y-axis: Retirement Age)]

We can see a similar trend here, with centers and guards retiring later than forwards (and C-F/G-F players retiring later than all single-position players). More than 75% of forwards retire from the NBA before age 30. I’m 29 now. Good thing I’m not a forward…
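
A “more than 75%” figure like this can be read off the boxplot’s upper quartile, but it can also be computed directly. A minimal sketch, assuming a data frame shaped like tbl (or the retired subset) from the code at the end of this post, with a numeric RetireAge column and a Position factor; the function name share_retiring_before is hypothetical.

```r
# Hypothetical helper: fraction of players at the given position(s) whose
# approximate retirement age (final season year minus birth year) is
# below `age`. Players with a missing RetireAge are ignored.
share_retiring_before <- function(df, position, age = 30) {
  ages <- df$RetireAge[df$Position %in% position]
  mean(ages < age, na.rm = TRUE)
}
```

For example, share_retiring_before(tbl, "F") gives the share of forwards; passing retired instead of tbl excludes active players, whose eventual retirement age is not yet known.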

Here is the code:

###### Settings
library(XML)
setwd("C:/Blog/Basketball")
 
###### URLs
url<-paste0("http://www.basketball-reference.com/players/",letters,"/")
len<-length(url)
 
###### Reading data
tbl<-readHTMLTable(url[1])[[1]]
 
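# Some letters (e.g. "x") may have no player table; skip them rather than
# indexing [[1]] into an empty list
for (i in 2:len) {
	tables<-readHTMLTable(url[i])
	if (length(tables)>0) tbl<-rbind(tbl,tables[[1]])
}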
 
###### Formatting data
colnames(tbl)<-c("Name","StartYear","EndYear","Position","Height","Weight","BirthDate","College")
tbl$BirthDate<-as.Date(tbl$BirthDate,format="%B %d, %Y")
 
tbl$StartYear<-as.numeric(as.character(tbl$StartYear))
tbl$EndYear<-as.numeric(as.character(tbl$EndYear))
 
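# Use plain strings so recoding can't produce NA from a missing factor level
tbl$Position<-as.character(tbl$Position)
tbl$Position[tbl$Position=="F-C"]<-"C-F"
tbl$Position[tbl$Position=="F-G"]<-"G-F"
# Any position label not listed in levels becomes NA here
tbl$Position<-factor(tbl$Position,levels=c("C","G","F","C-F","G-F"))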
 
###### Career Length
# Years between first and last season (a one-season career counts as 0)
tbl$LEN<-tbl$EndYear-tbl$StartYear
 
table(tbl$Position)
boxplot(tbl$LEN~tbl$Position,col="light blue",ylab="Years",xlab="Position",
	main="Length of Career by Position")
 
###### Age at Retirement
# Approximate age at retirement: final season year minus birth year
tbl$RetireAge<-tbl$EndYear-as.numeric(format(tbl$BirthDate,"%Y"))
 
boxplot(tbl$RetireAge~tbl$Position,col="light blue",ylab="Retirement Age",xlab="Position",
	main="Retirement Age by Position")
 
###### Removing Currently Active Players
# Keep only players whose last season is before 2014 (which() drops NA rows)
retired<-tbl[which(tbl$EndYear<2014),]
 
boxplot(retired$LEN~retired$Position,col="light blue",ylab="Years",xlab="Position",
	main="Length of Career by Position (Retired Players)")
 
boxplot(retired$RetireAge~retired$Position,col="light blue",ylab="Retirement Age",xlab="Position",
	main="Retirement Age by Position (Retired Players)")
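
One commenter below hit a “subscript out of bounds” error because one letter’s page has no player table. A minimal sketch of a more defensive reading step: it skips any page that returns no tables or fails to load, instead of indexing [[1]] into an empty list. The function read_first_tables and its reader argument are hypothetical; reader defaults to XML::readHTMLTable and is a parameter only so the logic can be exercised offline.

```r
# Hypothetical helper: read the first HTML table from each URL and stack
# them into one data frame. Pages that return no tables are skipped, and
# pages that throw an error while loading are skipped with a warning.
read_first_tables <- function(urls, reader = function(u) XML::readHTMLTable(u)) {
  pieces <- lapply(urls, function(u) {
    tables <- tryCatch(reader(u), error = function(e) {
      warning("skipping ", u, ": ", conditionMessage(e))
      list()
    })
    if (length(tables) == 0) NULL else tables[[1]]
  })
  pieces <- Filter(Negate(is.null), pieces)  # drop the skipped pages
  do.call(rbind, pieces)
}
```

Something like tbl<-read_first_tables(paste0("http://www.basketball-reference.com/players/",letters,"/")) could stand in for the readHTMLTable call and the for loop; the formatting steps after it stay the same.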


9 thoughts on “Basketball Data Part II – Length of Career by Position”

  1. I think your 2nd hypothesis RE dual-position players and longevity is close to correct. I think it’s very likely that guys who are positional “tweeners” are disproportionately likely to have above-average size for their initial position, and are thus better able to slide down a position and accommodate later-career diminishments in athleticism. (Think 2/3s like Joe Johnson and Vince Carter, 4/5s like Amar’e Stoudemire and Pau Gasol; they don’t show in your 3-part categorization, but I imagine a similar trend would hold for 3/4s, and separating these positions out would likely demonstrate an even greater gap in expected longevity between positionally pure forwards and tweeners.)

    • Trevor,
      Thanks for the insight! I think I’ll follow up on this analysis with some exploration of how BMI is related to career length. That could be interesting (or not). Haha.

  2. Pingback: Basketball Data Part II – Length of Career by Position ← Patient 2 Earn

  3. This is pretty cool. I did something similar using survival analysis a couple of weeks back and found results in line with yours for position, http://randomlydistributedthoughts.blogspot.com/2014/04/lifespan-of-nba-player.html.
    It is also interesting to note that it was done on data from a different source that did not have position data in the same format. I have switched to using the same source as you for some of the newer posts, and have even tried to streamline the data acquisition and cleaning so the data is easy to get. I have a package on Github that will do some of it. You can see some discussion of that here, http://randomlydistributedthoughts.blogspot.com/2014/05/detecting-all-stars-using-cade.html

    I am also interested in what you were talking about related to collaboration around basketball data. I have lots of ideas and pieces of code to get and do various things with basketball data and was wondering if you had any thoughts on collaboration.

    • Kenny,
      I checked out your blog and I really like the survival analysis angle that you used to approach this same issue. I’d be happy to collaborate on a project in the future. Feel free to reach out to me at greenberg.jon {at} gmail {dot} com

  4. Pingback: Basketball Data Part III – BMI: Does it Matter? | Analyst At Large

  5. Very nice! I unfortunately get an error while running for (i in 2:len)
    {tbl<-rbind(tbl,readHTMLTable(url[i])[[1]])}
    the error goes:
    Error in readHTMLTable(url[i])[[1]] : subscript out of bounds
    They might have changed the website a bit. Nice tutorial anyway. Thanx

    • DD20 – the subscript out of bounds error occurs because there are no current/former NBA players with a last name starting with ‘X’. You can replace the “2:len” in the for loop with “c(2:23,25,26)” and the code will run without any issues.

  6. Pingback: Chanyong's Data Analysis » Basketball Data Part (Keeping up with R-bloggers)
