Smoothing and estimation

Noisy data can be hard to read or interpret. In this example we plot monthly United States unemployment data from 1948 to 2010 and highlight trends using LOESS (locally estimated scatterplot smoothing, a form of locally weighted regression).

library(ggplot2)

df <- read.csv("http://datasets.flowingdata.com/unemployment-rate-1948-2010.csv", sep=",")
head(df)
##     Series.id Year Period Value
## 1 LNS14000000 1948    M01   3.4
## 2 LNS14000000 1948    M02   3.8
## 3 LNS14000000 1948    M03   4.0
## 4 LNS14000000 1948    M04   3.9
## 5 LNS14000000 1948    M05   3.5
## 6 LNS14000000 1948    M06   3.6
dim(df)
## [1] 746   4
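# Ten evenly spaced row indices for the axis breaks; round() keeps them valid integer indices
xaxis_labs <- round(seq(1, nrow(df), length.out=10))
# Plot each monthly value against its row index and label the breaks with the matching year
p <- ggplot(df, aes(x=seq_len(nrow(df)), y=Value)) + geom_jitter(alpha=0.25, size=3) + 
  labs(title="United States Unemployment Rate, 1948-2010", x="Year", y="Percent unemployed") + 
  scale_x_continuous(breaks=xaxis_labs, labels=df$Year[xaxis_labs])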
  
p + stat_smooth(method="loess", size=1, se=FALSE, span=0.7)

The figure above is produced using a span of 0.7. The span is the fraction of the data points used in each local fit, so to generate a more tightly fit trendline we can use a smaller span.

p + stat_smooth(method="loess", size=1, se=FALSE, span=0.2)

Or, if we want a smoother trendline that is less sensitive to short-term fluctuations, we can increase the span.

p + stat_smooth(method="loess", size=1, se=FALSE, span=0.9)
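
To see the effect of the span in numbers rather than by eye, here is a minimal sketch that calls base R's loess() directly. It uses a synthetic noisy series as a stand-in, not the unemployment data, so it runs without a download. The names summary_by_span, residual_sd and roughness are made up for this illustration.

```r
# Synthetic stand-in for a noisy series (NOT the unemployment data).
set.seed(42)
n <- 300
x <- seq_len(n)
y <- 5 + 2 * sin(x / 25) + rnorm(n, sd = 0.5)

# Fit LOESS at the three spans used above and record two rough measures:
# how closely the curve follows the points, and how wiggly the curve is.
spans <- c(0.2, 0.7, 0.9)
summary_by_span <- t(sapply(spans, function(s) {
  fit <- loess(y ~ x, span = s)
  fitted_y <- predict(fit)  # fitted values at the observed x
  c(span = s,
    residual_sd = sd(y - fitted_y),                       # smaller = tighter fit
    roughness = sum(diff(fitted_y, differences = 2)^2))   # smaller = smoother curve
}))
print(summary_by_span)
```

On a series like this, the residual spread should grow as the span increases, since a wider window averages over more of the local structure. Which span is best depends on whether you care about short-term movements or the long-run trend.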