Noisy data can be hard to read or interpret. In this example we plot United States unemployment data from 1948 to 2010 and highlight the trend with LOESS (locally estimated scatterplot smoothing), a locally weighted regression method.
library(ggplot2)
df <- read.csv("http://datasets.flowingdata.com/unemployment-rate-1948-2010.csv", sep=",")
head(df)
## Series.id Year Period Value
## 1 LNS14000000 1948 M01 3.4
## 2 LNS14000000 1948 M02 3.8
## 3 LNS14000000 1948 M03 4.0
## 4 LNS14000000 1948 M04 3.9
## 5 LNS14000000 1948 M05 3.5
## 6 LNS14000000 1948 M06 3.6
dim(df)
## [1] 746 4
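The Period column encodes the month as M01 through M12, so one option is to combine it with Year into a real date column. The plots in this example index the rows instead; the following is only a minimal sketch of the alternative, using a hypothetical `toy` data frame built from the first three rows shown by head(df) and an assumed column name `Date`.

```r
# Toy rows copied from the head(df) output above (hypothetical stand-in for df)
toy <- data.frame(Year = c(1948, 1948, 1948),
                  Period = c("M01", "M02", "M03"),
                  Value = c(3.4, 3.8, 4.0))

# Strip the leading "M" and assemble a first-of-month date string, e.g. "1948-01-01"
toy$Date <- as.Date(paste(toy$Year, sub("^M", "", toy$Period), "01", sep = "-"))
print(toy$Date)
```

With a column like this, aes(x=Date) together with scale_x_date() could stand in for manually chosen breaks and labels, assuming every Period value follows the M01 to M12 pattern.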
# Ten evenly spaced row indices, rounded so each tick sits on an actual row
xaxis_labs <- round(seq(1, nrow(df), length.out=10))
p <- ggplot(df, aes(x=1:dim(df)[1], y=Value)) + geom_jitter(alpha=0.25, size=3) +
labs(title="United States Unemployment Rate, 1948-2010", x="Year", y="Percent unemployed") +
scale_x_continuous(breaks=xaxis_labs, labels=df$Year[xaxis_labs])
p + stat_smooth(method="loess", size=1, se=FALSE, span=0.7)
The figure above is produced using a span of 0.7. If we wish to generate a trendline that follows the data more closely, we can use a smaller span.
p + stat_smooth(method="loess", size=1, se=FALSE, span=0.2)
Or, if we wish to make a smoother trendline that is less sensitive to short-term fluctuations, we can increase the span.
p + stat_smooth(method="loess", size=1, se=FALSE, span=0.9)
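To see the effect of span numerically rather than visually, one can call loess() from base R directly and compare how closely each fit follows the data. The sketch below uses synthetic data (a noisy sine wave, not the unemployment series), and the names `x`, `y`, `spans` and `rss` are illustrative assumptions.

```r
# Synthetic noisy series (hypothetical data, not the unemployment file)
set.seed(42)
x <- 1:200
y <- sin(x / 20) + rnorm(length(x), sd = 0.3)

# Fit LOESS at the three spans used in the plots above
spans <- c(0.2, 0.7, 0.9)
rss <- sapply(spans, function(s) {
  fit <- loess(y ~ x, span = s)
  sum(residuals(fit)^2)  # residual sum of squares on the fitted data
})
names(rss) <- spans
print(rss)
```

A smaller span uses fewer neighbouring points for each local fit, so the curve tracks the data more closely and the residual sum of squares is typically lower. A larger span averages over more of the series and gives a smoother curve with larger residuals.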