Friday, March 1, 2013

Basic Graphing in R: Combining, Plotting and Smoothing

R graphs from left to right: Price of Imported Oil per Quarter, 1976:2012; Price of Retail Gasoline per Quarter, 1976:2012; Ratio of Retail Gas / Imported Oil per Quarter, 1976:2012. Source: U.S. EIA, "Short-Term Energy Outlook Real and Nominal Prices", February 13, 2012.
The data files, images and R script for this blog are here. These examples use R 2.14 64-bit for Windows. Because I am neither a statistician nor an energy professional, the results of the following analysis should be taken with a "grain of salt". The purpose of this post is to demonstrate the basics of exporting, reformatting, combining, plotting and smoothing data in R.


Finding and Importing Data

I found historical prices for energy consumed in the United States at the Energy Information Administration.[1] I wanted to understand better the rise in the price of gasoline in the United States and how closely it relates to the historic rise in the market price of crude oil. I used the quarterly worksheets Crude Oil - Q and Gasoline - Q from the EIA spreadsheet "real_prices.xls" (data current as of February 2013). I found it simplest to reformat the dates as numeric columns in CSV ('comma-separated values') format. I synchronized both worksheets to cover the same date range and created a simplified numeric date range using OpenOffice Calc's LEFT and RIGHT functions to reformat the quarter dates, thus sidestepping the issue of date formatting (for now). After importing the data into R with these commands:

> QTR_Imp_Oil_Price <- read.csv("ImportedOilPrice_datereformat_simple.csv") 
> QTR_Retail_Gas_Price <- read.csv("QuarterRetailGas_datereformat_simple.csv")

I then had two data frames, as shown below. Since all the columns now contain only numbers, read.csv imports them as numeric (or integer) columns:


> head(QTR_Imp_Oil_Price)
  Q Year Index84 Nominal    Real
1 1 1976  0.5590 13.3500 55.3500
2 2 1976  0.5640 13.4296 55.1742
3 3 1976  0.5730 13.5194 54.6710
4 4 1976  0.5813 13.5948 54.1876
5 5 1977  0.5920 14.3847 56.3033
6 6 1977  0.6023 14.5384 55.9284
...

> head(QTR_Retail_Gas_Price)
  Q Year Index84 Nominal Real
1 1 1976  0.5590    0.60 2.49
2 2 1976  0.5640    0.60 2.48
3 3 1976  0.5730    0.63 2.53
4 4 1976  0.5813    0.63 2.50
5 5 1977  0.5920    0.64 2.49
6 6 1977  0.6023    0.66 2.53
...
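The spreadsheet reformatting described above could also be done in R itself. The following is only a minimal sketch: the label format "1976Q1" is an assumption (the layout of the original EIA worksheet is not shown in this post), and the names quarter_labels, quarter_of_year and dates_simple are made up for illustration.

```r
# Hypothetical quarter labels; the real EIA worksheet format may differ.
quarter_labels <- c("1976Q1", "1976Q2", "1976Q3", "1976Q4", "1977Q1", "1977Q2")

# substr() plays the role of the spreadsheet LEFT and RIGHT functions.
year <- as.integer(substr(quarter_labels, 1, 4))
quarter_of_year <- as.integer(substr(quarter_labels, 6, 6))

# Running quarter index (1, 2, 3, ...), like the Q column shown above.
Q <- seq_along(quarter_labels)

dates_simple <- data.frame(Q = Q, Year = year)
print(dates_simple)
```

Doing this step in R keeps the whole workflow in one script, at the cost of having to know the exact label format in advance.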

For this post, we can ignore the "Index84" and "Real" columns. However, we will create a new data frame by combining the Nominal columns from the two separate data frames:


Oil_Gas_Nominal <- data.frame(QTR_Imp_Oil_Price$Nominal,QTR_Retail_Gas_Price$Nominal)
# copy to a more readable name
Oil_Gas_Nominal_Price <- Oil_Gas_Nominal
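When data.frame is called this way, the column names are derived from the expressions, giving names such as QTR_Imp_Oil_Price.Nominal. A small sketch of an alternative, using only the first six rows from the head output above, names the columns explicitly. The names Oil, Gas and oil_gas_small are illustrative and are not used elsewhere in this post.

```r
# First six rows of the Nominal columns, copied from the head() output above.
oil_nominal <- c(13.3500, 13.4296, 13.5194, 13.5948, 14.3847, 14.5384)
gas_nominal <- c(0.60, 0.60, 0.63, 0.63, 0.64, 0.66)

# Naming the arguments gives short, readable column names.
# "Oil" and "Gas" are illustrative names, not ones used in the post.
oil_gas_small <- data.frame(Oil = oil_nominal, Gas = gas_nominal)

names(oil_gas_small)          # column names
sapply(oil_gas_small, class)  # class of each column
```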

Plotting in R

The commands
  • help(plot)
  • methods(plot)
  • help(lines)
  • library(help="stats")
  • help(lowess)
help us understand the versatility of plotting in R. In the examples below I am using plot.ts (i.e. the 'plot time series' command). Because the rows of the data frame are already in time order, one row per quarter, plot.ts uses the row index as the time axis and separate x and y arguments are not needed. Type Oil_Gas_Nominal_Price[1] at the R console to see why. The lines function adds the curve computed by lowess, a scatterplot smoother, to the graph. The plot.ts function accepts x and y axis labels as well as a plot type. Here type="h" draws histogram-like vertical lines, one per observation. Here are some examples from the EIA-derived data frames:

require(stats)
plot.ts(Oil_Gas_Nominal_Price[1], xlab="By Quarter: 1976:2012",type="h")
lines(stats::lowess(Oil_Gas_Nominal_Price[1]))
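How smooth the lowess curve looks depends on its span argument f, which defaults to 2/3. The sketch below compares two spans on the first six oil prices from the head output above. Six points are far too few for a meaningful smooth, so this only illustrates the calls; the variable names are mine.

```r
# Illustrative series: the first six nominal oil prices from the head() output.
oil_nominal <- c(13.3500, 13.4296, 13.5194, 13.5948, 14.3847, 14.5384)

# lowess() returns a list with components x and y (the smoothed values).
smooth_default <- lowess(oil_nominal)         # default span f = 2/3
smooth_wide    <- lowess(oil_nominal, f = 1)  # larger span, smoother curve

plot.ts(oil_nominal, type = "h", xlab = "Quarter index", ylab = "Nominal price")
lines(smooth_default)          # solid line: default span
lines(smooth_wide, lty = 2)    # dashed line: wider span
```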



require(stats)
plot.ts(Oil_Gas_Nominal_Price[2], xlab="By Quarter: 1976:2012", type="h")
lines(stats::lowess(Oil_Gas_Nominal_Price[2]))
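The plots above use the row index as the time axis. If calendar years on the x axis are preferred, one option is to build a quarterly ts object first. This is a sketch on the first six gasoline prices only, assuming the series starts in the first quarter of 1976 as the head output suggests; gas_ts is an illustrative name.

```r
# First six nominal retail gasoline prices from the head() output above.
gas_nominal <- c(0.60, 0.60, 0.63, 0.63, 0.64, 0.66)

# frequency = 4 marks the series as quarterly; start = c(1976, 1) is Q1 1976.
gas_ts <- ts(gas_nominal, start = c(1976, 1), frequency = 4)

# plot() dispatches to plot.ts and labels the x axis in years.
plot(gas_ts, type = "h", xlab = "Year", ylab = "Nominal price")

# Smooth against the time values so the curve lines up with the year axis.
lines(lowess(as.numeric(time(gas_ts)), as.numeric(gas_ts)))
```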


This last plot shows that dividing one data frame column by another is a vectorized operation: R divides the columns element by element, with no explicit loop.

require(stats)
# Dividing one data frame column by another works element by element
Ratio_Nominal <- Oil_Gas_Nominal_Price[2] / Oil_Gas_Nominal_Price[1]
plot.ts(Ratio_Nominal, main="Retail Gas/Imported Oil", xlab="By Quarter: 1976:2012", ylab="Retail Gas/Imported Oil", type="h")
# Overlay the lowess-smoothed ratio
lines(stats::lowess(Ratio_Nominal))
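To make the element-wise behaviour concrete, here is a small sketch on plain vectors, again using the first six rows from the head output above. The explicit loop is there only for comparison; ratio and ratio_loop are illustrative names.

```r
# First six nominal prices from the head() output above.
oil_nominal <- c(13.3500, 13.4296, 13.5194, 13.5948, 14.3847, 14.5384)
gas_nominal <- c(0.60, 0.60, 0.63, 0.63, 0.64, 0.66)

# Vectorized: each gas price is divided by the oil price in the same position.
ratio <- gas_nominal / oil_nominal

# The same calculation written as an explicit loop.
ratio_loop <- numeric(length(oil_nominal))
for (i in seq_along(oil_nominal)) {
  ratio_loop[i] <- gas_nominal[i] / oil_nominal[i]
}

# Both approaches give the same result.
all.equal(ratio, ratio_loop)
```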


More information on DataFrames in R:

[1] http://timhesterberg.home.comcast.net/~timhesterberg/Rpackages/TwoPackages5.pdf
[2] http://cran.r-project.org/doc/manuals/R-intro.html#Data-frames
[3] http://cran.r-project.org/web/packages/dataframe/dataframe.pdf
[4] http://www3.nd.edu/~steve/Rcourse/Lecture2v1.pdf
[5] http://www.dummies.com/how-to/content/how-to-create-a-data-frame-from-scratch-in-r.html
[6] http://www.rochester.edu/College/gradstudents/bkenkel//data/rcourse_chap03.pdf
[7] http://rwiki.sciviews.org/doku.php?id=tips:data-frames
[8] http://rwiki.sciviews.org/doku.php?id=tips:data-frames:sort


More information on Graphs in R:

[1] http://www.harding.edu/fmccown/r/
[2] http://www.statmethods.net/graphs/scatterplot.html
[3] http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/plot.html
[4] http://www.cyclismo.org/tutorial/R/plotting.html
[5] http://www.sr.bham.ac.uk/~ajrs/R/r-plot_data.html
[6] http://stackoverflow.com/questions/2564258/plot-2-graphs-in-same-plot-in-r
[7] http://flowingdata.com/2012/12/17/getting-started-with-charts-in-r/
