Friday, March 29, 2013

Basic Data Analysis and Lattice Graphics Framework

(post under construction)
This post collects generic notes on basic data analysis, built around combining data with:

  • cbind()
  • rbind()
  • matrix()
  • as.matrix()
  • data.frame()

and visualizing data with the lattice graphics package.
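
As a minimal sketch of how the combining functions listed above behave, the following uses a few made-up toy values. The names `years`, `income`, and `stock` are illustrative only and are not taken from the data sets discussed later.

```r
# Toy vectors; all values are invented for illustration only.
years  <- c(1992, 1993, 1994)
income <- c(10.5, 11.0, 11.8)
stock  <- c(100, 104, 109)

# data.frame() builds a table whose columns may have different types.
df <- data.frame(Year = years, Income = income)

# cbind() adds columns; the rows must already line up one-to-one.
df2 <- cbind(df, Stock = stock)

# rbind() appends rows; the column names must match.
df3 <- rbind(df2, data.frame(Year = 1995, Income = 12.1, Stock = 115))

# matrix() reshapes a vector, filling column by column by default.
m <- matrix(1:6, nrow = 2)

# as.matrix() converts an all-numeric data frame to a numeric matrix.
dm <- as.matrix(df3)

print(dim(df3))  # 4 rows, 3 columns
print(m)
```

Note that cbind() and rbind() match purely by position and by column name respectively; neither checks that the rows describe the same observations.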

Some quick and dirty notes on learning R. I have found that whole libraries of material for learning R exist both online and offline. However, I have also found that R is a peculiar and specific language. I would compare its semantics most closely to SQL, but that comparison stops being useful quickly. Ironically, given the power of R for quantitative analysis, I have found that the user really needs to get the "feel of R" into his forearms to become useful and self-confident. Spending time manipulating and reorganizing data is essential at each step of learning R. Functionally, R is a mathematical platform that benefits from domain-specific packages and knowledge. But the R language is also a distinctive engine with its own limits. R does certain things very well. Other functionality, perhaps more typical of general-purpose programming languages, is simply outside R's scope. There is an art to the successful use of R, and an 'R' mentality that only serious practice will instill.

This post follows from my last post on combining data for analysis. I am using BEA, BLS, and Census data to understand 20-year macroeconomic flows. Some examples of how to use the cbind, rbind, and matrix commands to reorganize this data are here. Below are some functions I have created to help explicate the data set ('dd'). They are slightly more concise and useful than 'str(dd)'. The reader will recall that I have concatenated data from multiple sets into one data frame. I have used the prefixes BLS, PI, and NS to designate "Bureau of Labor Statistics", (BEA) Personal Income, and (BEA) Net (residential) Stock, respectively.
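
A minimal sketch of what one such helper might look like follows; it is not the original function. The name `describe_columns()`, the toy data frame, and the assumption that each prefix is separated from the rest of the column name by an underscore are all hypothetical.

```r
# Hypothetical helper: one row per column with its name, class and prefix.
# Assumes prefixes such as BLS, PI or NS are followed by an underscore,
# which is an illustrative naming convention, not taken from the real data.
describe_columns <- function(d) {
  nms <- names(d)
  data.frame(
    column = nms,
    class  = unname(vapply(d, function(x) class(x)[1], character(1))),
    prefix = ifelse(grepl("_", nms), sub("_.*$", "", nms), NA_character_),
    stringsAsFactors = FALSE
  )
}

# Toy stand-in for 'dd'; every value here is made up.
dd_toy <- data.frame(
  Year          = 1992:1994,
  BLS_Employed  = c(118.5, 120.3, 123.1),
  PI_Disposable = c(4.8, 5.0, 5.2),
  NS_Housing    = c(6.1, 6.4, 6.8)
)

info <- describe_columns(dd_toy)
print(info)
```

Compared with 'str(dd)', a table like this can itself be sorted or filtered, for example to list only the columns that came from one source.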

Thursday, March 21, 2013

Monday, March 18, 2013

Combining DataFrames in R Programming

The screencast below discusses combining dataframes from disparate sources in R Programming. Full screen is probably best. The code for the screencast is below. Data files for this screencast can be found here.


This is the code that accompanies the screencast above.

# data science exercise combining dataframes from different sources
# data science exercise using lattice graphics system


# read.csv() already returns a data frame, so no data.frame() wrapper is needed
USPerInc.1992.2011 <- read.csv("PersonalIncomeDisposition1992-2011.csv")
USResidentialAsset.1992.2011 <- read.csv("Current-Cost_Net_Stock_Residential_Fixed_Assets.csv")
USEmployPop.1992.2011 <- read.csv("BLS_Census.csv")

# Start with columns 1 and 6 of personal income, then append four columns of
# residential assets. cbind() matches rows by position, so the rows of each
# file must be in the same order.
USComb <- USPerInc.1992.2011[c(1, 6)]
USComb <- cbind(USComb, USResidentialAsset.1992.2011[c(2, 3, 4, 8)])

# Alternative: bind every column of all three files side by side
USComb1 <- read.csv("BLS_Census.csv")
USComb1 <- cbind(USComb1, read.csv("PersonalIncomeDisposition1992-2011.csv"))
USComb1 <- cbind(USComb1, read.csv("Current-Cost_Net_Stock_Residential_Fixed_Assets.csv"))


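# Drop columns 9 and 20 by position
USComb1 <- USComb1[c(-9, -20)]

# For each combined data frame, tabulate column class (column 1) against column name (column 2)
matrix1 <- matrix(sapply(USComb, class))
matrix1 <- cbind(matrix1, matrix(names(USComb)))
matrix2 <- matrix(sapply(USComb1, class))
matrix2 <- cbind(matrix2, matrix(names(USComb1)))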

dd <- USComb1
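
The comments at the top of the script mention the lattice graphics system, but the listing ends once 'dd' is built. As a minimal sketch of what a first lattice plot might look like, the following uses a small made-up data frame in long format. The column names and values are assumptions for illustration, not the BEA, BLS, or Census figures.

```r
library(lattice)  # a recommended package, included with standard R installations

# Made-up long-format data: one row per (year, series) pair.
toy <- data.frame(
  Year   = rep(1992:1996, times = 2),
  Series = rep(c("Income", "Housing"), each = 5),
  Value  = c(4.8, 5.0, 5.2, 5.5, 5.8,
             6.1, 6.4, 6.8, 7.1, 7.6)
)

# One panel per series, points joined by lines (type "b" = both).
p <- xyplot(Value ~ Year | Series, data = toy, type = "b",
            xlab = "Year", ylab = "Value (invented units)")

# A lattice plot is an object and is drawn only when printed.
# At the interactive prompt, typing p is enough; in a script use print(p).
```

The `y ~ x | group` formula is what distinguishes lattice from base graphics: the conditioning variable after the bar produces one panel per level without any explicit loop.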

Monday, March 11, 2013

"Great coders are today's rock stars."

I just had to post this video on learning to code from an all star cast at


Friday, March 8, 2013

Wednesday, March 6, 2013

Basic data analysis: Part I

[Editor's note: Under construction - 03/06/2013.]
There are a number of functions that will be helpful for this example. Please examine them through the help system (e.g. 'help(command)'):
  • read.csv()
  • head()
  • names()
  • as.numeric()
  • c()
  • data.frame()
  • sapply()
  • class()
  • print()
  • levels()
  • droplevels()
  • subset()
  • order()
  • plot()
  • lines()

Friday, March 1, 2013

Basic Graphing in R: Combining, Plotting and Smoothing

R graphs, from left to right: price of imported oil per quarter, 1976-2012; price of retail gasoline per quarter, 1976-2012; ratio of retail gasoline to imported oil per quarter, 1976-2012. Source: U.S. EIA, "Short-Term Energy Outlook Real and Nominal Prices", February 13, 2012.
The data files, images, and R script for this post are here. These examples use R 2.14 (64-bit) for Windows. Because I am neither a statistician nor an energy professional, the results of the following analysis should be taken with a grain of salt. The purpose of this post is to demonstrate the basics of exporting, reformatting, combining, plotting, and smoothing data in R.
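
As a minimal sketch of the combine, plot, and smooth steps, the following uses synthetic quarterly series generated on the spot rather than the EIA data. The variable names and the choice of lowess() as the smoother are assumptions for illustration.

```r
set.seed(1)  # make the synthetic noise reproducible

# Synthetic quarterly series; these are NOT the EIA prices.
quarter <- seq(1976, 2012.75, by = 0.25)
oil <- 40 + 10 * sin((quarter - 1976) / 6) + rnorm(length(quarter), sd = 3)
gas <- 1.2 + oil / 40 + rnorm(length(quarter), sd = 0.1)

# Combine the series column-wise and derive the ratio.
prices <- data.frame(Quarter = quarter, Oil = oil, Gas = gas)
prices$Ratio <- prices$Gas / prices$Oil

# lowess() returns a list with the smoothed x and y values.
smooth <- lowess(prices$Quarter, prices$Ratio, f = 0.3)

# Draw the raw ratio, then overlay the smoothed curve.
# The guard skips drawing when the script is run in batch mode.
if (interactive()) {
  plot(prices$Quarter, prices$Ratio, type = "l", col = "grey",
       xlab = "Quarter", ylab = "Gas / oil ratio (synthetic)")
  lines(smooth, col = "red", lwd = 2)
}
```

The `f` argument sets the fraction of points used for each local fit; larger values give a smoother curve at the cost of flattening short-lived swings.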