require 'statsample'
Statsample::Analysis.store("Statsample::Bivariate.correlation_matrix") do
Daru::Vector and Daru::DataFrame update metadata, such as the positions
of missing values, every time they are created.
Since the data we create here has no missing values,
we set Daru.lazy_update = true so that this metadata is not recomputed
every time, which makes things much faster.
If you do have missing data and lazy_update has been set to true,
you should call #update on the affected Vector or DataFrame object
each time an assignment or deletion cycle is complete.
Daru.lazy_update = true
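The trade-off behind lazy updating can be illustrated with a small toy model. This is only a sketch of the idea, not Daru's implementation: the `ToyVector` class, its `lazy_update` switch and its `missing_positions` reader are hypothetical names made up for this example.

```ruby
# Toy model of lazy metadata updates. NOT Daru's code; every name here
# (ToyVector, lazy_update, missing_positions) is hypothetical.
class ToyVector
  class << self
    attr_accessor :lazy_update
  end
  self.lazy_update = false

  attr_reader :missing_positions

  def initialize(values)
    @values = values.dup
    @missing_positions = []
    update unless self.class.lazy_update
  end

  def []=(index, value)
    @values[index] = value
    update unless self.class.lazy_update
  end

  # Rescan the data and refresh the cached positions of nil entries.
  def update
    @missing_positions = @values.each_index.select { |i| @values[i].nil? }
    self
  end
end

ToyVector.lazy_update = true
v = ToyVector.new([1, nil, 3])
p v.missing_positions   # cache has not been refreshed yet
v[2] = nil
v.update                # explicit refresh once the assignment cycle is done
p v.missing_positions
ToyVector.lazy_update = false
```

In this model the rescan is skipped on every creation and assignment while the switch is on, which is cheap but leaves the cached positions stale until `update` is called explicitly.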
Create a Daru::DataFrame containing four vectors: a, b, c and d.
Notice that the clone option has been set to false. This tells Daru
not to clone the Daru::Vectors supplied by rnorm, since cloning them
once they have been assigned to the DataFrame would be needlessly
wasteful.
samples = 1000
ds = Daru::DataFrame.new({
  :a => rnorm(samples),
  :b => rnorm(samples),
  :c => rnorm(samples),
  :d => rnorm(samples)
}, clone: false)
puts "== DataFrame ==\n"
IRuby.display ds.head
Calculate the correlation matrix. The full method name is used here; the
cor shorthand is an equivalent, shorter way to make the same call.
cm = Statsample::Bivariate.correlation_matrix(ds)
puts "\n== Correlation Matrix ==\n"
IRuby.display cm
Set lazy_update to false once our job is done so that this analysis does
not accidentally affect code elsewhere.
Daru.lazy_update = false
end
Statsample::Analysis.run_batch
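To make the result easier to interpret, here is a minimal pure-Ruby sketch of what a Pearson correlation matrix contains. It is not Statsample's implementation: the helper names `pearson` and `correlation_table` are made up for this example, the input is a plain Hash of Arrays rather than a Daru::DataFrame, and missing values and zero-variance columns are not handled.

```ruby
# Pearson correlation between two equal-length numeric arrays.
# Hypothetical helper, not part of Statsample or Daru.
def pearson(x, y)
  n = x.size.to_f
  mean_x = x.sum / n
  mean_y = y.sum / n
  # Sum of cross-products of deviations, and sums of squared deviations.
  cov   = x.zip(y).sum { |a, b| (a - mean_x) * (b - mean_y) }
  var_x = x.sum { |a| (a - mean_x)**2 }
  var_y = y.sum { |b| (b - mean_y)**2 }
  cov / Math.sqrt(var_x * var_y)
end

# Build the full matrix as nested hashes: result[:a][:b] is r(a, b).
def correlation_table(columns)
  columns.to_h do |name_i, col_i|
    [name_i, columns.to_h { |name_j, col_j| [name_j, pearson(col_i, col_j)] }]
  end
end

data = {
  a: [1.0, 2.0, 3.0, 4.0],
  b: [2.0, 4.0, 6.0, 8.0], # perfectly correlated with :a
  c: [4.0, 3.0, 2.0, 1.0]  # perfectly anti-correlated with :a
}

correlation_table(data).each do |name, row|
  puts "#{name}: #{row.values.map { |r| format('%.2f', r) }.join(' ')}"
end
```

The diagonal of such a matrix is always 1 and the matrix is symmetric. For independent random normal columns like those generated above, the off-diagonal entries should be close to 0.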