Visualizing data from Daru containers

DARU (Data Analysis in RUby) is a library for storage, analysis, manipulation and visualization of data. You can find information about daru in its repository.

GnuplotRB uses the name of a Daru::Vector or Daru::DataFrame as the dataset's title and takes the index column as the x-axis tick labels (xtic). Example:

In [1]:

require 'daru'
require 'gnuplotrb'
include GnuplotRB
include GnuplotRB::Fit

df = Daru::DataFrame.new(
  {
    Build: [312, 630, 315, 312],
    Test: [525, 1050, 701, 514],
    Deploy: [215, 441, 370, 220]
  },
  index: ['Run A', 'Run B', 'Run C', 'Run D']
)
df[:Overall] = df[:Build] + df[:Test] + df[:Deploy]
df

Out[1]:

Daru::DataFrame:33875020 rows: 4 cols: 4
        Build  Deploy  Test  Overall
Run A     312     215   525     1052
Run B     630     441  1050     2121
Run C     315     370   701     1386
Run D     312     220   514     1046
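
The Overall column is the element-wise sum of the three stage columns. As a quick cross-check of the table, the same arithmetic can be sketched in plain Ruby without Daru; the local variable names below are illustrative only and do not come from the notebook.

```ruby
# Plain-Ruby cross-check of the Overall column (no Daru needed).
runs         = ['Run A', 'Run B', 'Run C', 'Run D']
build_times  = [312, 630, 315, 312]
test_times   = [525, 1050, 701, 514]
deploy_times = [215, 441, 370, 220]

# Element-wise sum, mirroring df[:Build] + df[:Test] + df[:Deploy].
overall = build_times.zip(test_times, deploy_times).map(&:sum)

runs.zip(overall).each { |run, total| puts "#{run}: #{total}" }
```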

When you pass a DataFrame to Plot.new, every column becomes a separate dataset, with the column name as the dataset title:

In [2]:

from_daru = Plot.new(
  df,
  style_data: 'lines',
  yrange: 0..2200,
  xlabel: 'Number of test',
  ylabel: 'Time, s',
  title: 'Time spent to run deploy pipeline'
)

Out[2]:

[Plot: "Time spent to run deploy pipeline", line chart. X axis: "Number of test" (Run A to Run D). Y axis: "Time, s" (0 to 2000). Datasets: Build, Deploy, Test, Overall.]
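
To make the column-to-dataset mapping concrete, here is a plain-Ruby sketch that builds a gnuplot script by hand: one plot element per column, the column name as its title, and the row labels as x tick labels via `xtic(1)`. The helper name `table_to_gnuplot` is hypothetical, and the commands GnuplotRB actually generates may well differ; this only illustrates the idea.

```ruby
# Hypothetical helper: build a gnuplot script from row labels and named columns.
# This is an illustration, not GnuplotRB's implementation.
def table_to_gnuplot(index, columns, style: 'lines')
  # Inline data block: first field is the quoted row label,
  # followed by one value per column.
  data_lines = index.each_with_index.map do |label, row|
    values = columns.values.map { |col| col[row] }
    %("#{label}" #{values.join(' ')})
  end

  # One plot element per column. Data columns start at 2 because
  # column 1 holds the labels used for the x tics.
  elements = columns.keys.each_with_index.map do |name, i|
    "$data using #{i + 2}:xtic(1) with #{style} title '#{name}'"
  end

  ['$data << EOD', *data_lines, 'EOD', "plot #{elements.join(', ')}"].join("\n")
end

script = table_to_gnuplot(
  ['Run A', 'Run B', 'Run C', 'Run D'],
  { Build: [312, 630, 315, 312], Test: [525, 1050, 701, 514], Deploy: [215, 441, 370, 220] }
)
puts script
```

The resulting text can be piped into a gnuplot 5 session to reproduce a similar chart without any Ruby plotting library.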

In [3]:

from_daru.options(
  style_data: 'histograms',
  style_fill: 'pattern border'
)

Out[3]:

[Plot: "Time spent to run deploy pipeline", histogram with pattern fill. X axis: "Number of test" (Run A to Run D). Y axis: "Time, s" (0 to 2000). Datasets: Build, Deploy, Test, Overall.]

A dataset can be initialized from a single Daru::Vector as well as from a whole DataFrame. Pass it in an array together with its options:

In [4]:

Plot.new([df[:Overall], with: 'lines'])

Out[4]:

[Plot: line chart of the Overall dataset. X axis: Run A to Run D. Y axis: 1000 to 2200.]

In [5]:

rows = (1..30).map do |i|
  [i**2 * (rand(4) + 3) / 5, rand(70)]
end
df = Daru::DataFrame.rows(rows, order: [:Value, :Error], name: 'Confidence interval')

random_points = Plot.new(
  [df[:Value], with: 'lines', title: 'Average value'],
  [df, with: 'err']
)

Out[5]:

[Plot: "Average value" as a line and "Confidence interval" as error bars. X axis: 0 to 30. Y axis: -100 to 900.]
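
Assuming the DataFrame reaches gnuplot as three columns (index, Value, Error), the error-bar style draws each bar from Value - Error to Value + Error. The plain-Ruby sketch below repeats the data generation and makes those ranges explicit. The fixed seed and the names `rng` and `intervals` are assumptions added for reproducibility; they are not part of the notebook.

```ruby
# Reproducible variant of the notebook's random rows (the seed is an assumption).
rng = Random.new(42)

rows = (1..30).map do |i|
  # Integer arithmetic: i**2 scaled by a random factor between 3/5 and 6/5.
  value = i**2 * (rng.rand(4) + 3) / 5
  error = rng.rand(70) # integer in 0..69
  [value, error]
end

# Each error bar spans value - error .. value + error.
intervals = rows.map { |value, error| [value - error, value + error] }

rows.zip(intervals).first(5).each_with_index do |((value, error), (low, high)), i|
  puts "x=#{i + 1} value=#{value} error=#{error} bar=#{low}..#{high}"
end
```

Because the value grows roughly as i**2 while the error stays below 70, the bars look wide at the left of the plot and comparatively narrow at the right.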

Now let's fit this data with a polynomial:

In [6]:

poly = fit_poly(df, degree: 5)
random_points.add_dataset(poly[:formula_ds])

Out[6]:

[Plot: "Fit formula" curve added to "Average value" (line) and "Confidence interval" (error bars). X axis: 0 to 30. Y axis: -100 to 900.]
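
For intuition about what a polynomial fit computes, here is a self-contained least-squares sketch in plain Ruby that solves the normal equations with exact Rational arithmetic. It is not how fit_poly works internally (gnuplot's own fit command uses an iterative nonlinear least-squares method), and `solve` and `poly_fit` are hypothetical names introduced only for this illustration.

```ruby
# Solve the linear system a * x = b by Gauss-Jordan elimination.
# Entries are converted to Rationals, so the result is exact.
def solve(a, b)
  n = b.size
  m = a.each_with_index.map { |row, i| row.map(&:to_r) + [b[i].to_r] }
  n.times do |col|
    # Swap in a row with a non-zero pivot.
    pivot = (col...n).find { |r| m[r][col] != 0 }
    raise ArgumentError, 'singular system' unless pivot
    m[col], m[pivot] = m[pivot], m[col]

    # Normalize the pivot row, then clear this column in every other row.
    pivot_value = m[col][col]
    m[col] = m[col].map { |v| v / pivot_value }
    n.times do |r|
      next if r == col
      factor = m[r][col]
      m[r] = m[r].zip(m[col]).map { |v, p| v - factor * p }
    end
  end
  m.map(&:last)
end

# Least-squares coefficients [c0, c1, ..., c_degree] for
# y ~ c0 + c1*x + ... + c_degree*x**degree, via (V^T V) c = V^T y.
def poly_fit(xs, ys, degree)
  powers = 0..degree
  vtv = powers.map { |j| powers.map { |k| xs.sum { |x| x**(j + k) } } }
  vty = powers.map { |j| xs.zip(ys).sum { |x, y| y * x**j } }
  solve(vtv, vty)
end

xs = (1..8).to_a
ys = xs.map { |x| 1 + 2 * x + 3 * x**2 } # points on an exact quadratic
coeffs = poly_fit(xs, ys, 2)
puts coeffs.map(&:to_f).inspect # prints [1.0, 2.0, 3.0]
```

Rational arithmetic is used because the normal equations become badly conditioned in floating point as the degree grows; for noisy data such as the random points above, the same code returns the best-fit coefficients rather than an exact match.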

In [7]:

df = Daru::DataFrame.new(
  {
    a: Array.new(100) { |i| i },
    b: 100.times.map { rand }
  },
  name: 'Scatter example'
)

Plot.new([df, pt: 6, ps: 1, using: '2:3'], xrange: -10..110, yrange: -0.1..1.1)

Out[7]:

[Plot: "Scatter example", scatter plot of b against a. X axis: 0 to 100. Y axis: 0 to 1.]

In [8]:

frames = 100.times.map do |i|
  Plot.new([df.row[0..i], using: '2:3', pt: 6, ps: 1])
end

Animation.new(*frames, xrange: -10..110, yrange: -0.1..1.1)

Out[8]: