Trying out Deedle with Bones and Regression

[This blog was moved here]

I rarely need to run regressions, but lately they keep chasing me. It started with the Asset Pricing class and its many variations of return regressions (I signed up to look at familiar things from a different point of view… and I definitely succeeded: have you ever thought about drawing returns, prices and discount factors in space, all at once?¹). But I ‘cheated’ and completed the assignments in R.

That was only the beginning: my cousin, an MD student, was measuring the deflection of bones and other samples under different loads. This time I decided to try out Deedle and help her explore the data.

Hint for Mono users: if Xamarin Studio (XS) doesn’t load the main Deedle project, you can edit the .fsproj file manually (delete the reference to FSharp.Core), reload the project and add references to the Math.NET and FSharp.Data libraries. After that it works nicely.

Let’s start by loading the experimental data. No more string splits and manual parsing!

    #r "FSharp.Data.dll"
    #load "Deedle.fsx"

    open System
    open Deedle

    // load data from a csv file
    let ds = Frame.ReadCsv(__SOURCE_DIRECTORY__ + "/deflection.csv")
    // val ds : Frame<int,string> =
    //         p         sample type length<mm> width<mm> height<mm> deflection<mm>
    //   0  -> 0.98      bone        47         7.3       4.4        0.06
    //   1  -> 1.96      bone        47         7.3       4.4        0.12

Next, we looked at some properties of the different sample groups, for example the average deflection. For simplicity, we’ll then keep only the bones group and only the load (“p”) and deflection columns.

    // choose several columns and group the data by sample type
    let bySample =
        ds.Columns.[["sampletype"; "p"; "deflection<mm>"]]
        |> Frame.groupRowsByString "sampletype"
    // val bySample : Frame<(string * int),string> =
    //                   sample type p         deflection<mm>
    //   bone      0  -> bone        0.98      0.06
    //             1  -> bone        1.96      0.12
    //   ...      ...    ...         ...
    //   duralumin 6  -> duralumin   0.98      0.03
    //   ...      ...    ...         ...

    // average deflection by sample type
    bySample.Columns.[["sampletype"; "deflection<mm>"]] |> Frame.meanLevel Pair.get1Of2
    // val it : Frame<string,string> =
    //                deflection<mm>
    //   bone      -> 0.228333333333333
    //   duralumin -> 0.116666666666667
    //   ...       -> ...

    // select the data for bones
    let bones = (Frame.nest bySample).["bone"]
    // val bones : Frame<int,string> =
    //        sample type p         deflection<mm>
    //  0  -> bone        0.98      0.06
    //  1  -> bone        1.96      0.12
    //  ...-> ...         ...       ...
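To make the grouping step concrete outside Deedle, here is a minimal plain-F# sketch of the same idea: group rows by sample type and average the deflections that are present. The `Measurement` record and `meanDeflectionBySample` function are hypothetical names introduced for illustration, and the three rows are copied from the output above, so the averages describe those rows only, not the full dataset.

```fsharp
// Hypothetical record standing in for one row of the Deedle frame;
// None marks a deflection that could not be read.
type Measurement = { SampleType: string; Load: float; Deflection: float option }

// Three rows copied from the bySample output above (not the full dataset).
let rows =
    [ { SampleType = "bone"; Load = 0.98; Deflection = Some 0.06 }
      { SampleType = "bone"; Load = 1.96; Deflection = Some 0.12 }
      { SampleType = "duralumin"; Load = 0.98; Deflection = Some 0.03 } ]

/// Mean deflection per sample type, skipping missing values;
/// a group with no readable values gets nan.
let meanDeflectionBySample (rows: Measurement list) =
    rows
    |> List.groupBy (fun r -> r.SampleType)
    |> List.map (fun (sample, group) ->
        let present = group |> List.choose (fun r -> r.Deflection)
        let mean = if List.isEmpty present then nan else List.average present
        sample, mean)
    |> Map.ofList

let averages = meanDeflectionBySample rows
```

Here `averages.["bone"]` is (0.06 + 0.12) / 2 = 0.09, because only two bone rows were included; on the full data Deedle reported about 0.228.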

You may notice that some values in the table are missing (handwriting can be completely unparsable!). By default, missing values are skipped, but we can always specify how they should be filled, either with a Direction or with a custom function.

    // note that there are missing values in this dataset
    let deflections = bones?``deflection<mm>``
    // val deflections : Series<int,float> =
    //  0  -> 0.06
    //  1  -> 0.12
    //  ...-> ...
    //  12 -> <missing>

    Series.mean deflections
    // val it : float = 0.2283333333

    // omit missing values
    deflections |> Series.dropMissing |> Series.mean
    // val it : float = 0.2283333333

    // fill missing values by copying forward
    deflections |> Series.fillMissing Direction.Forward |> Series.mean
    // val it : float = 0.2542857143
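The two strategies differ only in what happens to a gap before averaging. The sketch below models a column as a list of `float option` values, with `None` for an unreadable cell; `parseCell`, `meanOfPresent` and `fillForward` are hypothetical helpers, not Deedle functions, and the cell values are made up. `fillForward` is meant to behave roughly like `Direction.Forward`: each gap takes the last value seen before it.

```fsharp
open System
open System.Globalization

/// Parse one cell; anything that is not a number becomes a missing value (None).
let parseCell (s: string) =
    match Double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, v -> Some v
    | _ -> None

/// Mean of the values that are present (compare Series.dropMissing |> Series.mean).
let meanOfPresent (xs: float option list) =
    xs |> List.choose id |> List.average

/// Forward fill: a missing value takes the last value seen before it;
/// a leading missing value has nothing to copy and stays missing.
let fillForward (xs: float option list) =
    xs
    |> List.scan (fun last x -> match x with Some _ -> x | None -> last) None
    |> List.tail

// Hypothetical cells, not the measured data; "??" stands for unreadable handwriting.
let cells = [ "1.0"; "3.0"; "??" ]
let values = cells |> List.map parseCell
let dropped = meanOfPresent values                    // (1 + 3) / 2
let filled = values |> fillForward |> meanOfPresent   // (1 + 3 + 3) / 3
```

Copying forward repeats the last reading into the gap, so the filled mean leans toward that value (7/3 instead of 2 here), just as the forward fill raised the bone mean from about 0.228 to about 0.254.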

Now let’s check whether there’s a relation between deflection and load. In theory it should be linear, and we’re going to test that with a linear regression.
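For reference, the ordinary least-squares line y ≈ a + b·x through n points (x_i, y_i) has a closed form that needs only five running sums (n, Σx, Σy, Σx², Σxy), which is why a single fold over the data is enough:

```latex
\hat b = \frac{n\sum_i x_i y_i - \sum_i x_i \sum_i y_i}{n\sum_i x_i^2 - \left(\sum_i x_i\right)^2},
\qquad
\hat a = \frac{\sum_i y_i - \hat b \sum_i x_i}{n}
```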

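    /// Find slope and intercept with linear regression
    let linearRegression xs ys =
        // accumulate sum x, sum y, sum x^2, sum xy and the count in one pass
        let x, y, xx, xy, n =
            Seq.zip xs ys
            |> Seq.fold (fun (xsum, ysum, xxsum, xysum, n) (x, y) ->
                xsum + x, ysum + y, xxsum + x * x, xysum + x * y, n + 1.) (0., 0., 0., 0., 0.)
        let slope = (n * xy - x * y) / (n * xx - x * x)
        let intercept = (y - slope * x) / n
        slope, intercept

    // drop rows with missing values
    let bonesreg = Frame.dropSparseRows bones
    // val bonesreg : Frame<int,string> =
    //         sample type p    deflection<mm>
    //    0 -> bone        0.98 0.06
    //  ... -> ...         ...  ...
    //    5 -> bone        5.88 0.41

    let load = Series.values bonesreg?p
    let defl = Series.values bonesreg?``deflection<mm>``
    let slope, intercept = linearRegression load defl
    // val slope : float = 0.07142857143
    // val intercept : float = -0.01666666667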
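An algebraically equivalent way to get the same slope and intercept is to centre the data first: slope = Sxy / Sxx, intercept = mean(y) − slope · mean(x). The sketch below is an alternative formulation, not the author's `linearRegression`; the name `linearRegressionCentred` and the sample points are hypothetical. Centring avoids subtracting two large, nearly equal sums, which can lose precision when the x values are large relative to their spread.

```fsharp
/// Least-squares slope and intercept from centred sums:
/// slope = Sxy / Sxx, intercept = mean(y) - slope * mean(x).
/// Both arrays must have the same length (Array.map2 throws otherwise).
let linearRegressionCentred (xs: float[]) (ys: float[]) =
    let mx = Array.average xs
    let my = Array.average ys
    let sxy = Array.map2 (fun x y -> (x - mx) * (y - my)) xs ys |> Array.sum
    let sxx = xs |> Array.sumBy (fun x -> (x - mx) * (x - mx))
    let slope = sxy / sxx
    slope, my - slope * mx

// Hypothetical points lying exactly on y = 1 + 2x
let fitSlope, fitIntercept =
    linearRegressionCentred [| 0.0; 1.0; 2.0; 3.0 |] [| 1.0; 3.0; 5.0; 7.0 |]
```

With the post's data, `linearRegressionCentred (Array.ofSeq load) (Array.ofSeq defl)` should return the same slope and intercept as `linearRegression load defl`, up to floating-point rounding.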

Does this line make a good fit? Here is a chart with a couple of samples from this dataset.
A classical metric such as R² can also answer this question, and it is easy to compute here, because adding a new column to a Deedle frame and operating on it is extremely simple.
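The metrics computed in the code that follows rest on standard least-squares identities. For a fit with an intercept, the fitted values have the same mean as the observations, so R² reduces to a ratio of variances (hence of squared standard deviations, as read from `sdvs`); t is the statistic for testing whether the slope is zero, and s_e is the residual standard error, both with n − 2 degrees of freedom:

```latex
R^2 = \frac{\operatorname{Var}(\hat y)}{\operatorname{Var}(y)} = \left(\frac{s_{\hat y}}{s_y}\right)^2,
\qquad
t = \sqrt{\frac{R^2}{1 - R^2}\,(n - 2)},
\qquad
s_e = \sqrt{\frac{\sum_i (\hat y_i - y_i)^2}{n - 2}}
```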

The new library has been tried out, the lab is done, and everyone is happy ^_^

    // add the fitted values and the squared residuals as new columns
    bonesreg?prediction <- intercept + slope * bonesreg?p
    bonesreg?residualsq <- bonesreg?prediction - bonesreg?``deflection<mm>``
                           |> Series.mapValues (fun x -> x * x)
    bonesreg
    // val it : Frame<int,string> =
    //        sample type p    deflection<mm> prediction         residualsq
    //   0 -> bone        0.98 0.06           0.0533333333333334 4.44444444444441E-05
    //   1 -> bone        1.96 0.12           0.123333333333333  1.11111111111111E-05
    //  ...-> bone        ...  ...            ...                ...

    let sdvs = Frame.sdv bonesreg
    // val sdvs : Series<string,float> =
    //   sample type    -> <missing>
    //   p              -> 1.83341211951923
    //   deflection<mm> -> 0.131059782796503
    //   prediction     -> 0.130958008537088
    //   residualsq     -> 1.72132593164778E-05

    // compute the metrics:
    // R^2 = (sd of predictions / sd of observed deflections)^2
    let rsquare = let x = sdvs.["prediction"] / sdvs.["deflection<mm>"] in x * x
    // val rsquare : float = 0.9984475063
    // degrees of freedom: observations minus the two fitted parameters
    let df = Frame.countRows bonesreg - 2 |> float
    // val df : float = 4.0
    // t statistic of the slope
    let tvalue = sqrt (rsquare / (1. - rsquare) * df)
    // val tvalue : float = 50.71981861
    // residual standard error
    let se = (Series.sum bonesreg?residualsq) / df |> sqrt
    // val se : float = 0.005773502692
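To check these formulas without Deedle, the sketch below recomputes R², the slope's t statistic and the residual standard error from plain arrays, and cross-checks the t statistic against slope / SE(slope). `variance` and `regressionMetrics` are hypothetical helpers, and the four data points are made up so the results are easy to verify by hand (R² = 0.9, t = √18, s_e = √0.4).

```fsharp
/// Sample variance (n - 1 in the denominator).
let variance (xs: float[]) =
    let m = Array.average xs
    (xs |> Array.sumBy (fun x -> (x - m) * (x - m))) / float (xs.Length - 1)

/// Fit y = a + b x by least squares, then compute R^2, the slope's t statistic
/// and the residual standard error, each using n - 2 degrees of freedom.
let regressionMetrics (xs: float[]) (ys: float[]) =
    let mx, my = Array.average xs, Array.average ys
    let sxx = xs |> Array.sumBy (fun x -> (x - mx) * (x - mx))
    let b = (Array.map2 (fun x y -> (x - mx) * (y - my)) xs ys |> Array.sum) / sxx
    let a = my - b * mx
    let predictions = xs |> Array.map (fun x -> a + b * x)
    let sse = Array.map2 (fun p y -> (p - y) * (p - y)) predictions ys |> Array.sum
    let df = float (xs.Length - 2)
    let rsquare = variance predictions / variance ys
    let tvalue = sqrt (rsquare / (1.0 - rsquare) * df)
    let se = sqrt (sse / df)
    // Cross-check: the same t statistic written as slope / standard error of the slope.
    let tFromSlope = b / (se / sqrt sxx)
    rsquare, tvalue, se, tFromSlope

// Hypothetical data, not the bone measurements
let r2, t, se, t2 = regressionMetrics [| 0.0; 1.0; 2.0; 3.0 |] [| 0.0; 2.0; 2.0; 4.0 |]
```

The t statistic grows without bound as R² approaches 1, which is why a near-perfect fit like the bone data (R² ≈ 0.998) produces a t value above 50.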

Yes, it’s that simple.

  1. To be honest, I still don’t get why anyone would need that; maybe that’s the point where a PhD helps?