# Excel-DNA: Three Stories

*[This blog was moved here]*

**Intro: Simulation**

A couple of days ago I found a spreadsheet, potentially quite an interesting one. In theory, it should run a simple simulation (50000 paths by default) written in VBA – I’d say it’s several minutes of work in the worst case. However, it took slightly more. One working day, to be precise. I still have no idea what it tried to do, why it ate 97% of CPU and even what exactly it computed, because all that ended with a weird error and crashed Excel. The code is of unknown origin and protected, so no way to check it and I just gave up.

Now think about an average Excel user. He/she doesn’t care about how exactly it works, in which language written or how difficult it was for you to implement. What is important then?

– *functionality*: everything what comes to mind can become a function;

– *simplicity*: nobody wants to write VBA (and usually anything else too), but it’s always nice to have some useful UDFs at hand;

– *reliability*: you have a well-tested library already, so why not to call it instead of rewriting in a poor language?

– *performance*: why wait for a day when there’re highly optimized libraries and distributed computing is already invented?

For those who hasn’t tried it yet – check Excel-DNA. It’s an open-source project, which allows to integrate .NET into Excel. And yes, actually not only .NET ðŸ˜‰ Documentation, samples, links to related posts can be found on the project page.

**Keep Simple Things SimpleÂ Â Â Â **

Let’s take log-linear interpolation as an example. Simple? Sure. We want to keep calculations clear in case if someone’d like to modify it – took 9 rows for me: functions like `OFFSET`

and `MATCH`

are not very friendly. And you still need to remember about sorting, extrapolation, duplicate values. What if there’re thousands of points?

Instead we call a UDF, which does all the work, it behaves in a specified way and can be reused in the future. I won’t describe how to create an addin – for more information check the project homepage. Briefly, in addition to a standard F# project we need a reference to *ExcelDna.Integration.dll*, *interpolation.dna* file with a reference to our library and a copy of *ExcelDna.xll* (x64 version is also available), renamed to *interpolation.xll*.

The future UDF has `ExcelFunction`

attribute ^{1}. For consistency with Excel function names it’s called `LOGLINEAR`

. You can also add a description, category and so on. All parameters and output here are float arrays.

1: open ExcelDna.Integration 2: 3: [<AutoOpen>] 4: module Interpolation = 5: 6: let private interpolate curve (point: float) = 7: let len = Array.length curve 8: let u = 9: if point < fst curve.[0] then 1 10: elif point > fst curve.[len-1] then len-1 11: else Array.findIndex (fun (v, _) -> point < v) curve 12: let (xu, yu), (xd, yd) = curve.[u], curve.[u-1] 13: let lnp = log yd + (log yu - log yd) / (xu - xd) * (point - xd) 14: exp lnp 15: 16: [<ExcelFunction(Name="LOGLINEAR", Description="Log-Linear Interpolation", Category="Custom", IsThreadSafe = true)>] 17: let loglinear (xs: _[]) (ys: _[]) points = 18: let curve = 19: Seq.zip xs ys 20: |> Seq.distinctBy fst 21: |> Seq.sort 22: |> Seq.toArray 23: 24: if curve.Length < 2 then failwith "at least 2 points are required" 25: Array.map (interpolate curve) points

After loading the add-in in Excel (double-click on *interpolation.xll* in output folder), you can just type `LOGLINEAR`

and see the results! ^{2}.

**Make Complex Things Simple**

Well, it’s cool, but I still can do this interpolation by hands – and it’ll work, you don’t need to be super-smart to do a couple of subtractions and multiplications without mistakes. But things are getting more interesting when a bunch of standard functions is not enough.

How about machine learning with Excel? Of course, it can be handy only for experiments with adequately small amounts of data. But for sure, not something I’d want to write in VBA.

A lot of my fellow traders use random forests for feature selection (it is actually what I like about RF most) before feeding the data into actual models, neural nets or anything else. So let’s take a look at *rtp* addin, which can help to find important features and get rid of useless (and potentially harmful?) ones. Important point: this example is for demo purposes only! I took some Yahoo stock prices – it’s not that easy to get good data for free, so can’t call it realistic; the same works for features ^{3}. If you want a bit more for free, there’re also two old kaggle competitions: Benchmark Bond Trade Price ChallengeÂ and Algorithmic Trading ChallengeÂ (the winners used RF).

Anyway, we are lucky because all data is numeric. Even more than that – we have F# R Type Provider and can use R’s randomForest package!

1: open RProvider 2: open RProvider.``base`` 3: open RProvider.randomForest 4: 5: open ExcelDna.Integration 6: 7: [<ExcelFunction(Name = "IMPORTANCE", 8: Description = "Variable importance measures as produced by R's randomForest ")>] 9: let importance (names: obj[]) (values: float[,]) (ys: float[]) = 10: let cols = min names.Length (values.GetLength 1) 11: let rows = min ys.Length (values.GetLength 0) 12: 13: let xs = 14: names 15: |> Seq.take cols 16: |> Seq.mapi (fun j name -> 17: string name, Array.init rows (fun i -> values.[i, j])) 18: |> namedParams 19: |> R.data_frame 20: 21: let rf = R.randomForest(xs, ys) 22: (R.importance rf).Value

Look at the R-Importance tab: we call the function `{=IMPORTANCE(Names,Data,Result)}`

and get a nice set of values – the greater the value, the greener and more important it is. For example, P5 and P6 (close prices for day-5 and day-6 respectively) seem to be not very useful.

Adding features is pretty straightforward too. Say, we want to calculate Simple Moving Average with different offsets:

1: [<ExcelFunction(Name = "SMA", Description="Simple Moving Average")>] 2: let sma (values: float[]) (nobj: float) = 3: let n = int nobj 4: let len = values.Length 5: if len < n || n < 2 then box ExcelError.ExcelErrorNA 6: else 7: let res = Array2D.zeroCreate len 1 8: 9: Seq.windowed n values 10: |> Seq.map Seq.average 11: |> Seq.iteri (fun i v -> res.[i+n-1, 0] <- box v) 12: resize res

The interesting thing here is `resize`

function: it allows you not to select the whole output range when calling a function, but just type `=SMA(Values,5)`

in the top cell and result array is automatically resized. The code of these examples is available on github.

Excel-DNA makes it possible to bring all .NET power to the spreadsheets. Just try it ^_^

- Excel | fsharp | R
- Tweet This !

Pingback: F# Weekly #31 2013 | Sergey Tihon's Blog()

Pingback: F# and R in Excel | Excel-DNA()