Excel-DNA: Three Stories

[ 7 ] Comments

[This blog was moved here]

Intro: Simulation

A couple of days ago I found a spreadsheet, potentially quite an interesting one. In theory, it should run a simple simulation (50000 paths by default) written in VBA – I’d say it’s several minutes of work in the worst case. However, it took slightly more. One working day, to be precise. I still have no idea what it tried to do, why it ate 97% of CPU and even what exactly it computed, because all that ended with a weird error and crashed Excel. The code is of unknown origin and protected, so no way to check it and I just gave up.

Now think about an average Excel user. He/she doesn’t care about how exactly it works, in which language written or how difficult it was for you to implement. What is important then?

functionality: everything what comes to mind can become a function;

simplicity: nobody wants to write VBA (and usually anything else too), but it’s always nice to have some useful UDFs at hand;

reliability: you have a well-tested library already, so why not to call it instead of rewriting in a poor language?

performance: why wait for a day when there’re highly optimized libraries and distributed computing is already invented?

For those who hasn’t tried it yet – check Excel-DNA. It’s an open-source project, which allows to integrate .NET into Excel. And yes, actually not only .NET 😉 Documentation, samples, links to related posts can be found on the project page.



Keep Simple Things Simple    

Let’s take log-linear interpolation as an example. Simple? Sure. We want to keep calculations clear in case if someone’d like to modify it – took 9 rows for me: functions like OFFSET and MATCH are not very friendly. And you still need to remember about sorting, extrapolation, duplicate values. What if there’re thousands of points?

Instead we call a UDF, which does all the work, it behaves in a specified way and can be reused in the future. I won’t describe how to create an addin – for more information check the project homepage. Briefly, in addition to a standard F# project we need a reference to ExcelDna.Integration.dll, interpolation.dna file with a reference to our library and a copy of ExcelDna.xll (x64 version is also available), renamed to interpolation.xll.

The future UDF has ExcelFunction attribute 1. For consistency with Excel function names it’s called LOGLINEAR. You can also add a description, category and so on. All parameters and output here are float arrays.

 1: open ExcelDna.Integration
 3: [<AutoOpen>]
 4: module Interpolation =
 6:     let private interpolate curve (point: float) =
 7:         let len = Array.length curve
 8:         let u =
 9:             if point < fst curve.[0] then 1
10:             elif point > fst curve.[len-1] then len-1
11:             else Array.findIndex (fun (v, _) -> point < v) curve
12:         let (xu, yu), (xd, yd) = curve.[u], curve.[u-1]
13:         let lnp = log yd + (log yu - log yd) / (xu - xd) * (point - xd)
14:         exp lnp
16:     [<ExcelFunction(Name="LOGLINEAR", Description="Log-Linear Interpolation",
        Category="Custom", IsThreadSafe = true)>]
17:     let loglinear (xs: _[]) (ys: _[]) points =
18:         let curve =
19:             Seq.zip xs ys
20:             |> Seq.distinctBy fst
21:             |> Seq.sort
22:             |> Seq.toArray
24:         if curve.Length < 2 then failwith "at least 2 points are required"
25:         Array.map (interpolate curve) points

After loading the add-in in Excel (double-click on interpolation.xll in output folder), you can just type LOGLINEAR and see the results! 2.



Make Complex Things Simple

Well, it’s cool, but I still can do this interpolation by hands – and it’ll work, you don’t need to be super-smart to do a couple of subtractions and multiplications without mistakes. But things are getting more interesting when a bunch of standard functions is not enough.

How about machine learning with Excel? Of course, it can be handy only for experiments with adequately small amounts of data. But for sure, not something I’d want to write in VBA.

A lot of my fellow traders use random forests for feature selection (it is actually what I like about RF most) before feeding the data into actual models, neural nets or anything else. So let’s take a look at rtp addin, which can help to find important features and get rid of useless (and potentially harmful?) ones. Important point: this example is for demo purposes only! I took some Yahoo stock prices – it’s not that easy to get good data for free, so can’t call it realistic; the same works for features 3. If you want a bit more for free, there’re also two old kaggle competitions: Benchmark Bond Trade Price Challenge  and Algorithmic Trading Challenge  (the winners used RF).

Anyway, we are lucky because all data is numeric. Even more than that – we have F# R Type Provider and can use R’s randomForest package!

 1: open RProvider
 2: open RProvider.``base``
 3: open RProvider.randomForest
 5: open ExcelDna.Integration
 7: [<ExcelFunction(Name = "IMPORTANCE", 
 8:     Description = "Variable importance measures as produced by R's randomForest ")>]
 9: let importance (names: obj[]) (values: float[,]) (ys: float[]) =
10:     let cols = min names.Length (values.GetLength 1)
11:     let rows = min ys.Length (values.GetLength 0)
13:     let xs = 
14:         names
15:         |> Seq.take cols
16:         |> Seq.mapi (fun j name ->
17:             string name, Array.init rows (fun i -> values.[i, j]))
18:         |> namedParams
19:         |> R.data_frame            
21:     let rf = R.randomForest(xs, ys)
22:     (R.importance rf).Value

Look at the R-Importance tab: we call the function {=IMPORTANCE(Names,Data,Result)} and get a nice set of values – the greater the value, the greener and more important it is. For example, P5 and P6 (close prices for day-5 and day-6 respectively) seem to be not very useful.


Adding features is pretty straightforward too. Say, we want to calculate Simple Moving Average with different offsets:

 1: [<ExcelFunction(Name = "SMA", Description="Simple Moving Average")>]
 2: let sma (values: float[]) (nobj: float) =
 3:     let n = int nobj
 4:     let len = values.Length
 5:     if len < n || n < 2 then box ExcelError.ExcelErrorNA
 6:     else 
 7:         let res = Array2D.zeroCreate len 1
 9:         Seq.windowed n values
10:         |> Seq.map Seq.average
11:         |> Seq.iteri (fun i v -> res.[i+n-1, 0] <- box v)
12:         resize res

The interesting thing here is resize function: it allows you not to select the whole output range when calling a function, but just type =SMA(Values,5) in the top cell and result array is automatically resized. The code of these examples is available on github.

Excel-DNA makes it possible to bring all .NET power to the spreadsheets. Just try it ^_^



  1. see interpolation project here
  2. as a reminder, array formulae are entered with Ctrl+Shift+Enter
  3. and also my experience has nothing to do with trading
  • Pingback: F# Weekly #31 2013 | Sergey Tihon's Blog()

  • The code is of unknown origin and protected, so no way to check it and I just gave up.


    Please don’t run code from unknown sources. That code could have busily been attempting to gain access to every system on your network for 24 hours, happily installing rootkits along the way.

    • luajalla

      Unknown here doesn’t mean randomly downloaded/untrusted, just that I don’t know who wrote it ) Most probably there’s a bug inside.

  • Pingback: F# and R in Excel | Excel-DNA()

  • niggler

    You can break VBA protection fairly easily with a hex editor: http://blog.nig.gl/post/63428658404/excel-vba-password-protection-is-useless

    • luajalla

      Thanks – was too lazy to look for that 🙂 anyway, fixing VBA is the last thing you want to do when there’s a simpler solution.

  • If you want a powerful, robust, commercial solution (integrate .NET into Excel spreadsheets, with F#) take a look at Fcell.io