Coding Dojo: a gentle introduction to Machine Learning with F# review
Recently I organized an F# meetup in DC, and for our first event we brought in a wonderful speaker, Mathias Brandewinder, whose session was titled “Coding Dojo: a gentle introduction to Machine Learning with F#”.
I was certainly a little nervous about our first meetup, but a ton of great people came out: experienced F# users, people who had used other functional languages (like OCaml), and people with no functional experience at all. The goal of the meetup was to write a k-nearest neighbors classifier for a previously posted Kaggle exercise on classifying pixelated numbers.
Mathias introducing F#
Mathias did a great job of breaking people up into groups and then explaining what machine learning is and the criteria of the project in a surprisingly short time. I think people were a little scared of jumping in since he only talked for about 10 to 15 minutes, but in place of a long lecture Mathias had a really well put together guided document that encouraged users to play and interact with F#.
The first step was to create an F# project and to download his fsx gist. The gist was broken down into 7 steps, where each step walked a user through the basics of F# and machine learning to build their classifier. For example, one step covered how to execute lines in F# Interactive. Another explained the map function. Another showed how to read a file and parse a CSV. And yet another discussed distance functions and converting raw data into records.
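To give a flavor of those building blocks, here is a minimal sketch of my own, not taken from the gist. The names `parseLine` and `squaredDistance` and the sample values are made up for illustration.

```fsharp
// Hypothetical mini example, not from the dojo gist.

// Array.map applies a function to every element of an array.
let doubled = [| 1; 2; 3 |] |> Array.map (fun x -> x * 2)

// Parse one CSV-style line into integers (the sample line below is made up).
let parseLine (line: string) = line.Split(',') |> Array.map int

// Squared Euclidean distance between two equally sized pixel arrays.
let squaredDistance (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

let row = parseLine "7,0,255,128"
let d = squaredDistance [| 0; 0; 0 |] [| 1; 2; 3 |]   // 1 + 4 + 9 = 14
printfn "%A %A %d" doubled row d
```

Each line can be sent to F# Interactive on its own, which is roughly how the dojo encouraged people to explore.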
The meetup group
In the end, if you followed his steps, in a span of under 2 hours, even a novice could end up with a fully working classifier! The classifier’s accuracy, by default, was about 94.4%. Not too bad.
I wanted to share my version of his classifier, which is based on Mathias’ well-guided steps.
open System
open System.IO

// One labelled image: the digit it shows and its pixel values.
type Number = { Label: string; Pixels: int[] }

// Reading and parsing the CSV file.
let splitLine (line:String) = line.Split([|','|])
let extract file = File.ReadAllLines file |> Array.map splitLine
let strippedHeaders (arr:'a[]) = arr.[1..]
let convertToInt (str:string) = Convert.ToInt32 str
let lineToInt arr = Array.map convertToInt arr
let linesAsInts = Array.map lineToInt

// Converting parsed rows into records: first column is the label, the rest are pixels.
let toNum (line:int[]) = {Label = line.[0].ToString(); Pixels = line.[1..] }
let convertToNum lines = Array.map toNum lines

// Squared distance between two pixel arrays.
let dist (a:int) (b:int) = (a-b)*(a-b)
let arrayDist = Array.map2 dist
let totalDist a b = arrayDist a b |> Array.reduce (+)

let train file =
    extract file
    |> strippedHeaders
    |> linesAsInts
    |> convertToNum
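
// The k training examples closest to the given pixels, as (label, distance) pairs.
let kNNSet trainingSet pixels k =
    trainingSet
    |> Array.map (fun i -> (i.Label, totalDist i.Pixels pixels))
    |> Array.sortBy (fun (label, dist) -> dist)
    |> fun sorted -> sorted.[0..(k - 1)]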

// Majority vote among the k nearest neighbors.
let classify trainingSet pixels k =
    kNNSet trainingSet pixels k
    |> Array.toSeq
    |> Seq.groupBy (fun (label, dist) -> label)
    |> Seq.maxBy (fun (label, items) -> Seq.length items)
    |> fun (label, items) -> label

// Percentage of validation examples classified correctly, truncated to an int.
let accuracy trainingSet validationSet k =
    Array.map (fun i ->
        let result = classify trainingSet i.Pixels k
        result = i.Label) validationSet
    |> Array.map (fun i -> if i = true then 1 else 0)
    |> Array.sum
    |> fun sum -> (double)sum / (double)(Array.length validationSet)
    |> fun acc -> (int)(acc * 100.0)

let training = train @"C:\Projects\Personal2\DcDojo\DcDojo\trainingsample.csv"
let validation = train @"C:\Projects\Personal2\DcDojo\DcDojo\validationsample.csv"
Had I written this without following his steps I probably would have inlined a lot of the simple helper functions, but I wanted to show how Mathias really brought the “start small, build big” mentality to the project. This is something that really works well in functional languages and I think all the meetup participants picked up on that.
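For comparison, here is a rough sketch of what a more inlined version might look like. This is an illustration only, not what we wrote at the dojo: `Digit`, `nearestLabel`, and the tiny in-memory `sample` set are hypothetical, and I use `Array.truncate` and `Array.countBy` (available in F# 4.0 and later) in place of the slicing and grouping above.

```fsharp
// Sketch only: a compact kNN with the small helpers inlined.
// `Digit`, `nearestLabel`, and the sample data are hypothetical.
type Digit = { Label: string; Pixels: int[] }

let nearestLabel (trainingSet: Digit[]) (pixels: int[]) k =
    trainingSet
    // Sort by squared distance to the pixels being classified.
    |> Array.sortBy (fun t ->
        Array.map2 (fun a b -> (a - b) * (a - b)) t.Pixels pixels |> Array.sum)
    // Keep the k closest examples and take a majority vote on their labels.
    |> Array.truncate k
    |> Array.countBy (fun t -> t.Label)
    |> Array.maxBy snd
    |> fst

// A made-up training set with two-pixel "images".
let sample =
    [| { Label = "0"; Pixels = [| 0; 0 |] }
       { Label = "0"; Pixels = [| 1; 0 |] }
       { Label = "1"; Pixels = [| 9; 9 |] }
       { Label = "1"; Pixels = [| 8; 9 |] } |]

// Two of the three nearest neighbors of [1; 1] are labelled "0".
let predicted = nearestLabel sample [| 1; 1 |] 3
printfn "%s" predicted
```

It is shorter, but every step is harder to try out in isolation in F# Interactive, which is exactly what the small named helpers make easy.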
Another meetup participant (my coworker Sam) also posted his kNN classifier and worked through it with a side-by-side C# example, which was cool, so go check it out.
If you get a chance to see Mathias during his summer of F# tour, you should! While DC was on the tail end of the trip, Boston and Detroit are still on the agenda.
Here is a YouTube video of a portion of the dojo: