Off the back of the popular Machine Learning hands on session at Skills Matter last month where we created a digit recognizer, last night we tackled a new dataset. Again we took a task from Kaggle’s online predictive modelling competitions. This time the data set was passenger details from the Titanic, with the task to analyse who was likely to survive.
Guided Task: http://trelford.com/titanic.zip (unblock the file, unzip to C:\titanic, load in VS2012 and run through the tasks in the titanic.fsx interactive F# script).
Kaggle provide a CSV file with the passenger details, we loaded this using FSharp.Data’s CSV provider which infers the fields and types of the data for you:
let [<Literal>] path = "C:/titanic/train.csv"
type Train = CsvProvider<path,InferRows=0>
type Passenger = Train.Row
let passengers : Passenger =
Then did some preliminary data analysis tasks looking at how well specific features predicted survival:
let females = passengers |> where female
let femaleSurvivors = females |> tally survived
let femaleSurvivorsPc = females |> percentage survived
Finally we used a provided decision tree learning algorithm for prediction:
let labels = [|"sex"; "class"|]
let features (p:Passenger) : obj = [|p.Sex; p.Pclass|]
let dataSet : obj =
[|for passenger in passengers ->
[|yield! features passenger;
yield box (p.Survived = 1)|] |]
let tree = createTree(dataSet, labels)
I used the decision tree code from the Machine Learning in Action book porting the Python implementation to F#, here’s the gist of it. The Python Tools for Visual Studio (PVTS) came in handy for checking the outputs were the same on both implementations. Mathias Brandewinder has a great article on Decision Tree classification and also Random Forest classification in F# using the same Titanic data set.
Again it was great to see a full house for the event with over 50 members in attendance:
There’s a few more pictures from the event over on the Skills Matter Facebook page :)
Check out the F#unctional Londoners meetup page for upcoming meetings, the next one is 2 weeks on F# Mobile Apps. If you’re interested in more hands on sessions with F# I’d also highly recommend the Progressive F# Tutorials in New York this September and London in October, as there is still a great early bird rate: