Phishing Detection

The goal of this project is to apply multilayer feedforward neural networks to phishing email detection and evaluate the effectiveness of this approach. We design the feature set, process the phishing dataset, and implement the neural network (NN) systems. We then use cross-validation to evaluate the performance of NNs with different numbers of hidden units and different activation functions. We also compare the performance of NNs with that of other major machine learning algorithms. From the statistical analysis, we conclude that NNs with an appropriate number of hidden units can achieve satisfactory accuracy even when training examples are scarce. Moreover, our feature set is effective in capturing the characteristics of phishing emails, as most of the machine learning algorithms we tried yield reasonable results with it.
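To make the evaluation procedure concrete, here is a minimal sketch of k-fold cross-validation over hidden-unit counts and activation functions. It is not the code from the report: the two numeric features and their labels are synthetic placeholders, and the helper names (`make_toy_data`, `train_nn`, `cross_validate`), the learning rate, the epoch count, and the candidate hidden sizes are all assumptions chosen only to keep the example small.

```python
import math
import random

def sigmoid(z):
    # Clamp the input so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))

# name -> (activation, derivative written in terms of the activation's output)
ACTIVATIONS = {
    "sigmoid": (sigmoid, lambda a: a * (1.0 - a)),
    "tanh": (math.tanh, lambda a: 1.0 - a * a),
}

def make_toy_data(n, seed=1):
    """Synthetic stand-in for an email feature matrix (NOT the real dataset)."""
    rng = random.Random(seed)
    X = [[rng.random(), rng.random()] for _ in range(n)]
    y = [1 if a + b > 1.0 else 0 for a, b in X]
    return X, y

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_nn(X, y, hidden, activation, epochs=30, lr=0.5, seed=0):
    """Train a one-hidden-layer network with plain SGD; return a predictor."""
    rng = random.Random(seed)
    act, dact = ACTIVATIONS[activation]
    d = len(X[0])
    # The last entry of each weight row is the bias term.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d + 1)] for _ in range(hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]

    def forward(x):
        h = [act(sum(w * v for w, v in zip(row, x)) + row[-1]) for row in W1]
        out = sigmoid(sum(w * v for w, v in zip(W2, h)) + W2[-1])
        return h, out

    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward(x)
            delta_out = out - t  # cross-entropy gradient at the output unit
            for j in range(hidden):
                delta_h = delta_out * W2[j] * dact(h[j])  # uses W2[j] before its update
                W2[j] -= lr * delta_out * h[j]
                for i in range(d):
                    W1[j][i] -= lr * delta_h * x[i]
                W1[j][-1] -= lr * delta_h
            W2[-1] -= lr * delta_out

    return lambda x: 1 if forward(x)[1] >= 0.5 else 0

def cross_validate(X, y, hidden, activation, k=5):
    """Mean held-out accuracy over k folds (assumes len(X) >= k)."""
    folds = k_fold_indices(len(X), k)
    accuracies = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        predict = train_nn([X[j] for j in train_idx],
                           [y[j] for j in train_idx], hidden, activation)
        correct = sum(predict(X[j]) == y[j] for j in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k

# Compare a few configurations on the toy data.
X, y = make_toy_data(40)
for activation in ("sigmoid", "tanh"):
    for hidden in (1, 2, 4):
        acc = cross_validate(X, y, hidden, activation)
        print(f"{activation:8s} hidden={hidden}: mean accuracy {acc:.2f}")
```

Every configuration is scored on the same folds, so differences in mean accuracy reflect the network settings rather than a lucky split. With a real dataset, `make_toy_data` would be replaced by whatever code extracts features from the emails.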

The full report: Phishing Detection Using Neural Network.

In the summer of 2011, I was one of three students selected from the Joint Degree program for an internship at University College Dublin, Ireland. I was assigned an individual project: to learn about neural networks and apply them to the problem of phishing detection. It was my first exposure to machine learning, and I taught myself by reading books and online materials. Eventually I was able to understand the concepts, write some simple scripts to process the data, and build a basic NN.

In the fall of 2012, I came to Stanford and took CS229 Machine Learning with Andrew Ng. I learned more machine learning algorithms, the theory behind them, and more rigorous evaluation methods. My project partner and I were interested in the phishing detection problem, and we decided to extend the very basic work I had done using what we had learned in class.

The following fall, I was a TA for CS229.