mlwhiz

Turning data into insights

My Tryst With MCMC Algorithms

The things I find hard to understand push me to my limits, and one thing I have always found hard is Markov Chain Monte Carlo (MCMC) methods. When I first encountered them, I read a lot about them, but mostly it ended like this.

The meaning is normally ...

Hadoop MapReduce Streaming Tricks and Techniques

I have been using Hadoop a lot these days and thought I would write down some novel techniques that users can apply to get the most out of the Hadoop ecosystem.
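Hadoop Streaming lets any program act as a mapper or reducer, as long as it reads lines from stdin and writes tab-separated `key<TAB>value` lines to stdout; Hadoop sorts the mapper output by key before the reducer sees it. A minimal word-count sketch of that contract (the `mapper`/`reducer` names and the `map`/`reduce` command-line switch are illustrative choices, not taken from the post):

```python
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word\t1' for every whitespace-separated word in the input."""
    for line in lines:
        for word in line.strip().split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per key; relies on the input being sorted by key, as Hadoop guarantees."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    # Invoked as `python wordcount.py map` or `python wordcount.py reduce`
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Locally, `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce` mimics what Hadoop does across the cluster, which makes it easy to debug a job before submitting it.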

Using Shell Scripts to run your Programs


I am not a fan of large bash commands. The ...

Exploring Vowpal Wabbit with the Avazu Clickthrough Prediction Challenge

In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance. As a result, click prediction systems are essential and widely used for sponsored search and real-time bidding.

For this competition, we have provided 11 days' worth of Avazu data to build and test prediction models ...
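Vowpal Wabbit reads plain-text examples of the form `label |namespace feature feature ...`, and with `--loss_function logistic` it expects labels of -1 and 1. A minimal sketch that converts CSV rows into that format, assuming a 0/1 `click` column, an `id` column to skip, and categorical columns such as `hour`, `C1` and `site_id` (treat these column names as assumptions and check them against the actual train file):

```python
import csv
import io


def row_to_vw(row, label_col="click", skip=("id",)):
    """Turn one CSV row (a dict) into a VW line, one-hot encoding each field as 'col_value'."""
    label = "1" if row[label_col] == "1" else "-1"
    feats = []
    for col, value in row.items():
        if col == label_col or col in skip:
            continue
        # Spaces, '|' and ':' have special meaning in VW input, so replace them
        clean = value.replace(" ", "_").replace("|", "_").replace(":", "_")
        feats.append(f"{col}_{clean}")
    return f"{label} |f " + " ".join(feats)


def csv_to_vw(text):
    """Convert CSV text with a header row into a list of VW lines."""
    return [row_to_vw(row) for row in csv.DictReader(io.StringIO(text))]
```

Because VW hashes each `col_value` string itself, no dictionary of categories has to be held in memory, which is what makes it practical on data of this size.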

Data Science 101: Playing with Scraping in Python

This is a simple illustration of using the Pattern module to scrape web data with Python. We will be scraping data from IMDb for the top TV series along with their ratings.

We will be using this link:

http://www.imdb.com/search/title?count=100&num_votes=5000 ...
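The post itself uses the Pattern module. As a dependency-free illustration of the same idea, here is a sketch with Python's standard `html.parser` run on a hard-coded snippet. The markup (`class="title"` and `class="rating"`) and the show names are invented for the example; IMDb's real HTML differs and changes over time, so the class names are assumptions.

```python
from html.parser import HTMLParser


class TitleRatingParser(HTMLParser):
    """Collect (title, rating) pairs from tags whose class is 'title' or 'rating'."""

    def __init__(self):
        super().__init__()
        self.pending = None  # field the next non-empty text chunk belongs to
        self.titles = []
        self.ratings = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "title" in classes:
            self.pending = "title"
        elif "rating" in classes:
            self.pending = "rating"

    def handle_data(self, data):
        text = data.strip()
        if not text or self.pending is None:
            return
        if self.pending == "title":
            self.titles.append(text)
        else:
            self.ratings.append(float(text))
        self.pending = None

    def results(self):
        return list(zip(self.titles, self.ratings))


# Invented markup standing in for a downloaded results page
SAMPLE_HTML = """
<div class="item"><h3 class="title">Show A</h3><span class="rating">9.0</span></div>
<div class="item"><h3 class="title">Show B</h3><span class="rating">8.5</span></div>
"""

parser = TitleRatingParser()
parser.feed(SAMPLE_HTML)
print(parser.results())  # [('Show A', 9.0), ('Show B', 8.5)]
```

For a live page you would fetch the HTML first (for example with `urllib.request.urlopen`) and adjust the class names to the markup actually served.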

Download CS109 Lectures using RTMPDump

Right now I am working on CS109 from Harvard. It is a great course, but it's not easy to download the lectures, that is, unless you have this script. :)

PS: You will have to install rtmpdump on your machine for this to work.

import requests
from pattern import web ...
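Once the lecture stream URLs have been collected, each download is a single `rtmpdump` call, where `-r` names the RTMP URL and `-o` the output file. A minimal sketch, assuming you already have a list of `(url, filename)` pairs; the helper names are hypothetical and the URL in the usage below is a placeholder, not a real CS109 stream:

```python
import shlex
import subprocess


def rtmpdump_cmd(stream_url, output_path):
    """Build an rtmpdump invocation: -r is the RTMP URL, -o the file to write."""
    return ["rtmpdump", "-r", stream_url, "-o", output_path]


def download_all(streams, dry_run=True):
    """Download each (url, filename) pair; with dry_run=True only print the commands."""
    commands = [rtmpdump_cmd(url, name) for url, name in streams]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # needs rtmpdump on PATH
    return commands


download_all([("rtmp://example.org/vod/lecture01", "lecture01.flv")])
```

Passing the command as a list rather than one shell string avoids quoting problems with URLs that contain `&` or `?`.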

DictVectorizer for One-Hot Encoding of Categorical Data

The Problem:

Recently I was working on the Criteo Advertising Competition on Kaggle. The competition was a classification problem that involved predicting click-through rates from the features provided in the training data. Seeing the size of the data (11 GB of training data), I felt that going with ...
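scikit-learn's `DictVectorizer` takes a list of `feature -> value` dicts, one-hot encodes string values as `feature=value` columns and passes numeric values through unchanged. A minimal sketch with made-up features (`site`, `device` and `hour` are hypothetical, not the actual Criteo columns):

```python
from sklearn.feature_extraction import DictVectorizer

# Each row is a dict; string values become one-hot columns, numbers stay numeric
rows = [
    {"site": "a", "device": "mobile", "hour": 14},
    {"site": "b", "device": "mobile", "hour": 15},
    {"site": "a", "device": "desktop", "hour": 14},
]

vec = DictVectorizer(sparse=True)  # sparse output keeps memory low with many categories
X = vec.fit_transform(rows)

print(vec.feature_names_)
# ['device=desktop', 'device=mobile', 'hour', 'site=a', 'site=b']
print(X.toarray())

# Categories not seen during fit (site=c here) are silently dropped at transform time
X_new = vec.transform([{"site": "c", "device": "mobile", "hour": 9}])
print(X_new.toarray())
```

Keeping the output sparse matters here: with many high-cardinality categorical columns almost every entry of the one-hot matrix is zero.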

Learning PySpark – Installation – Part 1

This is part one of a series on learning PySpark, the Python API for Spark (Spark itself is written in Scala).

The installation is pretty simple. These steps were done on Mac OS X Mavericks but should work on Linux too. Here are the steps:

1. Download ...

Hadoop, MapReduce and More – Part 1

I had been putting off learning Hadoop for some time. I finally got some free time and realized that Hadoop may not be so difficult after all. What I finally understood is that Hadoop basically consists of three elements:

  1. A file system
  2. MapReduce
  3. Its many individual components. Let ...
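The MapReduce element can be understood without a cluster. A minimal in-memory sketch of the map, shuffle and reduce phases, using a made-up "maximum reading per year" job (the `map_reduce` helper and the data are illustrative; real Hadoop spreads each phase across machines and reads its input from the distributed file system):

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Run the three MapReduce phases in memory: map, shuffle (group by key), reduce."""
    # Map: each record yields zero or more (key, value) pairs
    intermediate = (pair for record in records for pair in mapper(record))
    # Shuffle: gather every value that shares a key (Hadoop does this over the network)
    groups = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)
    # Reduce: collapse each key's values; keys are processed in sorted order, as in Hadoop
    return {key: reducer(key, values) for key, values in sorted(groups.items())}


readings = ["2014,31", "2014,35", "2015,29", "2015,33"]
max_by_year = map_reduce(
    readings,
    mapper=lambda line: [(line.split(",")[0], int(line.split(",")[1]))],
    reducer=lambda year, temps: max(temps),
)
print(max_by_year)  # {'2014': 35, '2015': 33}
```

The programmer only writes `mapper` and `reducer`; the grouping in the middle is what the framework provides.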