mlwhiz

Turning data into insights

Shell Basics Every Data Scientist Should Know - Part I

Shell commands are powerful. "Life would be hell without the shell" is how I like to put it (and that is probably why I dislike Windows).

Consider a case where you have a 6 GB pipe-delimited file sitting on your laptop and you want to find out ...

Behold the power of MCMC

Last time I wrote an article on MCMC methods and how they can be useful. We learned how MCMC chains can be used to sample from a random variable whose distribution is only partially known, i.e. we don't know its normalizing constant.

So MCMC methods may sound interesting to some ...

My Tryst With MCMC Algorithms

The things that I find hard to understand push me to my limits. One thing I have always found hard is Markov Chain Monte Carlo methods. When I first encountered them, I read a lot about them, but it mostly ended like this.

The meaning is normally ...

Hadoop MapReduce Streaming Tricks and Techniques

I have been using Hadoop a lot nowadays and thought I would write about some of the novel techniques that a user can apply to get the most out of the Hadoop ecosystem.

Using Shell Scripts to run your Programs


I am not a fan of large bash commands. The ...

Exploring Vowpal Wabbit with the Avazu Clickthrough Prediction Challenge

In online advertising, click-through rate (CTR) is a very important metric for evaluating ad performance. As a result, click prediction systems are essential and widely used for sponsored search and real-time bidding.

For this competition, we have provided 11 days worth of Avazu data to build and test prediction models ...

Data Science 101: Playing with Scraping in Python

This is a simple illustration of using the Pattern module to scrape web data with Python. We will scrape data from IMDb for the top TV series along with their ratings.

We will be using this link:

http://www.imdb.com/search/title?count=100&num_votes=5000 ...