Data Science 101: Playing with Scraping in Python
This is a simple illustration of using the Pattern module to scrape web data with Python. We will scrape IMDb for the top TV series along with their ratings.
We will use the following URL:
http://www.imdb.com/search/title?count=100&num_votes=5000,&ref_=gnr_tv_hr&sort=user_rating,desc&start=1&title_type=tv_series,mini_series
This URL lists the top-rated TV series that have at least 5000 votes. The thing to note in this URL is the “&start=” parameter, which specifies the position at which the list begins. If we specify 1 we get results 1-100, if we specify 101 we get results 101-200, and so on.
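To make the paging idea concrete, here is a minimal sketch that rebuilds the URL above for any starting position. The names `build_page_url`, `page_starts` and `PAGE_SIZE` are made up for this illustration, and it uses only the standard library, so treat it as one possible way to generate the page URLs and not as the code used later in this post.

```python
from urllib.parse import urlencode

# Hypothetical helper names; only the URL and its parameters come from the post.
BASE_URL = "http://www.imdb.com/search/title"
PAGE_SIZE = 100  # matches count=100 in the URL above


def build_page_url(start):
    """Return the search URL whose list begins at position `start`."""
    params = {
        "count": PAGE_SIZE,
        "num_votes": "5000,",  # at least 5000 votes, no upper bound
        "ref_": "gnr_tv_hr",
        "sort": "user_rating,desc",
        "start": start,
        "title_type": "tv_series,mini_series",
    }
    # safe="," keeps the commas readable instead of encoding them as %2C
    return BASE_URL + "?" + urlencode(params, safe=",")


def page_starts(num_pages):
    """Return the start values for the first `num_pages` pages: 1, 101, 201, ..."""
    return [1 + PAGE_SIZE * i for i in range(num_pages)]


if __name__ == "__main__":
    for start in page_starts(3):
        print(build_page_url(start))
```

Calling `build_page_url(1)` reproduces the link above, and each later value from `page_starts` moves the list forward by 100 results.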
Let's start by importing the Python modules needed for scraping the data:
import requests  # Used to fetch a webpage's HTML as text
from pattern import web  # Used to parse the data that we loaded using requests
Loading the data …