mlwhiz

Deep Learning, Data Science and NLP Enthusiast

Download CS109 Lectures using RTMPDump

Right now I am working through CS109 from Harvard. It is a great course, but its lectures are not easy to download, unless you have this script. :)

PS: You will have to install rtmpdump on your machine for this to work.
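Before running the notebook, it may help to confirm that the rtmpdump binary is actually on your PATH. This is only a minimal sketch: it assumes Python 3 (for `shutil.which`), unlike the Python 2 cells in this post, and the helper name `tool_available` is made up for illustration.

```python
import shutil


def tool_available(name):
    """Return True if an executable called `name` is found on the PATH."""
    return shutil.which(name) is not None


if not tool_available("rtmpdump"):
    print("rtmpdump was not found on PATH; install it before running the cells below")
```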

In [34]:
import requests
from pattern import web 
import subprocess
In [35]:
firsturl="http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml"
r=requests.get(firsturl).text
In [36]:
dom=web.Element(r)
In [37]:
dom
Out[37]:
Element(tag=u'[document]')
In [38]:
#[elem.by_tag('li.list-type')[0].content for elem in dom.by_tag('ul.list-publication')]
mdict={'NOV':'11','OCT':'10','SEP':'09','DEC':'12'}
mon=[elem.content[:3] for elem in dom.by_tag("div.list-date  list-lecture")]
day=[elem.by_tag('span')[0].content for elem in dom.by_tag("div.list-date  list-lecture")]
datedict=zip( mon,day)
In [39]:
mmdd_dict=[str(mdict[elem[0]])+ str(elem[1])for elem in datedict]
In [40]:
dictlnames=['L23','L21','L22','L19','L20','L17','L18','L15','L16','L13','L14','L11','L12','L09','L10','L07','L08','L05','L06','L03','L04','L01','L02']
fdict=zip(dictlnames,mmdd_dict)
In [41]:
ffdict={}
for elem in fdict:
    lec_num,mmdd=elem
    lec_num=lec_num.replace("L","")
    url="rtmpdump -r rtmp://flash.dce.harvard.edu/bounce -C B:0 -C Z: -C S:/2014/01/14328/L"+str(lec_num)+"/14328-2013"+mmdd+"-L"+str(lec_num)+"-1-h264-av1248-16x9-852x480.mp4 -C S:BounceAPI3.0 -C N:0.000000 -C S:mp4  -y mp4:2014/01/14328/L"+str(lec_num)+"/14328-2013"+mmdd+"-L"+str(lec_num)+"-1-h264-av1248-16x9-852x480.mp4  -o lecture"+str(lec_num)+".mp4"
    if int(lec_num)>12:
        ffdict[int(lec_num)] = url
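The long concatenated command above repeats the same server path twice. As a rough sketch of the same idea, the path can be built once and the command kept as an argument list, which can also be passed to `subprocess` without `shell=True`. The function name `build_rtmpdump_args` is hypothetical, and the date `"1015"` is only a placeholder, not a real lecture date; the flags and path pattern are copied from the cell above.

```python
def build_rtmpdump_args(item, mmdd, out_name):
    """Build the rtmpdump argument list for one lecture or lab.

    item     -- label used on the server, e.g. "L13" or "S08"
    mmdd     -- month and day of the recording as four digits
    out_name -- local file to write, e.g. "lecture13.mp4"
    """
    # Same path pattern as in the concatenated string above.
    path = "2014/01/14328/{0}/14328-2013{1}-{0}-1-h264-av1248-16x9-852x480.mp4".format(item, mmdd)
    return [
        "rtmpdump",
        "-r", "rtmp://flash.dce.harvard.edu/bounce",
        "-C", "B:0",
        "-C", "Z:",
        "-C", "S:/" + path,
        "-C", "S:BounceAPI3.0",
        "-C", "N:0.000000",
        "-C", "S:mp4",
        "-y", "mp4:" + path,
        "-o", out_name,
    ]


# "1015" is a placeholder date used only to show the shape of the command.
args = build_rtmpdump_args("L13", "1015", "lecture13.mp4")
print(" ".join(args))
```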

Change the range in the code below if some of the lectures are already downloaded and you don't want to fetch them all over again. I am downloading lectures 13 to 23, as I have already downloaded the earlier ones. Each lecture is roughly 1 GB in size, so if your internet connection is slow you might run into interrupted downloads.

In []:
for i in range(15,17):
    url=ffdict[i]
    print subprocess.Popen(url, shell=True, stdout=subprocess.PIPE).stdout.read()
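Instead of editing the range by hand, one option is to work out which lecture files are still missing. The sketch below assumes the `lectureNN.mp4` naming used by the `-o` option above; the helper names `pending_lectures` and `with_resume` are made up for illustration. Note that a file that exists may still be a partial download, so this check is only a rough guide. For interrupted downloads, rtmpdump has a `--resume` option, which the second helper simply appends to a command string.

```python
import os


def pending_lectures(numbers, directory="."):
    """Return the lecture numbers whose lectureNN.mp4 is not in `directory` yet."""
    return [n for n in numbers
            if not os.path.exists(os.path.join(directory, "lecture%02d.mp4" % n))]


def with_resume(command):
    """Append rtmpdump's --resume flag so a partial file is continued."""
    return command + " --resume"


# Lectures 13 to 23 that have no file in the current directory yet.
print(pending_lectures(range(13, 24)))
```

With these helpers, the download loop could iterate over `pending_lectures(range(13, 24))` and pass `with_resume(ffdict[i])` when retrying a lecture that stopped midway.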

Let's do the same for the labs:

In [31]:
#[elem.by_tag('li.list-type')[0].content for elem in dom.by_tag('ul.list-publication')]

mdict={'NOV':'11','OCT':'10','SEP':'09','DEC':'12'}
mon=[elem.content[:3] for elem in dom.by_tag("div.list-date  list-section")]
day=[elem.by_tag('span')[0].content for elem in dom.by_tag("div.list-date  list-section")]
datedict=zip( mon,day)
#print datedict
mmdd_dict=[str(mdict[elem[0]])+ str(elem[1])for elem in datedict]
#print mmdd_dict
dictlnames=['S10','S09','S08','S06','S05','S04','S03','S02','S01']
fdict=zip(dictlnames,mmdd_dict)
#print fdict
In [32]:
ffdict={}
for elem in fdict:
    lec_num,mmdd=elem
    #lec_num=lec_num.replace("S","")
    url="rtmpdump -r rtmp://flash.dce.harvard.edu/bounce -C B:0 -C Z: -C S:/2014/01/14328/"+lec_num+"/14328-2013"+mmdd+"-"+lec_num+"-1-h264-av1248-16x9-852x480.mp4 -C S:BounceAPI3.0 -C N:0.000000 -C S:mp4  -y mp4:2014/01/14328/"+lec_num+"/14328-2013"+mmdd+"-"+lec_num+"-1-h264-av1248-16x9-852x480.mp4  -o lab"+lec_num+".mp4"
    lec_num=lec_num.replace("S","")
    ffdict[int(lec_num)] = url
#print ffdict
In [33]:
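for i in range(8,11):
    url=ffdict[i]
    try:
        print subprocess.Popen(url, shell=True, stdout=subprocess.PIPE).stdout.read()
    except Exception as e:
        # Report the failure instead of silently skipping the lab
        print "lab %d failed: %s" % (i, e)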




The results of the downloads are displayed in the terminal window from which you launched the IPython notebook.
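The output lands in the terminal presumably because the cells above capture only stdout, while the progress messages go to stderr. If you would rather see everything inside the notebook, a rough sketch (Python 3, with a hypothetical helper `run_and_capture`) is to merge stderr into stdout. The demo command below is just a stand-in that writes to stderr; it is not rtmpdump.

```python
import subprocess
import sys


def run_and_capture(args):
    """Run a command and return its stdout and stderr combined as text."""
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    output, _ = proc.communicate()
    return output.decode("utf-8", "replace")


# Stand-in command that writes to stderr, the way a progress report would.
demo = [sys.executable, "-c", "import sys; sys.stderr.write('progress 100%')"]
print(run_and_capture(demo))
```

Keep in mind that the text only shows up once the command has finished, so for a long download the terminal view gives more immediate feedback.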
