mlwhiz

Turning data into insights

Download CS109 Lectures using RTMPDump

Right now I am working through CS109 from Harvard. It is a great course, but its lecture videos are not easy to download, unless you have this script. :)

PS: You will have to install rtmpdump on your machine for this to work.

In [34]:
import requests
from pattern import web 
import subprocess
In [35]:
firsturl="http://cm.dce.harvard.edu/2014/01/14328/publicationListing.shtml"
r=requests.get(firsturl).text
In [36]:
dom=web.Element(r)
In [37]:
dom
Out[37]:
Element(tag=u'[document]')
In [38]:
#[elem.by_tag('li.list-type')[0].content for elem in dom.by_tag('ul.list-publication')]
mdict={'NOV':'11','OCT':'10','SEP':'09','DEC':'12'}
mon=[elem.content[:3] for elem in dom.by_tag("div.list-date  list-lecture")]
day=[elem.by_tag('span')[0].content for elem in dom.by_tag("div.list-date  list-lecture")]
datedict=zip( mon,day)
In [39]:
mmdd_dict=[str(mdict[elem[0]])+ str(elem[1])for elem in datedict]
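The month-and-day step above can be sketched on its own, without touching the network. This is only an illustration: `to_mmdd` is a hypothetical helper, the sample dates are made up, and the zero-padding of single-digit days is an assumption about what the video file names expect (the listing page may already pad them).

```python
# Month abbreviations as they appear on the listing page, mapped to two-digit months.
MONTH_TO_MM = {'SEP': '09', 'OCT': '10', 'NOV': '11', 'DEC': '12'}

def to_mmdd(month_abbr, day):
    """Combine a month abbreviation and a day string into an 'mmdd' string."""
    month = MONTH_TO_MM[month_abbr.strip().upper()]
    # Assumption: single-digit days need a leading zero ('3' -> '03').
    return month + str(day).strip().zfill(2)

print(to_mmdd('SEP', '3'))
print(to_mmdd('NOV', '26'))
```

A month outside the Fall term raises a `KeyError`, which is a useful early warning that the page layout or the term has changed.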
In [40]:
dictlnames=['L23','L21','L22','L19','L20','L17','L18','L15','L16','L13','L14','L11','L12','L09','L10','L07','L08','L05','L06','L03','L04','L01','L02']
fdict=zip(dictlnames,mmdd_dict)
In [41]:
ffdict={}
for elem in fdict:
    lec_num,mmdd=elem
    lec_num=lec_num.replace("L","")
    url="rtmpdump -r rtmp://flash.dce.harvard.edu/bounce -C B:0 -C Z: -C S:/2014/01/14328/L"+str(lec_num)+"/14328-2013"+mmdd+"-L"+str(lec_num)+"-1-h264-av1248-16x9-852x480.mp4 -C S:BounceAPI3.0 -C N:0.000000 -C S:mp4  -y mp4:2014/01/14328/L"+str(lec_num)+"/14328-2013"+mmdd+"-L"+str(lec_num)+"-1-h264-av1248-16x9-852x480.mp4  -o lecture"+str(lec_num)+".mp4"
    if int(lec_num)>12:
        ffdict[int(lec_num)] = url
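The long command string above can also be built as an argument list, which is easier to read and does not need `shell=True`. The sketch below reuses exactly the flags from the command above; `build_rtmpdump_args` is a hypothetical helper and the date `1015` is a made-up example, not a real lecture date.

```python
import subprocess

COURSE_PATH = "2014/01/14328"  # course path copied from the command above

def build_rtmpdump_args(name, mmdd, out_file):
    """Return the rtmpdump command as a list of arguments.

    name is the lecture or lab label (e.g. 'L13'), mmdd the date string.
    """
    media = "%s/%s/14328-2013%s-%s-1-h264-av1248-16x9-852x480.mp4" % (
        COURSE_PATH, name, mmdd, name)
    return [
        "rtmpdump",
        "-r", "rtmp://flash.dce.harvard.edu/bounce",
        "-C", "B:0",
        "-C", "Z:",
        "-C", "S:/" + media,
        "-C", "S:BounceAPI3.0",
        "-C", "N:0.000000",
        "-C", "S:mp4",
        "-y", "mp4:" + media,
        "-o", out_file,
    ]

args = build_rtmpdump_args("L13", "1015", "lecture13.mp4")
print(" ".join(args))
# To actually download: subprocess.call(args)
```

Passing a list to `subprocess.call` hands each argument to rtmpdump as-is, so nothing in the path is reinterpreted by the shell.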

Change the range in the code below if you have already downloaded some of the lectures and don't want to fetch them all over again. The dictionary above only keeps lectures 13 to 23, as I have already downloaded the earlier ones, and `range(15,17)` picks lectures 15 and 16 out of those. Each lecture is roughly 1 GB, so if your internet connection is slow you might run into interrupted downloads.

In []:
for i in range(15,17):
    url=ffdict[i]
    print subprocess.Popen(url, shell=True, stdout=subprocess.PIPE).stdout.read()
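For the interrupted-download problem mentioned above, one option is to retry with rtmpdump's `--resume` flag. The sketch below assumes, as I read the rtmpdump man page, that exit status 2 means an incomplete transfer that may get further on resume. `download_with_resume` and its `run` parameter are hypothetical names; `run` exists only so the logic can be tried without rtmpdump installed.

```python
import subprocess

def download_with_resume(args, max_attempts=5, run=subprocess.call):
    """Run an rtmpdump argument list, retrying with --resume while it
    reports an incomplete transfer. Returns the last exit status."""
    code = run(args)
    attempts = 1
    # Assumption: exit status 2 = incomplete transfer, resuming may help.
    while code == 2 and attempts < max_attempts:
        code = run(args + ["--resume"])
        attempts += 1
    return code

# Example (needs rtmpdump installed and a full argument list):
# status = download_with_resume(["rtmpdump", "-r", "...", "-o", "lecture15.mp4"])
```

Any other non-zero status is returned immediately, since retrying an unrecoverable error would only waste time.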

Let's do the same for the labs:

In [31]:
#[elem.by_tag('li.list-type')[0].content for elem in dom.by_tag('ul.list-publication')]

mdict={'NOV':'11','OCT':'10','SEP':'09','DEC':'12'}
mon=[elem.content[:3] for elem in dom.by_tag("div.list-date  list-section")]
day=[elem.by_tag('span')[0].content for elem in dom.by_tag("div.list-date  list-section")]
datedict=zip( mon,day)
#print datedict
mmdd_dict=[str(mdict[elem[0]])+ str(elem[1])for elem in datedict]
#print mmdd_dict
dictlnames=['S10','S09','S08','S06','S05','S04','S03','S02','S01']
fdict=zip(dictlnames,mmdd_dict)
#print fdict
In [32]:
ffdict={}
for elem in fdict:
    lec_num,mmdd=elem
    #lec_num=lec_num.replace("S","")
    url="rtmpdump -r rtmp://flash.dce.harvard.edu/bounce -C B:0 -C Z: -C S:/2014/01/14328/"+lec_num+"/14328-2013"+mmdd+"-"+lec_num+"-1-h264-av1248-16x9-852x480.mp4 -C S:BounceAPI3.0 -C N:0.000000 -C S:mp4  -y mp4:2014/01/14328/"+lec_num+"/14328-2013"+mmdd+"-"+lec_num+"-1-h264-av1248-16x9-852x480.mp4  -o lab"+lec_num+".mp4"
    lec_num=lec_num.replace("S","")
    ffdict[int(lec_num)] = url
#print ffdict
In [33]:
for i in range(8,11):
    url=ffdict[i]
    try:
        print subprocess.Popen(url, shell=True, stdout=subprocess.PIPE).stdout.read()
    except Exception as e:
        # report the failure instead of silently skipping the lab
        print "lab", i, "failed:", e




The progress and results of the downloads are displayed in the terminal window where you started the IPython notebook.
