Crawler For LeetCode
This is a very simple tutorial on a Python crawler written with Requests and BeautifulSoup. The crawler grabs the LeetCode problems and your accepted (AC) solution code, and should be an easy read for beginners.
Motivation
I had solved almost 200 LeetCode problems before this blog went online, so moving all that code over by hand would have been a really tough job. I wanted to write some code, such as a crawler, to do the work for me, and in the end I chose Python with Requests and BeautifulSoup.
Thanks to Syaning for the help.
Preparation Work
First, make sure you have Python installed.
Requests is an HTTP library written in Python; it is built on urllib but is more convenient. Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree, and it commonly saves programmers hours or days of work.
Install Requests with pip:

```shell
pip install requests
```

or manually:

```shell
git clone git://github.com/kennethreitz/requests.git
```

For Beautiful Soup, you can install it with `pip install beautifulsoup4`, or download the Beautiful Soup 4 source tarball and install it with setup.py:

```shell
python setup.py install
```
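As a quick sanity check that both libraries are importable, here is a minimal sketch that parses a literal HTML string (no network needed):

```python
import requests
from bs4 import BeautifulSoup

# parse a tiny HTML snippet to confirm BeautifulSoup works
soup = BeautifulSoup('<html><body><h1>hello</h1></body></html>', 'html.parser')
print(soup.h1.text)          # -> hello
print(requests.__version__)  # confirms Requests is importable
```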
Get the Problems List
All the problems are listed at https://leetcode.com/problemset/algorithms/. Inspecting the page with the Chrome developer console shows the request that returns the problem list as JSON.
We can then fetch that JSON with code like the following:

```python
import requests

# get all the algorithm list (the JSON endpoint seen in the developer console)
resp = requests.get('https://leetcode.com/api/problems/algorithms/')
```
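Historically, that response kept each problem under a top-level `stat_status_pairs` list. Here is a hedged sketch of pulling out the URL slugs, demonstrated on a trimmed-down sample rather than a live request; `extract_slugs` and the field names are assumptions based on the old API shape, not the post's exact code:

```python
def extract_slugs(data):
    """Pull every problem's URL slug out of the problem-list JSON.

    Assumes the historical response shape: a top-level 'stat_status_pairs'
    list whose items carry the slug under stat.question__title_slug.
    """
    return [pair['stat']['question__title_slug']
            for pair in data['stat_status_pairs']]

# a trimmed-down sample of the JSON the endpoint returned
sample = {
    'stat_status_pairs': [
        {'stat': {'question__title_slug': 'two-sum'}},
        {'stat': {'question__title_slug': 'add-two-numbers'}},
    ]
}
print(extract_slugs(sample))  # -> ['two-sum', 'add-two-numbers']
```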
After getting the problem list, we can use a for loop to get every problem's name and, from that, build each problem's URL.
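The loop that builds the URLs can be sketched as follows; `problem_urls` is a hypothetical helper name, and the URL pattern assumes each problem page lives at `/problems/<slug>/`:

```python
BASE = 'https://leetcode.com/problems/'

def problem_urls(slugs):
    # each problem page lives at /problems/<slug>/
    return [BASE + slug + '/' for slug in slugs]

print(problem_urls(['two-sum']))  # -> ['https://leetcode.com/problems/two-sum/']
```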
We use code like the following to fetch each problem's content:

```python
def get_alg_content(name):
    ...
```
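Only the signature of `get_alg_content` survives, so here is a sketch of what it might look like. It assumes the old page markup wrapped the description in a `<div class="question-content">`; both that selector and the `extract_content` helper are assumptions, and the post's real implementation may differ:

```python
import requests
from bs4 import BeautifulSoup

def get_alg_content(name):
    """Fetch one problem page and return its description text."""
    html = requests.get('https://leetcode.com/problems/%s/' % name).text
    return extract_content(html)

def extract_content(html):
    # assumed selector: the description div on the old problem pages
    soup = BeautifulSoup(html, 'html.parser')
    div = soup.find('div', class_='question-content')
    return div.get_text(strip=True) if div else ''

# the parsing half can be tried without the network:
print(extract_content('<div class="question-content"><p>Given an array...</p></div>'))
```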
Once all of this is done, we end up with a folder containing all the problems.
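Writing the results out to that folder could look like the sketch below; `save_problems` and the one-file-per-slug layout are assumptions for illustration, not the post's actual code:

```python
from pathlib import Path

def save_problems(problems, folder='leetcode'):
    """Write each problem's content to <folder>/<slug>.txt.

    `problems` maps slug -> description text.
    """
    out = Path(folder)
    out.mkdir(exist_ok=True)
    for slug, content in problems.items():
        (out / (slug + '.txt')).write_text(content, encoding='utf-8')

save_problems({'two-sum': 'Given an array of integers...'}, folder='leetcode_demo')
print(sorted(p.name for p in Path('leetcode_demo').iterdir()))  # -> ['two-sum.txt']
```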
Get the full code on my GitHub.
Next Job
After grabbing all the problem content, the next job is to grab all my AC solutions. That is a little harder, and I will cover it in the next post.