Python Web Scraping Tutorial - Hands-On (1): Apple Daily Racing (蘋果日報馬網)

Monday 11 Feb 2019   even
Data Analysis  Tutorials

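Hello~

Welcome to the hands-on part of my Python web scraping tutorial. This time we will take the techniques from the previous post and use them for a real scraping job.

A while ago a good friend was scraping this site, Apple Daily Racing (蘋果日報馬網), to collect the horse racing data on each of its pages. It is a very simple site that is well suited to practice: if you are comfortable with the basic scraping operations, you can scrape it. So I am using it as the material for this tutorial.

Besides requests and beautifulsoup4, which were introduced last time, this post introduces read_html, a function in the "pandas" package that pulls tables straight out of a web page. The parsing in this post relies entirely on it.

(pandas is the most fundamental package for data analysis in Python. You can learn how to use it from other sites, or wait for me to write a tutorial on it later~)

This post also makes heavy use of regular expressions to parse and clean up messy text. If you have no idea what regular expressions are, I recommend learning the basics first.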

https://regexone.com/

Setting the target

Collect the horse racing data from every page of Apple Daily Racing (蘋果日報馬網)

Examining the site structure

http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01

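The URL makes it obvious that data is passed to the server with GET, and the server uses it to decide what to render on the page. Whenever a URL ends with something like ?VAR1=OOO&VAR2=OOOO, this is what is happening. In this case two variables, date and page, determine which data the server sends back. So we can try changing date or page to another value, for example: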

http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=3

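Open this URL directly in your browser.

Huh! Nothing there! That is expected: both variables have to take specific values before the server can find data in its backend and show it to you. The lists of valid values are all on the page itself. So the first hurdle is getting the list of dates and the number of pages for each date. Once we have that, we can loop over every page and grab the target data.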
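To make the role of date and page concrete, here is a minimal sketch that splits the example URL into its query variables and rebuilds it with a different page. It uses only the standard library urllib.parse module and makes no request; the variable names are just for illustration.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

url = "http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01"

# Split the URL and read the GET variables out of the query string.
parts = urlsplit(url)
params = parse_qs(parts.query)  # each value comes back as a list of strings
print(params)

# Rebuild the same URL with another page value.
new_query = urlencode({"date": "20190123", "page": "02"})
new_url = parts._replace(query=new_query).geturl()
print(new_url)
```

Note that the values are plain strings, so "01" and "1" are different values as far as the query string is concerned.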

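Looking closely at the site, there is a dropdown menu for choosing a race date, and once you are on a given date, every page shows links to all of that date's page numbers. So we only need to open any one page to get the URLs for all dates, then loop through those pages to find out which pages each date has. That gives us every URL containing the data we want. The next step is to parse these pages one by one and extract the data.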

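First, let's check which HTML element holds the data table. A layout this tidy is usually inside <table></table>.

Sure enough, it is in <table></table>, which makes things very simple. We can use read_html from the pandas package to pull out the contents of the table quickly, and with a little tidying we soon have a clean data table.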
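As a rough illustration of what read_html does, the sketch below feeds it a tiny made-up table instead of the real page. The table contents are hypothetical and only there to show the shape of the result. read_html also needs an HTML parser library installed (lxml, or beautifulsoup4 together with html5lib).

```python
from io import StringIO

import pandas as pd

# Hypothetical miniature table, not real data from the site.
html = """
<table>
  <tr><th>Place</th><th>Horse</th></tr>
  <tr><td>1</td><td>Horse A</td></tr>
  <tr><td>2</td><td>Horse B</td></tr>
</table>
"""

# read_html returns a list with one DataFrame per <table> it finds.
tables = pd.read_html(StringIO(html))
df = tables[0]
print(df)
```

Because the result is always a list, a real page with several tables means picking the right index before doing any cleaning.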

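OK~ With these observations we can lay out a strategy for scraping the site in two steps: (1) collect all URLs of the form http://hk.racing.nextmedia.com/fullresult.php?date=XXXX&page=XX , (2) fetch the data from each one in turn and clean it up.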
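Step (1) can be sketched as a small helper that turns a date and a page number into a URL of that form. The function names and the pages_by_date mapping are hypothetical, and the page counts in it are made up for the example. The two-digit, zero-padded page number is an assumption based on the URLs seen so far.

```python
BASE_URL = "http://hk.racing.nextmedia.com/fullresult.php"


def build_url(date, page):
    """Return the result-page URL for a YYYYMMDD date string and a page number."""
    # Assumption: pages are written as two digits, e.g. 1 -> "01".
    return f"{BASE_URL}?date={date}&page={int(page):02d}"


def build_urls(pages_by_date):
    """Expand a {date: [page, ...]} mapping into a flat list of URLs."""
    return [
        build_url(date, page)
        for date, pages in pages_by_date.items()
        for page in pages
    ]


# Hypothetical input: which pages exist for which dates.
pages_by_date = {"20190123": [1, 2], "20190120": [1]}
urls = build_urls(pages_by_date)
for u in urls:
    print(u)
```

In the real scraper the mapping would come from the site itself rather than being typed in by hand.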

Collecting all target URLs

Let's start from any one page and see whether all the target URLs can be found among its hyperlinks:

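import requests
from bs4 import BeautifulSoup

# Any one result page will do as a starting point
url = 'http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01'

# Download the page and parse its HTML
r = requests.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# Print the href of every <a> tag on the page
for link in soup.find_all('a'):
    print(link.get('href'))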

index.php
racing_pda.php?sec_id=3403908&showcat=3472278
#
searchmain.php
index_01_a.php
#
nextrace.php
newallinfo.php
infomain_stable.php
infomain_jockey.php
infomain_his.php
weight.php
infomain_prepare.php
fullresult.php
results.php
#
thisrace_morning1.php
next_morning1.php
allmorning.php
master.php
stalls.php
stallstests.php
#
earlyodds.php
new_odds.php
singlet.php
http://hk.racing-cw.nextmedia.com/3pick1
http://hk.racing-cw.nextmedia.com/region
http://hk.racing-cw.nextmedia.com/trainer
#
j.php
s.php
timeform.php
index_11_a.php
jockeyq.php
stableq.php
jodds.php
sodds.php
dodds.php
index_3t_a.php
infomain_vary.php
catkeycol_toc.php
#
catkeycol.php?showcat=3472278&sec_id=3403908
catkeycol.php?sec_id=3403908&showcat=3472284
catkeycol.php?showcat=3488048&sec_id=3403908
catkeycol.php?showcat=3472279&sec_id=3403908
catkeycol.php?sec_id=3403908&showcat=3472283
catkeycol.php?showcat=3508172&sec_id=3403908
catkeycol.php?showcat=3507703&sec_id=3403908
catkeycol.php?sec_id=3403908&showcat=4287946
#
winning.php
winphoto.php
may.php
calbet.php
other9tips.php
/oversea
/dv_channel
/oversea/information/Conghua/HorseList.pdf
fullresult.php?date=20190123&page=01
fullresult.php?date=20190123&page=02
fullresult.php?date=20190123&page=03
fullresult.php?date=20190123&page=04
fullresult.php?date=20190123&page=05
fullresult.php?date=20190123&page=06
fullresult.php?date=20190123&page=07
fullresult.php?date=20190123&page=08
javascript:MM_openBrWindow('gd.php?date=20190123&page=1','_blank','width=600,height=365')
javascript:MM_openBrWindow('emodds.php?date=20190123&page=1','_blank','width=470,height=600,scrollbars=yes')
results.php?date=20190123
horse1.php?temp_horid=11854
jw.php?date=2018&sorj_id=516
sw.php?date=2018&sorj_id=330
horse1.php?temp_horid=10355
jw.php?date=2018&sorj_id=490
sw.php?date=2018&sorj_id=294
horse1.php?temp_horid=11850
jw.php?date=2018&sorj_id=451
sw.php?date=2018&sorj_id=314
horse1.php?temp_horid=11635
jw.php?date=2018&sorj_id=677
sw.php?date=2018&sorj_id=293
horse1.php?temp_horid=9988
jw.php?date=2018&sorj_id=564
sw.php?date=2018&sorj_id=211
horse1.php?temp_horid=11892
jw.php?date=2018&sorj_id=455
sw.php?date=2018&sorj_id=295
horse1.php?temp_horid=11879
jw.php?date=2018&sorj_id=464
sw.php?date=2018&sorj_id=264
horse1.php?temp_horid=11540
jw.php?date=2018&sorj_id=135
sw.php?date=2018&sorj_id=229
horse1.php?temp_horid=11804
jw.php?date=2018&sorj_id=583
sw.php?date=2018&sorj_id=430
horse1.php?temp_horid=11829
jw.php?date=2018&sorj_id=536
sw.php?date=2018&sorj_id=83
horse1.php?temp_horid=12219
jw.php?date=2018&sorj_id=446
sw.php?date=2018&sorj_id=331
index.php
http://hk.mark6.atnext.com/
art_main.php
index_01_a.php
searchmain.php
mailto:[email protected]

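OK! There are some other URLs mixed in, but we can quickly use an if to pick out the ones that match, for example by requiring the URL to contain the string "fullresult.php?date=".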
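A minimal sketch of that filter, run on a handful of hrefs copied from the output above so that it needs no network access. The function name pick_result_links is hypothetical. It also skips None, which link.get('href') returns for an <a> tag without an href, and uses urljoin to turn the relative links into full URLs.

```python
from urllib.parse import urljoin

PAGE_URL = "http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01"


def pick_result_links(hrefs, base=PAGE_URL):
    """Keep only result-page links and make them absolute."""
    wanted = []
    for href in hrefs:
        # Skip missing hrefs, then keep only links with the target substring.
        if href and "fullresult.php?date=" in href:
            wanted.append(urljoin(base, href))
    return wanted


sample = [
    "index.php",
    "#",
    None,
    "fullresult.php",
    "fullresult.php?date=20190123&page=01",
    "fullresult.php?date=20190123&page=02",
    "results.php?date=20190123",
]
result_links = pick_result_links(sample)
for link in result_links:
    print(link)
```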

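The anchor tags do contain the links to each page of this race day, but the URLs for the other race dates are not among them, so we need another approach.

The date options are hidden in the 選擇賽日 (select race day) dropdown. The HTML for a typical dropdown looks like this: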

<select>
  <option value="test">test</option>
  <!-- and more options -->
</select>
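For a dropdown written directly in the HTML like this, BeautifulSoup can read the options straight away. The sketch below parses a small hypothetical select element (the names and values are made up) and collects each option's value and label.

```python
from bs4 import BeautifulSoup

# Hypothetical static dropdown, used only to show the parsing pattern.
html = """
<select name="example">
  <option value="a.php">First</option>
  <option value="b.php">Second</option>
</select>
"""

soup_demo = BeautifulSoup(html, "html.parser")

# Collect (value, label) pairs from every <option> tag.
options = [(opt.get("value"), opt.get_text()) for opt in soup_demo.find_all("option")]
print(options)
```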

So we can try grabbing them with the option tag:

soup.find_all('option')

[]

Huh! Nothing!

Then let's try the select tag:

soup.find_all('select')

[<select name="redirect" size="1"><script>document.write('<option value="#" selected>選擇
document.write('<option value="fullresult.php?date=20190210&page=01">2019-02-10</option>');
document.write('<option value="fullresult.php?date=20190207&page=01">2019-02-07</option>');
document.write('<option value="fullresult.php?date=20190202&page=01">2019-02-02</option>');
document.write('<option value="fullresult.php?date=20190130&page=01">2019-01-30</option>');
document.write('<option value="fullresult.php?date=20190127&page=01">2019-01-27</option>');
document.write('<option value="fullresult.php?date=20190123&page=01">2019-01-23</option>');
document.write('<option value="fullresult.php?date=20190120&page=01">2019-01-20</option>');
document.write('<option value="fullresult.php?date=20190116&page=01">2019-01-16</option>');
document.write('<option value="fullresult.php?date=20190112&page=01">2019-01-12</option>');
document.write('<option value="fullresult.php?date=20190109&page=01">2019-01-09</option>');
document.write('<option value="fullresult.php?date=20190106&page=01">2019-01-06</option>');
document.write('<option value="fullresult.php?date=20190101&page=01">2019-01-01</option>');
document.write('<option value="fullresult.php?date=20181229&page=01">2018-12-29</option>');
document.write('<option value="fullresult.php?date=20181226&page=01">2018-12-26</option>');
document.write('<option value="fullresult.php?date=20181223&page=01">2018-12-23</option>');
document.write('<option value="fullresult.php?date=20181219&page=01">2018-12-19</option>');
document.write('<option value="fullresult.php?date=20181216&page=01">2018-12-16</option>');
document.write('<option value="fullresult.php?date=20181212&page=01">2018-12-12</option>');
document.write('<option value="fullresult.php?date=20181209&page=01">2018-12-09</option>');
document.write('<option value="fullresult.php?date=20181205&page=01">2018-12-05</option>');
document.write('<option value="fullresult.php?date=20181202&page=01">2018-12-02</option>');
document.write('<option value="fullresult.php?date=20181128&page=01">2018-11-28</option>');
document.write('<option value="fullresult.php?date=20181125&page=01">2018-11-25</option>');
document.write('<option value="fullresult.php?date=20181121&page=01">2018-11-21</option>');
document.write('<option value="fullresult.php?date=20181118&page=01">2018-11-18</option>');
document.write('<option value="fullresult.php?date=20181114&page=01">2018-11-14</option>');
document.write('<option value="fullresult.php?date=20181110&page=01">2018-11-10</option>');
document.write('<option value="fullresult.php?date=20181107&page=01">2018-11-07</option>');
document.write('<option value="fullresult.php?date=20181104&page=01">2018-11-04</option>');
document.write('<option value="fullresult.php?date=20181031&page=01">2018-10-31</option>');
document.write('<option value="fullresult.php?date=20181028&page=01">2018-10-28</option>');
document.write('<option value="fullresult.php?date=20181024&page=01">2018-10-24</option>');
document.write('<option value="fullresult.php?date=20181021&page=01">2018-10-21</option>');
document.write('<option value="fullresult.php?date=20181018&page=01">2018-10-18</option>');
document.write('<option value="fullresult.php?date=20181013&page=01">2018-10-13</option>');
document.write('<option value="fullresult.php?date=20181010&page=01">2018-10-10</option>');
document.write('<option value="fullresult.php?date=20181008&page=01">2018-10-08</option>');
document.write('<option value="fullresult.php?date=20181007&page=01">2018-10-07</option>');
document.write('<option value="fullresult.php?date=20181003&page=01">2018-10-03</option>');
document.write('<option value="fullresult.php?date=20181001&page=01">2018-10-01</option>');
document.write('<option value="fullresult.php?date=20180926&page=01">2018-09-26</option>');
document.write('<option value="fullresult.php?date=20180922&page=01">2018-09-22</option>');
document.write('<option value="fullresult.php?date=20180916&page=01">2018-09-16</option>');
document.write('<option value="fullresult.php?date=20180912&page=01">2018-09-12</option>');
document.write('<option value="fullresult.php?date=20180909&page=01">2018-09-09</option>');
document.write('<option value="fullresult.php?date=20180905&page=01">2018-09-05</option>');
document.write('<option value="fullresult.php?date=20180902&page=01">2018-09-02</option>');
document.write('<option value="fullresult.php?date=20180715&page=01">2018-07-15</option>');
document.write('<option value="fullresult.php?date=20180711&page=01">2018-07-11</option>');
document.write('<option value="fullresult.php?date=20180708&page=01">2018-07-08</option>');
document.write('<option value="fullresult.php?date=20180704&page=01">2018-07-04</option>');
document.write('<option value="fullresult.php?date=20180701&page=01">2018-07-01</option>');
document.write('<option value="fullresult.php?date=20180627&page=01">2018-06-27</option>');
document.write('<option value="fullresult.php?date=20180624&page=01">2018-06-24</option>');
document.write('<option value="fullresult.php?date=20180623&page=01">2018-06-23</option>');
document.write('<option value="fullresult.php?date=20180616&page=01">2018-06-16</option>');
document.write('<option value="fullresult.php?date=20180613&page=01">2018-06-13</option>');
document.write('<option value="fullresult.php?date=20180610&page=01">2018-06-10</option>');
document.write('<option value="fullresult.php?date=20180606&page=01">2018-06-06</option>');
document.write('<option value="fullresult.php?date=20180603&page=01">2018-06-03</option>');
document.write('<option value="fullresult.php?date=20180530&page=01">2018-05-30</option>');
document.write('<option value="fullresult.php?date=20180523&page=01">2018-05-23</option>');
document.write('<option value="fullresult.php?date=20180520&page=01">2018-05-20</option>');
document.write('<option value="fullresult.php?date=20180516&page=01">2018-05-16</option>');
document.write('<option value="fullresult.php?date=20180512&page=01">2018-05-12</option>');
document.write('<option value="fullresult.php?date=20180509&page=01">2018-05-09</option>');
document.write('<option value="fullresult.php?date=20180506&page=01">2018-05-06</option>');
document.write('<option value="fullresult.php?date=20180502&page=01">2018-05-02</option>');
document.write('<option value="fullresult.php?date=20180429&page=01">2018-04-29</option>');
document.write('<option value="fullresult.php?date=20180425&page=01">2018-04-25</option>');
document.write('<option value="fullresult.php?date=20180421&page=01">2018-04-21</option>');
document.write('<option value="fullresult.php?date=20180418&page=01">2018-04-18</option>');
document.write('<option value="fullresult.php?date=20180415&page=01">2018-04-15</option>');
document.write('<option value="fullresult.php?date=20180411&page=01">2018-04-11</option>');
document.write('<option value="fullresult.php?date=20180408&page=01">2018-04-08</option>');
document.write('<option value="fullresult.php?date=20180402&page=01">2018-04-02</option>');
document.write('<option value="fullresult.php?date=20180328&page=01">2018-03-28</option>');
document.write('<option value="fullresult.php?date=20180325&page=01">2018-03-25</option>');
document.write('<option value="fullresult.php?date=20180321&page=01">2018-03-21</option>');
document.write('<option value="fullresult.php?date=20180318&page=01">2018-03-18</option>');
document.write('<option value="fullresult.php?date=20180314&page=01">2018-03-14</option>');
document.write('<option value="fullresult.php?date=20180311&page=01">2018-03-11</option>');
document.write('<option value="fullresult.php?date=20180307&page=01">2018-03-07</option>');
document.write('<option value="fullresult.php?date=20180303&page=01">2018-03-03</option>');
document.write('<option value="fullresult.php?date=20180228&page=01">2018-02-28</option>');
document.write('<option value="fullresult.php?date=20180225&page=01">2018-02-25</option>');
document.write('<option value="fullresult.php?date=20180221&page=01">2018-02-21</option>');
document.write('<option value="fullresult.php?date=20180218&page=01">2018-02-18</option>');
document.write('<option value="fullresult.php?date=20180214&page=01">2018-02-14</option>');
document.write('<option value="fullresult.php?date=20180210&page=01">2018-02-10</option>');
document.write('<option value="fullresult.php?date=20180207&page=01">2018-02-07</option>');
document.write('<option value="fullresult.php?date=20180204&page=01">2018-02-04</option>');
document.write('<option value="fullresult.php?date=20180131&page=01">2018-01-31</option>');
document.write('<option value="fullresult.php?date=20180128&page=01">2018-01-28</option>');
document.write('<option value="fullresult.php?date=20180124&page=01">2018-01-24</option>');
document.write('<option value="fullresult.php?date=20180121&page=01">2018-01-21</option>');
document.write('<option value="fullresult.php?date=20180117&page=01">2018-01-17</option>');
document.write('<option value="fullresult.php?date=20180113&page=01">2018-01-13</option>');
document.write('<option value="fullresult.php?date=20180110&page=01">2018-01-10</option>');
document.write('<option value="fullresult.php?date=20180107&page=01">2018-01-07</option>');
document.write('<option value="fullresult.php?date=20180101&page=01">2018-01-01</option>');
document.write('<option value="fullresult.php?date=20171227&page=01">2017-12-27</option>');
document.write('<option value="fullresult.php?date=20171223&page=01">2017-12-23</option>');
document.write('<option value="fullresult.php?date=20171220&page=01">2017-12-20</option>');
document.write('<option value="fullresult.php?date=20171217&page=01">2017-12-17</option>');
document.write('<option value="fullresult.php?date=20171213&page=01">2017-12-13</option>');
document.write('<option value="fullresult.php?date=20171210&page=01">2017-12-10</option>');
document.write('<option value="fullresult.php?date=20171206&page=01">2017-12-06</option>');
document.write('<option value="fullresult.php?date=20171203&page=01">2017-12-03</option>');
document.write('<option value="fullresult.php?date=20171129&page=01">2017-11-29</option>');
document.write('<option value="fullresult.php?date=20171126&page=01">2017-11-26</option>');
document.write('<option value="fullresult.php?date=20171122&page=01">2017-11-22</option>');
document.write('<option value="fullresult.php?date=20171119&page=01">2017-11-19</option>');
document.write('<option value="fullresult.php?date=20171115&page=01">2017-11-15</option>');
document.write('<option value="fullresult.php?date=20171112&page=01">2017-11-12</option>');
document.write('<option value="fullresult.php?date=20171111&page=01">2017-11-11</option>');
document.write('<option value="fullresult.php?date=20171108&page=01">2017-11-08</option>');
document.write('<option value="fullresult.php?date=20171105&page=01">2017-11-05</option>');
document.write('<option value="fullresult.php?date=20171101&page=01">2017-11-01</option>');
document.write('<option value="fullresult.php?date=20171029&page=01">2017-10-29</option>');
document.write('<option value="fullresult.php?date=20171025&page=01">2017-10-25</option>');
document.write('<option value="fullresult.php?date=20171022&page=01">2017-10-22</option>');
document.write('<option value="fullresult.php?date=20171018&page=01">2017-10-18</option>');
document.write('<option value="fullresult.php?date=20171014&page=01">2017-10-14</option>');
document.write('<option value="fullresult.php?date=20171011&page=01">2017-10-11</option>');
document.write('<option value="fullresult.php?date=20171008&page=01">2017-10-08</option>');
document.write('<option value="fullresult.php?date=20171005&page=01">2017-10-05</option>');
document.write('<option value="fullresult.php?date=20171001&page=01">2017-10-01</option>');
document.write('<option value="fullresult.php?date=20170927&page=01">2017-09-27</option>');
document.write('<option value="fullresult.php?date=20170924&page=01">2017-09-24</option>');
document.write('<option value="fullresult.php?date=20170920&page=01">2017-09-20</option>');
document.write('<option value="fullresult.php?date=20170916&page=01">2017-09-16</option>');
document.write('<option value="fullresult.php?date=20170913&page=01">2017-09-13</option>');
document.write('<option value="fullresult.php?date=20170910&page=01">2017-09-10</option>');
document.write('<option value="fullresult.php?date=20170906&page=01">2017-09-06</option>');
document.write('<option value="fullresult.php?date=20170903&page=01">2017-09-03</option>');
document.write('<option value="fullresult.php?date=20170716&page=01">2017-07-16</option>');
document.write('<option value="fullresult.php?date=20170712&page=01">2017-07-12</option>');
document.write('<option value="fullresult.php?date=20170709&page=01">2017-07-09</option>');
document.write('<option value="fullresult.php?date=20170701&page=01">2017-07-01</option>');
document.write('<option value="fullresult.php?date=20170628&page=01">2017-06-28</option>');
document.write('<option value="fullresult.php?date=20170625&page=01">2017-06-25</option>');
document.write('<option value="fullresult.php?date=20170621&page=01">2017-06-21</option>');
document.write('<option value="fullresult.php?date=20170618&page=01">2017-06-18</option>');
document.write('<option value="fullresult.php?date=20170614&page=01">2017-06-14</option>');
document.write('<option value="fullresult.php?date=20170611&page=01">2017-06-11</option>');
document.write('<option value="fullresult.php?date=20170607&page=01">2017-06-07</option>');
document.write('<option value="fullresult.php?date=20170604&page=01">2017-06-04</option>');
document.write('<option value="fullresult.php?date=20170531&page=01">2017-05-31</option>');
document.write('<option value="fullresult.php?date=20170528&page=01">2017-05-28</option>');
document.write('<option value="fullresult.php?date=20170524&page=01">2017-05-24</option>');
document.write('<option value="fullresult.php?date=20170521&page=01">2017-05-21</option>');
document.write('<option value="fullresult.php?date=20170520&page=01">2017-05-20</option>');
document.write('<option value="fullresult.php?date=20170517&page=01">2017-05-17</option>');
document.write('<option value="fullresult.php?date=20170513&page=01">2017-05-13</option>');
document.write('<option value="fullresult.php?date=20170510&page=01">2017-05-10</option>');
document.write('<option value="fullresult.php?date=20170507&page=01">2017-05-07</option>');
document.write('<option value="fullresult.php?date=20170503&page=01">2017-05-03</option>');
document.write('<option value="fullresult.php?date=20170430&page=01">2017-04-30</option>');
document.write('<option value="fullresult.php?date=20170426&page=01">2017-04-26</option>');
document.write('<option value="fullresult.php?date=20170423&page=01">2017-04-23</option>');
document.write('<option value="fullresult.php?date=20170420&page=01">2017-04-20</option>');
document.write('<option value="fullresult.php?date=20170417&page=01">2017-04-17</option>');
document.write('<option value="fullresult.php?date=20170412&page=01">2017-04-12</option>');
document.write('<option value="fullresult.php?date=20170409&page=01">2017-04-09</option>');
document.write('<option value="fullresult.php?date=20170405&page=01">2017-04-05</option>');
document.write('<option value="fullresult.php?date=20170402&page=01">2017-04-02</option>');
document.write('<option value="fullresult.php?date=20170329&page=01">2017-03-29</option>');
document.write('<option value="fullresult.php?date=20170326&page=01">2017-03-26</option>');
document.write('<option value="fullresult.php?date=20170322&page=01">2017-03-22</option>');
document.write('<option value="fullresult.php?date=20170319&page=01">2017-03-19</option>');
document.write('<option value="fullresult.php?date=20170318&page=01">2017-03-18</option>');
document.write('<option value="fullresult.php?date=20170315&page=01">2017-03-15</option>');
document.write('<option value="fullresult.php?date=20170312&page=01">2017-03-12</option>');
document.write('<option value="fullresult.php?date=20170308&page=01">2017-03-08</option>');
document.write('<option value="fullresult.php?date=20170305&page=01">2017-03-05</option>');
document.write('<option value="fullresult.php?date=20170301&page=01">2017-03-01</option>');
document.write('<option value="fullresult.php?date=20170226&page=01">2017-02-26</option>');
document.write('<option value="fullresult.php?date=20170222&page=01">2017-02-22</option>');
document.write('<option value="fullresult.php?date=20170219&page=01">2017-02-19</option>');
document.write('<option value="fullresult.php?date=20170215&page=01">2017-02-15</option>');
document.write('<option value="fullresult.php?date=20170211&page=01">2017-02-11</option>');
document.write('<option value="fullresult.php?date=20170208&page=01">2017-02-08</option>');
document.write('<option value="fullresult.php?date=20170205&page=01">2017-02-05</option>');
document.write('<option value="fullresult.php?date=20170202&page=01">2017-02-02</option>');
document.write('<option value="fullresult.php?date=20170130&page=01">2017-01-30</option>');
document.write('<option value="fullresult.php?date=20170125&page=01">2017-01-25</option>');
document.write('<option value="fullresult.php?date=20170122&page=01">2017-01-22</option>');
document.write('<option value="fullresult.php?date=20170118&page=01">2017-01-18</option>');
document.write('<option value="fullresult.php?date=20170114&page=01">2017-01-14</option>');
document.write('<option value="fullresult.php?date=20170111&page=01">2017-01-11</option>');
document.write('<option value="fullresult.php?date=20170108&page=01">2017-01-08</option>');
document.write('<option value="fullresult.php?date=20170104&page=01">2017-01-04</option>');
document.write('<option value="fullresult.php?date=20170101&page=01">2017-01-01</option>');
document.write('<option value="fullresult.php?date=20161227&page=01">2016-12-27</option>');
document.write('<option value="fullresult.php?date=20161222&page=01">2016-12-22</option>');
document.write('<option value="fullresult.php?date=20161217&page=01">2016-12-17</option>');
document.write('<option value="fullresult.php?date=20161214&page=01">2016-12-14</option>');
document.write('<option value="fullresult.php?date=20161211&page=01">2016-12-11</option>');
document.write('<option value="fullresult.php?date=20161207&page=01">2016-12-07</option>');
document.write('<option value="fullresult.php?date=20161204&page=01">2016-12-04</option>');
document.write('<option value="fullresult.php?date=20161130&page=01">2016-11-30</option>');
document.write('<option value="fullresult.php?date=20161127&page=01">2016-11-27</option>');
document.write('<option value="fullresult.php?date=20161123&page=01">2016-11-23</option>');
document.write('<option value="fullresult.php?date=20161120&page=01">2016-11-20</option>');
document.write('<option value="fullresult.php?date=20161118&page=01">2016-11-18</option>');
document.write('<option value="fullresult.php?date=20161116&page=01">2016-11-16</option>');
document.write('<option value="fullresult.php?date=20161112&page=01">2016-11-12</option>');
document.write('<option value="fullresult.php?date=20161109&page=01">2016-11-09</option>');
document.write('<option value="fullresult.php?date=20161106&page=01">2016-11-06</option>');
document.write('<option value="fullresult.php?date=20161105&page=01">2016-11-05</option>');
document.write('<option value="fullresult.php?date=20161102&page=01">2016-11-02</option>');
document.write('<option value="fullresult.php?date=20161101&page=01">2016-11-01</option>');
document.write('<option value="fullresult.php?date=20161030&page=01">2016-10-30</option>');
document.write('<option value="fullresult.php?date=20161029&page=01">2016-10-29</option>');
document.write('<option value="fullresult.php?date=20161026&page=01">2016-10-26</option>');
document.write('<option value="fullresult.php?date=20161023&page=01">2016-10-23</option>');
document.write('<option value="fullresult.php?date=20161022&page=01">2016-10-22</option>');
document.write('<option value="fullresult.php?date=20161019&page=01">2016-10-19</option>');
document.write('<option value="fullresult.php?date=20161016&page=01">2016-10-16</option>');
document.write('<option value="fullresult.php?date=20161015&page=01">2016-10-15</option>');
document.write('<option value="fullresult.php?date=20161012&page=01">2016-10-12</option>');
document.write('<option value="fullresult.php?date=20161008&page=01">2016-10-08</option>');
document.write('<option value="fullresult.php?date=20161005&page=01">2016-10-05</option>');
document.write('<option value="fullresult.php?date=20161001&page=01">2016-10-01</option>');
document.write('<option value="fullresult.php?date=20160928&page=01">2016-09-28</option>');
document.write('<option value="fullresult.php?date=20160925&page=01">2016-09-25</option>');
document.write('<option value="fullresult.php?date=20160921&page=01">2016-09-21</option>');
document.write('<option value="fullresult.php?date=20160918&page=01">2016-09-18</option>');
document.write('<option value="fullresult.php?date=20160911&page=01">2016-09-11</option>');
document.write('<option value="fullresult.php?date=20160907&page=01">2016-09-07</option>');
document.write('<option value="fullresult.php?date=20160903&page=01">2016-09-03</option>');
document.write('<option value="fullresult.php?date=20160710&page=01">2016-07-10</option>');
document.write('<option value="fullresult.php?date=20160706&page=01">2016-07-06</option>');
document.write('<option value="fullresult.php?date=20160701&page=01">2016-07-01</option>');
document.write('<option value="fullresult.php?date=20160626&page=01">2016-06-26</option>');
document.write('<option value="fullresult.php?date=20160622&page=01">2016-06-22</option>');
document.write('<option value="fullresult.php?date=20160619&page=01">2016-06-19</option>');
document.write('<option value="fullresult.php?date=20160616&page=01">2016-06-16</option>');
document.write('<option value="fullresult.php?date=20160615&page=01">2016-06-15</option>');
document.write('<option value="fullresult.php?date=20160612&page=01">2016-06-12</option>');
document.write('<option value="fullresult.php?date=20160609&page=01">2016-06-09</option>');
document.write('<option value="fullresult.php?date=20160605&page=01">2016-06-05</option>');
document.write('<option value="fullresult.php?date=20160601&page=01">2016-06-01</option>');
document.write('<option value="fullresult.php?date=20160529&page=01">2016-05-29</option>');
document.write('<option value="fullresult.php?date=20160522&page=01">2016-05-22</option>');
document.write('<option value="fullresult.php?date=20160518&page=01">2016-05-18</option>');
document.write('<option value="fullresult.php?date=20160514&page=01">2016-05-14</option>');
document.write('<option value="fullresult.php?date=20160511&page=01">2016-05-11</option>');
document.write('<option value="fullresult.php?date=20160507&page=01">2016-05-07</option>');
document.write('<option value="fullresult.php?date=20160504&page=01">2016-05-04</option>');
document.write('<option value="fullresult.php?date=20160501&page=01">2016-05-01</option>');
document.write('<option value="fullresult.php?date=20160427&page=01">2016-04-27</option>');
document.write('<option value="fullresult.php?date=20160424&page=01">2016-04-24</option>');
document.write('<option value="fullresult.php?date=20160420&page=01">2016-04-20</option>');
document.write('<option value="fullresult.php?date=20160416&page=01">2016-04-16</option>');
document.write('<option value="fullresult.php?date=20160413&page=01">2016-04-13</option>');
document.write('<option value="fullresult.php?date=20160410&page=01">2016-04-10</option>');
document.write('<option value="fullresult.php?date=20160406&page=01">2016-04-06</option>');
document.write('<option value="fullresult.php?date=20160403&page=01">2016-04-03</option>');
document.write('<option value="fullresult.php?date=20160331&page=01">2016-03-31</option>');
document.write('<option value="fullresult.php?date=20160328&page=01">2016-03-28</option>');
document.write('<option value="fullresult.php?date=20160323&page=01">2016-03-23</option>');
document.write('<option value="fullresult.php?date=20160320&page=01">2016-03-20</option>');
document.write('<option value="fullresult.php?date=20160316&page=01">2016-03-16</option>');
document.write('<option value="fullresult.php?date=20160313&page=01">2016-03-13</option>');
document.write('<option value="fullresult.php?date=20160309&page=01">2016-03-09</option>');
document.write('<option value="fullresult.php?date=20160306&page=01">2016-03-06</option>');
document.write('<option value="fullresult.php?date=20160302&page=01">2016-03-02</option>');
document.write('<option value="fullresult.php?date=20160228&page=01">2016-02-28</option>');
document.write('<option value="fullresult.php?date=20160224&page=01">2016-02-24</option>');
document.write('<option value="fullresult.php?date=20160221&page=01">2016-02-21</option>');
document.write('<option value="fullresult.php?date=20160217&page=01">2016-02-17</option>');
document.write('<option value="fullresult.php?date=20160214&page=01">2016-02-14</option>');
document.write('<option value="fullresult.php?date=20160210&page=01">2016-02-10</option>');
document.write('<option value="fullresult.php?date=20160206&page=01">2016-02-06</option>');
document.write('<option value="fullresult.php?date=20160203&page=01">2016-02-03</option>');
document.write('<option value="fullresult.php?date=20160131&page=01">2016-01-31</option>');
document.write('<option value="fullresult.php?date=20160124&page=01">2016-01-24</option>');
document.write('<option value="fullresult.php?date=20160120&page=01">2016-01-20</option>');
document.write('<option value="fullresult.php?date=20160117&page=01">2016-01-17</option>');
document.write('<option value="fullresult.php?date=20160113&page=01">2016-01-13</option>');
document.write('<option value="fullresult.php?date=20160109&page=01">2016-01-09</option>');
document.write('<option value="fullresult.php?date=20160106&page=01">2016-01-06</option>');
document.write('<option value="fullresult.php?date=20160101&page=01">2016-01-01</option>');
document.write('<option value="fullresult.php?date=20151227&page=01">2015-12-27</option>');
document.write('<option value="fullresult.php?date=20151223&page=01">2015-12-23</option>');
document.write('<option value="fullresult.php?date=20151219&page=01">2015-12-19</option>');
document.write('<option value="fullresult.php?date=20151216&page=01">2015-12-16</option>');
document.write('<option value="fullresult.php?date=20151213&page=01">2015-12-13</option>');
document.write('<option value="fullresult.php?date=20151209&page=01">2015-12-09</option>');
document.write('<option value="fullresult.php?date=20151206&page=01">2015-12-06</option>');
document.write('<option value="fullresult.php?date=20151202&page=01">2015-12-02</option>');
document.write('<option value="fullresult.php?date=20151129&page=01">2015-11-29</option>');
document.write('<option value="fullresult.php?date=20151125&page=01">2015-11-25</option>');
document.write('<option value="fullresult.php?date=20151121&page=01">2015-11-21</option>');
document.write('<option value="fullresult.php?date=20151118&page=01">2015-11-18</option>');
document.write('<option value="fullresult.php?date=20151114&page=01">2015-11-14</option>');
document.write('<option value="fullresult.php?date=20151111&page=01">2015-11-11</option>');
document.write('<option value="fullresult.php?date=20151108&page=01">2015-11-08</option>');
document.write('<option value="fullresult.php?date=20151101&page=01">2015-11-01</option>');
document.write('<option value="fullresult.php?date=20151025&page=01">2015-10-25</option>');
document.write('<option value="fullresult.php?date=20151022&page=01">2015-10-22</option>');
document.write('<option value="fullresult.php?date=20151018&page=01">2015-10-18</option>');
document.write('<option value="fullresult.php?date=20151014&page=01">2015-10-14</option>');
document.write('<option value="fullresult.php?date=20151010&page=01">2015-10-10</option>');
document.write('<option value="fullresult.php?date=20151007&page=01">2015-10-07</option>');
document.write('<option value="fullresult.php?date=20151004&page=01">2015-10-04</option>');
document.write('<option value="fullresult.php?date=20151001&page=01">2015-10-01</option>');
document.write('<option value="fullresult.php?date=20150928&page=01">2015-09-28</option>');
document.write('<option value="fullresult.php?date=20150923&page=01">2015-09-23</option>');
document.write('<option value="fullresult.php?date=20150919&page=01">2015-09-19</option>');
document.write('<option value="fullresult.php?date=20150916&page=01">2015-09-16</option>');
document.write('<option value="fullresult.php?date=20150913&page=01">2015-09-13</option>');
document.write('<option value="fullresult.php?date=20150909&page=01">2015-09-09</option>');
document.write('<option value="fullresult.php?date=20150906&page=01">2015-09-06</option>');
document.write('<option value="fullresult.php?date=20150712&page=01">2015-07-12</option>');
document.write('<option value="fullresult.php?date=20150708&page=01">2015-07-08</option>');
document.write('<option value="fullresult.php?date=20150705&page=01">2015-07-05</option>');
document.write('<option value="fullresult.php?date=20150701&page=01">2015-07-01</option>');
document.write('<option value="fullresult.php?date=20150627&page=01">2015-06-27</option>');
document.write('<option value="fullresult.php?date=20150624&page=01">2015-06-24</option>');
document.write('<option value="fullresult.php?date=20150621&page=01">2015-06-21</option>');
document.write('<option value="fullresult.php?date=20150617&page=01">2015-06-17</option>');
document.write('<option value="fullresult.php?date=20150614&page=01">2015-06-14</option>');
document.write('<option value="fullresult.php?date=20150610&page=01">2015-06-10</option>');
document.write('<option value="fullresult.php?date=20150607&page=01">2015-06-07</option>');
document.write('<option value="fullresult.php?date=20150603&page=01">2015-06-03</option>');
document.write('<option value="fullresult.php?date=20150531&page=01">2015-05-31</option>');
document.write('<option value="fullresult.php?date=20150527&page=01">2015-05-27</option>');
document.write('<option value="fullresult.php?date=20150524&page=01">2015-05-24</option>');
document.write('<option value="fullresult.php?date=20150520&page=01">2015-05-20</option>');
document.write('<option value="fullresult.php?date=20150516&page=01">2015-05-16</option>');
document.write('<option value="fullresult.php?date=20150513&page=01">2015-05-13</option>');
document.write('<option value="fullresult.php?date=20150509&page=01">2015-05-09</option>');
document.write('<option value="fullresult.php?date=20150506&page=01">2015-05-06</option>');
document.write('<option value="fullresult.php?date=20150503&page=01">2015-05-03</option>');
document.write('<option value="fullresult.php?date=20150429&page=01">2015-04-29</option>');
document.write('<option value="fullresult.php?date=20150426&page=01">2015-04-26</option>');
document.write('<option value="fullresult.php?date=20150422&page=01">2015-04-22</option>');
document.write('<option value="fullresult.php?date=20150419&page=01">2015-04-19</option>');
document.write('<option value="fullresult.php?date=20150415&page=01">2015-04-15</option>');
document.write('<option value="fullresult.php?date=20150412&page=01">2015-04-12</option>');
document.write('<option value="fullresult.php?date=20150407&page=01">2015-04-07</option>');
document.write('<option value="fullresult.php?date=20150401&page=01">2015-04-01</option>');
document.write('<option value="fullresult.php?date=20150329&page=01">2015-03-29</option>');
document.write('<option value="fullresult.php?date=20150325&page=01">2015-03-25</option>');
document.write('<option value="fullresult.php?date=20150321&page=01">2015-03-21</option>');
document.write('<option value="fullresult.php?date=20150318&page=01">2015-03-18</option>');
document.write('<option value="fullresult.php?date=20150315&page=01">2015-03-15</option>');
document.write('<option value="fullresult.php?date=20150311&page=01">2015-03-11</option>');
document.write('<option value="fullresult.php?date=20150308&page=01">2015-03-08</option>');
document.write('<option value="fullresult.php?date=20150304&page=01">2015-03-04</option>');
document.write('<option value="fullresult.php?date=20150301&page=01">2015-03-01</option>');
document.write('<option value="fullresult.php?date=20150225&page=01">2015-02-25</option>');
document.write('<option value="fullresult.php?date=20150221&page=01">2015-02-21</option>');
document.write('<option value="fullresult.php?date=20150215&page=01">2015-02-15</option>');
document.write('<option value="fullresult.php?date=20150211&page=01">2015-02-11</option>');
document.write('<option value="fullresult.php?date=20150207&page=01">2015-02-07</option>');
document.write('<option value="fullresult.php?date=20150204&page=01">2015-02-04</option>');
document.write('<option value="fullresult.php?date=20150201&page=01">2015-02-01</option>');
document.write('<option value="fullresult.php?date=20150128&page=01">2015-01-28</option>');
document.write('<option value="fullresult.php?date=20150125&page=01">2015-01-25</option>');
document.write('<option value="fullresult.php?date=20150121&page=01">2015-01-21</option>');
document.write('<option value="fullresult.php?date=20150118&page=01">2015-01-18</option>');
document.write('<option value="fullresult.php?date=20150114&page=01">2015-01-14</option>');
document.write('<option value="fullresult.php?date=20150110&page=01">2015-01-10</option>');
document.write('<option value="fullresult.php?date=20150107&page=01">2015-01-07</option>');
document.write('<option value="fullresult.php?date=20150104&page=01">2015-01-04</option>');
document.write('<option value="fullresult.php?date=20150101&page=01">2015-01-01</option>');
document.write('<option value="fullresult.php?date=20141228&page=01">2014-12-28</option>');
document.write('<option value="fullresult.php?date=20141220&page=01">2014-12-20</option>');
document.write('<option value="fullresult.php?date=20141217&page=01">2014-12-17</option>');
document.write('<option value="fullresult.php?date=20141214&page=01">2014-12-14</option>');
document.write('<option value="fullresult.php?date=20141210&page=01">2014-12-10</option>');
document.write('<option value="fullresult.php?date=20141207&page=01">2014-12-07</option>');
document.write('<option value="fullresult.php?date=20141203&page=01">2014-12-03</option>');
document.write('<option value="fullresult.php?date=20141130&page=01">2014-11-30</option>');
document.write('<option value="fullresult.php?date=20141126&page=01">2014-11-26</option>');
document.write('<option value="fullresult.php?date=20141123&page=01">2014-11-23</option>');
document.write('<option value="fullresult.php?date=20141119&page=01">2014-11-19</option>');
document.write('<option value="fullresult.php?date=20141115&page=01">2014-11-15</option>');
document.write('<option value="fullresult.php?date=20141112&page=01">2014-11-12</option>');
document.write('<option value="fullresult.php?date=20141109&page=01">2014-11-09</option>');
document.write('<option value="fullresult.php?date=20141102&page=01">2014-11-02</option>');
document.write('<option value="fullresult.php?date=20141029&page=01">2014-10-29</option>');
document.write('<option value="fullresult.php?date=20141026&page=01">2014-10-26</option>');
document.write('<option value="fullresult.php?date=20141022&page=01">2014-10-22</option>');
document.write('<option value="fullresult.php?date=20141019&page=01">2014-10-19</option>');
document.write('<option value="fullresult.php?date=20141015&page=01">2014-10-15</option>');
document.write('<option value="fullresult.php?date=20141012&page=01">2014-10-12</option>');
document.write('<option value="fullresult.php?date=20141008&page=01">2014-10-08</option>');
document.write('<option value="fullresult.php?date=20141005&page=01">2014-10-05</option>');
document.write('<option value="fullresult.php?date=20141001&page=01">2014-10-01</option>');
document.write('<option value="fullresult.php?date=20140927&page=01">2014-09-27</option>');
document.write('<option value="fullresult.php?date=20140924&page=01">2014-09-24</option>');
document.write('<option value="fullresult.php?date=20140921&page=01">2014-09-21</option>');
document.write('<option value="fullresult.php?date=20140917&page=01">2014-09-17</option>');
document.write('<option value="fullresult.php?date=20140914&page=01">2014-09-14</option>');
document.write('<option value="fullresult.php?date=20140706&page=01">2014-07-06</option>');
document.write('<option value="fullresult.php?date=20140701&page=01">2014-07-01</option>');
document.write('<option value="fullresult.php?date=20140628&page=01">2014-06-28</option>');
document.write('<option value="fullresult.php?date=20140625&page=01">2014-06-25</option>');
document.write('<option value="fullresult.php?date=20140622&page=01">2014-06-22</option>');
document.write('<option value="fullresult.php?date=20140618&page=01">2014-06-18</option>');
document.write('<option value="fullresult.php?date=20140615&page=01">2014-06-15</option>');
document.write('<option value="fullresult.php?date=20140611&page=01">2014-06-11</option>');
document.write('<option value="fullresult.php?date=20140608&page=01">2014-06-08</option>');
document.write('<option value="fullresult.php?date=20140605&page=01">2014-06-05</option>');
document.write('<option value="fullresult.php?date=20140601&page=01">2014-06-01</option>');
document.write('<option value="fullresult.php?date=20140528&page=01">2014-05-28</option>');
document.write('<option value="fullresult.php?date=20140525&page=01">2014-05-25</option>');
document.write('<option value="fullresult.php?date=20140521&page=01">2014-05-21</option>');
document.write('<option value="fullresult.php?date=20140518&page=01">2014-05-18</option>');
document.write('<option value="fullresult.php?date=20140517&page=01">2014-05-17</option>');
document.write('<option value="fullresult.php?date=20140514&page=01">2014-05-14</option>');
document.write('<option value="fullresult.php?date=20140510&page=01">2014-05-10</option>');
document.write('<option value="fullresult.php?date=20140507&page=01">2014-05-07</option>');
document.write('<option value="fullresult.php?date=20140504&page=01">2014-05-04</option>');
document.write('<option value="fullresult.php?date=20140430&page=01">2014-04-30</option>');
document.write('<option value="fullresult.php?date=20140427&page=01">2014-04-27</option>');
document.write('<option value="fullresult.php?date=20140421&page=01">2014-04-21</option>');
document.write('<option value="fullresult.php?date=20140416&page=01">2014-04-16</option>');
document.write('<option value="fullresult.php?date=20140413&page=01">2014-04-13</option>');
document.write('<option value="fullresult.php?date=20140409&page=01">2014-04-09</option>');
document.write('<option value="fullresult.php?date=20140406&page=01">2014-04-06</option>');
document.write('<option value="fullresult.php?date=20140402&page=01">2014-04-02</option>');
document.write('<option value="fullresult.php?date=20140330&page=01">2014-03-30</option>');
document.write('<option value="fullresult.php?date=20140326&page=01">2014-03-26</option>');
document.write('<option value="fullresult.php?date=20140323&page=01">2014-03-23</option>');
document.write('<option value="fullresult.php?date=20140319&page=01">2014-03-19</option>');
document.write('<option value="fullresult.php?date=20140316&page=01">2014-03-16</option>');
document.write('<option value="fullresult.php?date=20140312&page=01">2014-03-12</option>');
document.write('<option value="fullresult.php?date=20140309&page=01">2014-03-09</option>');
document.write('<option value="fullresult.php?date=20140305&page=01">2014-03-05</option>');
document.write('<option value="fullresult.php?date=20140301&page=01">2014-03-01</option>');
document.write('<option value="fullresult.php?date=20140226&page=01">2014-02-26</option>');
document.write('<option value="fullresult.php?date=20140223&page=01">2014-02-23</option>');
document.write('<option value="fullresult.php?date=20140219&page=01">2014-02-19</option>');
document.write('<option value="fullresult.php?date=20140216&page=01">2014-02-16</option>');
document.write('<option value="fullresult.php?date=20140212&page=01">2014-02-12</option>');
document.write('<option value="fullresult.php?date=20140208&page=01">2014-02-08</option>');
document.write('<option value="fullresult.php?date=20140205&page=01">2014-02-05</option>');
document.write('<option value="fullresult.php?date=20140202&page=01">2014-02-02</option>');
document.write('<option value="fullresult.php?date=20140126&page=01">2014-01-26</option>');
document.write('<option value="fullresult.php?date=20140122&page=01">2014-01-22</option>');
document.write('<option value="fullresult.php?date=20140119&page=01">2014-01-19</option>');
document.write('<option value="fullresult.php?date=20140115&page=01">2014-01-15</option>');
document.write('<option value="fullresult.php?date=20140111&page=01">2014-01-11</option>');
document.write('<option value="fullresult.php?date=20140108&page=01">2014-01-08</option>');
document.write('<option value="fullresult.php?date=20140105&page=01">2014-01-05</option>');
document.write('<option value="fullresult.php?date=20140101&page=01">2014-01-01</option>');
document.write('<option value="fullresult.php?date=20131229&page=01">2013-12-29</option>');
document.write('<option value="fullresult.php?date=20131226&page=01">2013-12-26</option>');
document.write('<option value="fullresult.php?date=20131221&page=01">2013-12-21</option>');
document.write('<option value="fullresult.php?date=20131218&page=01">2013-12-18</option>');
document.write('<option value="fullresult.php?date=20131215&page=01">2013-12-15</option>');
document.write('<option value="fullresult.php?date=20131211&page=01">2013-12-11</option>');
document.write('<option value="fullresult.php?date=20131208&page=01">2013-12-08</option>');
document.write('<option value="fullresult.php?date=20131204&page=01">2013-12-04</option>');
document.write('<option value="fullresult.php?date=20131201&page=01">2013-12-01</option>');
document.write('<option value="fullresult.php?date=20131127&page=01">2013-11-27</option>');
document.write('<option value="fullresult.php?date=20131124&page=01">2013-11-24</option>');
document.write('<option value="fullresult.php?date=20131120&page=01">2013-11-20</option>');
document.write('<option value="fullresult.php?date=20131117&page=01">2013-11-17</option>');
document.write('<option value="fullresult.php?date=20131113&page=01">2013-11-13</option>');
document.write('<option value="fullresult.php?date=20131109&page=01">2013-11-09</option>');
document.write('<option value="fullresult.php?date=20131106&page=01">2013-11-06</option>');
document.write('<option value="fullresult.php?date=20131103&page=01">2013-11-03</option>');
document.write('<option value="fullresult.php?date=20131030&page=01">2013-10-30</option>');
document.write('<option value="fullresult.php?date=20131027&page=01">2013-10-27</option>');
document.write('<option value="fullresult.php?date=20131023&page=01">2013-10-23</option>');
document.write('<option value="fullresult.php?date=20131020&page=01">2013-10-20</option>');
document.write('<option value="fullresult.php?date=20131016&page=01">2013-10-16</option>');
document.write('<option value="fullresult.php?date=20131012&page=01">2013-10-12</option>');
document.write('<option value="fullresult.php?date=20131009&page=01">2013-10-09</option>');
document.write('<option value="fullresult.php?date=20131006&page=01">2013-10-06</option>');
document.write('<option value="fullresult.php?date=20131001&page=01">2013-10-01</option>');
document.write('<option value="fullresult.php?date=20130925&page=01">2013-09-25</option>');
document.write('<option value="fullresult.php?date=20130917&page=01">2013-09-17</option>');
document.write('<option value="fullresult.php?date=20130915&page=01">2013-09-15</option>');
document.write('<option value="fullresult.php?date=20130911&page=01">2013-09-11</option>');
document.write('<option value="fullresult.php?date=20130908&page=01">2013-09-08</option>');
document.write('<option value="fullresult.php?date=20130710&page=01">2013-07-10</option>');
document.write('<option value="fullresult.php?date=20130707&page=01">2013-07-07</option>');
document.write('<option value="fullresult.php?date=20130704&page=01">2013-07-04</option>');
document.write('<option value="fullresult.php?date=20130701&page=01">2013-07-01</option>');
document.write('<option value="fullresult.php?date=20130626&page=01">2013-06-26</option>');
document.write('<option value="fullresult.php?date=20130623&page=01">2013-06-23</option>');
document.write('<option value="fullresult.php?date=20130619&page=01">2013-06-19</option>');
document.write('<option value="fullresult.php?date=20130616&page=01">2013-06-16</option>');
document.write('<option value="fullresult.php?date=20130612&page=01">2013-06-12</option>');
document.write('<option value="fullresult.php?date=20130608&page=01">2013-06-08</option>');
document.write('<option value="fullresult.php?date=20130605&page=01">2013-06-05</option>');
document.write('<option value="fullresult.php?date=20130602&page=01">2013-06-02</option>');
document.write('<option value="fullresult.php?date=20130529&page=01">2013-05-29</option>');
document.write('<option value="fullresult.php?date=20130526&page=01">2013-05-26</option>');
document.write('<option value="fullresult.php?date=20130522&page=01">2013-05-22</option>');
document.write('<option value="fullresult.php?date=20130518&page=01">2013-05-18</option>');
document.write('<option value="fullresult.php?date=20130515&page=01">2013-05-15</option>');
document.write('<option value="fullresult.php?date=20130511&page=01">2013-05-11</option>');
document.write('<option value="fullresult.php?date=20130508&page=01">2013-05-08</option>');
document.write('<option value="fullresult.php?date=20130505&page=01">2013-05-05</option>');
document.write('<option value="fullresult.php?date=20130501&page=01">2013-05-01</option>');
document.write('<option value="fullresult.php?date=20130428&page=01">2013-04-28</option>');
document.write('<option value="fullresult.php?date=20130424&page=01">2013-04-24</option>');
document.write('<option value="fullresult.php?date=20130420&page=01">2013-04-20</option>');
document.write('<option value="fullresult.php?date=20130417&page=01">2013-04-17</option>');
document.write('<option value="fullresult.php?date=20130414&page=01">2013-04-14</option>');
document.write('<option value="fullresult.php?date=20130410&page=01">2013-04-10</option>');
document.write('<option value="fullresult.php?date=20130407&page=01">2013-04-07</option>');
document.write('<option value="fullresult.php?date=20130401&page=01">2013-04-01</option>');
document.write('<option value="fullresult.php?date=20130327&page=01">2013-03-27</option>');
document.write('<option value="fullresult.php?date=20130324&page=01">2013-03-24</option>');
document.write('<option value="fullresult.php?date=20130320&page=01">2013-03-20</option>');
document.write('<option value="fullresult.php?date=20130317&page=01">2013-03-17</option>');
document.write('<option value="fullresult.php?date=20130313&page=01">2013-03-13</option>');
document.write('<option value="fullresult.php?date=20130310&page=01">2013-03-10</option>');
document.write('<option value="fullresult.php?date=20130306&page=01">2013-03-06</option>');
document.write('<option value="fullresult.php?date=20130302&page=01">2013-03-02</option>');
document.write('<option value="fullresult.php?date=20130227&page=01">2013-02-27</option>');
document.write('<option value="fullresult.php?date=20130224&page=01">2013-02-24</option>');
document.write('<option value="fullresult.php?date=20130220&page=01">2013-02-20</option>');
document.write('<option value="fullresult.php?date=20130217&page=01">2013-02-17</option>');
document.write('<option value="fullresult.php?date=20130212&page=01">2013-02-12</option>');
document.write('<option value="fullresult.php?date=20130206&page=01">2013-02-06</option>');
document.write('<option value="fullresult.php?date=20130202&page=01">2013-02-02</option>');
document.write('<option value="fullresult.php?date=20130130&page=01">2013-01-30</option>');
document.write('<option value="fullresult.php?date=20130127&page=01">2013-01-27</option>');
document.write('<option value="fullresult.php?date=20130123&page=01">2013-01-23</option>');
document.write('<option value="fullresult.php?date=20130120&page=01">2013-01-20</option>');
document.write('<option value="fullresult.php?date=20130116&page=01">2013-01-16</option>');
document.write('<option value="fullresult.php?date=20130112&page=01">2013-01-12</option>');
document.write('<option value="fullresult.php?date=20130109&page=01">2013-01-09</option>');
document.write('<option value="fullresult.php?date=20130106&page=01">2013-01-06</option>');
document.write('<option value="fullresult.php?date=20130101&page=01">2013-01-01</option>');
document.write('<option value="fullresult.php?date=20121228&page=01">2012-12-28</option>');
document.write('<option value="fullresult.php?date=20121222&page=01">2012-12-22</option>');
document.write('<option value="fullresult.php?date=20121219&page=01">2012-12-19</option>');
document.write('<option value="fullresult.php?date=20121216&page=01">2012-12-16</option>');
document.write('<option value="fullresult.php?date=20121212&page=01">2012-12-12</option>');
document.write('<option value="fullresult.php?date=20121209&page=01">2012-12-09</option>');
document.write('<option value="fullresult.php?date=20121205&page=01">2012-12-05</option>');
document.write('<option value="fullresult.php?date=20121202&page=01">2012-12-02</option>');
document.write('<option value="fullresult.php?date=20121128&page=01">2012-11-28</option>');
document.write('<option value="fullresult.php?date=20121125&page=01">2012-11-25</option>');
document.write('<option value="fullresult.php?date=20121121&page=01">2012-11-21</option>');
document.write('<option value="fullresult.php?date=20121118&page=01">2012-11-18</option>');
document.write('<option value="fullresult.php?date=20121114&page=01">2012-11-14</option>');
document.write('<option value="fullresult.php?date=20121110&page=01">2012-11-10</option>');
document.write('<option value="fullresult.php?date=20121107&page=01">2012-11-07</option>');
document.write('<option value="fullresult.php?date=20121104&page=01">2012-11-04</option>');
document.write('<option value="fullresult.php?date=20121028&page=01">2012-10-28</option>');
document.write('<option value="fullresult.php?date=20121024&page=01">2012-10-24</option>');
document.write('<option value="fullresult.php?date=20121021&page=01">2012-10-21</option>');
document.write('<option value="fullresult.php?date=20121017&page=01">2012-10-17</option>');
document.write('<option value="fullresult.php?date=20121014&page=01">2012-10-14</option>');
document.write('<option value="fullresult.php?date=20121010&page=01">2012-10-10</option>');
document.write('<option value="fullresult.php?date=20121006&page=01">2012-10-06</option>');
document.write('<option value="fullresult.php?date=20121001&page=01">2012-10-01</option>');
document.write('<option value="fullresult.php?date=20120926&page=01">2012-09-26</option>');
document.write('<option value="fullresult.php?date=20120923&page=01">2012-09-23</option>');
document.write('<option value="fullresult.php?date=20120919&page=01">2012-09-19</option>');
document.write('<option value="fullresult.php?date=20120916&page=01">2012-09-16</option>');
document.write('<option value="fullresult.php?date=20120912&page=01">2012-09-12</option>');
document.write('<option value="fullresult.php?date=20120908&page=01">2012-09-08</option>');
document.write('<option value="oldfullresult.php">1987-2011年度賽後</option>');
</script></select>]

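OK! Everything sits under select. The output looks messy, but you can spot lines such as document.write('<option value="fullresult.php?date=20140105&page=01">2014-01-05</option>'). So the option entries are apparently generated by JavaScript, which explains why BeautifulSoup could not find any option tags.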
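
Here is a rough way to see this without touching the site. The 'html.parser' backend used above is built on HTMLParser from Python's standard library, which treats everything inside a script tag as plain text. The sketch below feeds it a made-up snippet with a single option; sample_html and TagCounter are names invented for this illustration, not part of the original scraper.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Record every start tag and every text chunk the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


# Hypothetical snippet shaped like the <select> dump above (one option only)
sample_html = (
    '<select><script>'
    'document.write(\'<option value="fullresult.php?date=20140105&page=01">'
    '2014-01-05</option>\');'
    '</script></select>'
)

counter = TagCounter()
counter.feed(sample_html)
counter.close()

print(counter.tags)           # ['select', 'script']
print(''.join(counter.data))  # the document.write(...) call, as plain text
```

No option tag is ever reported. The option markup only shows up as text inside the script, so it has to be pulled out by string matching.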

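So the only way forward is to turn this whole chunk into a string and extract the values with re.findall from the re (regular expression) module. re.findall takes two arguments: the pattern first, the text to search second. With a well-designed pattern, it returns every fragment of the text that matches~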
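
Before running it on the real page, here is a minimal sketch of how the capture group behaves on a hard-coded string. sample_text is a made-up stand-in containing two of the option entries shown above, squeezed onto one line; the variable names are only for this illustration.

```python
import re

# Hypothetical sample: two option entries on a single line
sample_text = (
    'document.write(\'<option value="fullresult.php?date=20160210&page=01">2016-02-10</option>\');'
    'document.write(\'<option value="fullresult.php?date=20160206&page=01">2016-02-06</option>\');'
)

# (.*?) is a non-greedy capture group: it stops at the first "> it meets,
# and re.findall returns only the captured part of each match
sample_links = re.findall(r'option value="(.*?)">', sample_text)
print(sample_links)
# ['fullresult.php?date=20160210&page=01', 'fullresult.php?date=20160206&page=01']

# For contrast, a greedy (.*) runs on to the LAST "> in the text,
# so both entries get swallowed into a single match
greedy_links = re.findall(r'option value="(.*)">', sample_text)
print(len(greedy_links))  # 1
```

The question mark is what keeps each match confined to a single value attribute.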

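import re

# Take the first <select> tag and turn it back into plain text
options_text = str(soup.find_all('select')[0])
# Capture whatever sits between option value=" and the closing ">
re.findall(r'option value="(.*?)">', options_text)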

['fullresult.php?date=20190210&page=01', 'fullresult.php?date=20190207&page=01', 'fullresult.php?date=20190202&page=01',
'fullresult.php?date=20190130&page=01', 'fullresult.php?date=20190127&page=01', 'fullresult.php?date=20190123&page=01',
'fullresult.php?date=20190120&page=01', 'fullresult.php?date=20190116&page=01', 'fullresult.php?date=20190112&page=01',
'fullresult.php?date=20190109&page=01', 'fullresult.php?date=20190106&page=01', 'fullresult.php?date=20190101&page=01',
'fullresult.php?date=20181229&page=01', 'fullresult.php?date=20181226&page=01', 'fullresult.php?date=20181223&page=01',
'fullresult.php?date=20181219&page=01', 'fullresult.php?date=20181216&page=01', 'fullresult.php?date=20181212&page=01',
'fullresult.php?date=20181209&page=01', 'fullresult.php?date=20181205&page=01', 'fullresult.php?date=20181202&page=01',
'fullresult.php?date=20181128&page=01', 'fullresult.php?date=20181125&page=01', 'fullresult.php?date=20181121&page=01',
'fullresult.php?date=20181118&page=01', 'fullresult.php?date=20181114&page=01', 'fullresult.php?date=20181110&page=01',
'fullresult.php?date=20181107&page=01', 'fullresult.php?date=20181104&page=01', 'fullresult.php?date=20181031&page=01',
'fullresult.php?date=20181028&page=01', 'fullresult.php?date=20181024&page=01', 'fullresult.php?date=20181021&page=01',
'fullresult.php?date=20181018&page=01', 'fullresult.php?date=20181013&page=01', 'fullresult.php?date=20181010&page=01',
'fullresult.php?date=20181008&page=01', 'fullresult.php?date=20181007&page=01', 'fullresult.php?date=20181003&page=01',
'fullresult.php?date=20181001&page=01', 'fullresult.php?date=20180926&page=01', 'fullresult.php?date=20180922&page=01',
'fullresult.php?date=20180916&page=01', 'fullresult.php?date=20180912&page=01', 'fullresult.php?date=20180909&page=01',
'fullresult.php?date=20180905&page=01', 'fullresult.php?date=20180902&page=01', 'fullresult.php?date=20180715&page=01',
'fullresult.php?date=20180711&page=01', 'fullresult.php?date=20180708&page=01', 'fullresult.php?date=20180704&page=01',
'fullresult.php?date=20180701&page=01', 'fullresult.php?date=20180627&page=01', 'fullresult.php?date=20180624&page=01',
'fullresult.php?date=20180623&page=01', 'fullresult.php?date=20180616&page=01', 'fullresult.php?date=20180613&page=01',
'fullresult.php?date=20180610&page=01', 'fullresult.php?date=20180606&page=01', 'fullresult.php?date=20180603&page=01',
'fullresult.php?date=20180530&page=01', 'fullresult.php?date=20180523&page=01', 'fullresult.php?date=20180520&page=01',
'fullresult.php?date=20180516&page=01', 'fullresult.php?date=20180512&page=01', 'fullresult.php?date=20180509&page=01',
'fullresult.php?date=20180506&page=01', 'fullresult.php?date=20180502&page=01', 'fullresult.php?date=20180429&page=01',
'fullresult.php?date=20180425&page=01', 'fullresult.php?date=20180421&page=01', 'fullresult.php?date=20180418&page=01',
'fullresult.php?date=20180415&page=01', 'fullresult.php?date=20180411&page=01', 'fullresult.php?date=20180408&page=01',
'fullresult.php?date=20180402&page=01', 'fullresult.php?date=20180328&page=01', 'fullresult.php?date=20180325&page=01',
'fullresult.php?date=20180321&page=01', 'fullresult.php?date=20180318&page=01', 'fullresult.php?date=20180314&page=01',
'fullresult.php?date=20180311&page=01', 'fullresult.php?date=20180307&page=01', 'fullresult.php?date=20180303&page=01',
'fullresult.php?date=20180228&page=01', 'fullresult.php?date=20180225&page=01', 'fullresult.php?date=20180221&page=01',
'fullresult.php?date=20180218&page=01', 'fullresult.php?date=20180214&page=01', 'fullresult.php?date=20180210&page=01',
'fullresult.php?date=20180207&page=01', 'fullresult.php?date=20180204&page=01', 'fullresult.php?date=20180131&page=01',
'fullresult.php?date=20180128&page=01', 'fullresult.php?date=20180124&page=01', 'fullresult.php?date=20180121&page=01',
'fullresult.php?date=20180117&page=01', 'fullresult.php?date=20180113&page=01', 'fullresult.php?date=20180110&page=01',
'fullresult.php?date=20180107&page=01', 'fullresult.php?date=20180101&page=01', 'fullresult.php?date=20171227&page=01',
'fullresult.php?date=20171223&page=01', 'fullresult.php?date=20171220&page=01', 'fullresult.php?date=20171217&page=01',
'fullresult.php?date=20171213&page=01', 'fullresult.php?date=20171210&page=01', 'fullresult.php?date=20171206&page=01',
'fullresult.php?date=20171203&page=01', 'fullresult.php?date=20171129&page=01', 'fullresult.php?date=20171126&page=01',
'fullresult.php?date=20171122&page=01', 'fullresult.php?date=20171119&page=01', 'fullresult.php?date=20171115&page=01',
'fullresult.php?date=20171112&page=01', 'fullresult.php?date=20171111&page=01', 'fullresult.php?date=20171108&page=01',
'fullresult.php?date=20171105&page=01', 'fullresult.php?date=20171101&page=01', 'fullresult.php?date=20171029&page=01',
'fullresult.php?date=20171025&page=01', 'fullresult.php?date=20171022&page=01', 'fullresult.php?date=20171018&page=01',
'fullresult.php?date=20171014&page=01', 'fullresult.php?date=20171011&page=01', 'fullresult.php?date=20171008&page=01',
'fullresult.php?date=20171005&page=01', 'fullresult.php?date=20171001&page=01', 'fullresult.php?date=20170927&page=01',
'fullresult.php?date=20170924&page=01', 'fullresult.php?date=20170920&page=01', 'fullresult.php?date=20170916&page=01',
'fullresult.php?date=20170913&page=01', 'fullresult.php?date=20170910&page=01', 'fullresult.php?date=20170906&page=01',
'fullresult.php?date=20170903&page=01', 'fullresult.php?date=20170716&page=01', 'fullresult.php?date=20170712&page=01',
'fullresult.php?date=20170709&page=01', 'fullresult.php?date=20170701&page=01', 'fullresult.php?date=20170628&page=01',
'fullresult.php?date=20170625&page=01', 'fullresult.php?date=20170621&page=01', 'fullresult.php?date=20170618&page=01',
'fullresult.php?date=20170614&page=01', 'fullresult.php?date=20170611&page=01', 'fullresult.php?date=20170607&page=01',
'fullresult.php?date=20170604&page=01', 'fullresult.php?date=20170531&page=01', 'fullresult.php?date=20170528&page=01',
'fullresult.php?date=20170524&page=01', 'fullresult.php?date=20170521&page=01', 'fullresult.php?date=20170520&page=01',
'fullresult.php?date=20170517&page=01', 'fullresult.php?date=20170513&page=01', 'fullresult.php?date=20170510&page=01',
'fullresult.php?date=20170507&page=01', 'fullresult.php?date=20170503&page=01', 'fullresult.php?date=20170430&page=01',
'fullresult.php?date=20170426&page=01', 'fullresult.php?date=20170423&page=01', 'fullresult.php?date=20170420&page=01',
'fullresult.php?date=20170417&page=01', 'fullresult.php?date=20170412&page=01', 'fullresult.php?date=20170409&page=01',
'fullresult.php?date=20170405&page=01', 'fullresult.php?date=20170402&page=01', 'fullresult.php?date=20170329&page=01',
'fullresult.php?date=20170326&page=01', 'fullresult.php?date=20170322&page=01', 'fullresult.php?date=20170319&page=01',
'fullresult.php?date=20170318&page=01', 'fullresult.php?date=20170315&page=01', 'fullresult.php?date=20170312&page=01',
'fullresult.php?date=20170308&page=01', 'fullresult.php?date=20170305&page=01', 'fullresult.php?date=20170301&page=01',
'fullresult.php?date=20170226&page=01', 'fullresult.php?date=20170222&page=01', 'fullresult.php?date=20170219&page=01',
'fullresult.php?date=20170215&page=01', 'fullresult.php?date=20170211&page=01', 'fullresult.php?date=20170208&page=01',
'fullresult.php?date=20170205&page=01', 'fullresult.php?date=20170202&page=01', 'fullresult.php?date=20170130&page=01',
'fullresult.php?date=20170125&page=01', 'fullresult.php?date=20170122&page=01', 'fullresult.php?date=20170118&page=01',
'fullresult.php?date=20170114&page=01', 'fullresult.php?date=20170111&page=01', 'fullresult.php?date=20170108&page=01',
'fullresult.php?date=20170104&page=01', 'fullresult.php?date=20170101&page=01', 'fullresult.php?date=20161227&page=01',
'fullresult.php?date=20161222&page=01', 'fullresult.php?date=20161217&page=01', 'fullresult.php?date=20161214&page=01',
'fullresult.php?date=20161211&page=01', 'fullresult.php?date=20161207&page=01', 'fullresult.php?date=20161204&page=01',
'fullresult.php?date=20161130&page=01', 'fullresult.php?date=20161127&page=01', 'fullresult.php?date=20161123&page=01',
'fullresult.php?date=20161120&page=01', 'fullresult.php?date=20161118&page=01', 'fullresult.php?date=20161116&page=01',
'fullresult.php?date=20161112&page=01', 'fullresult.php?date=20161109&page=01', 'fullresult.php?date=20161106&page=01',
'fullresult.php?date=20161105&page=01', 'fullresult.php?date=20161102&page=01', 'fullresult.php?date=20161101&page=01',
'fullresult.php?date=20161030&page=01', 'fullresult.php?date=20161029&page=01', 'fullresult.php?date=20161026&page=01',
'fullresult.php?date=20161023&page=01', 'fullresult.php?date=20161022&page=01', 'fullresult.php?date=20161019&page=01',
'fullresult.php?date=20161016&page=01', 'fullresult.php?date=20161015&page=01', 'fullresult.php?date=20161012&page=01',
'fullresult.php?date=20161008&page=01', 'fullresult.php?date=20161005&page=01', 'fullresult.php?date=20161001&page=01',
'fullresult.php?date=20160928&page=01', 'fullresult.php?date=20160925&page=01', 'fullresult.php?date=20160921&page=01',
'fullresult.php?date=20160918&page=01', 'fullresult.php?date=20160911&page=01', 'fullresult.php?date=20160907&page=01',
'fullresult.php?date=20160903&page=01', 'fullresult.php?date=20160710&page=01', 'fullresult.php?date=20160706&page=01',
'fullresult.php?date=20160701&page=01', 'fullresult.php?date=20160626&page=01', 'fullresult.php?date=20160622&page=01',
'fullresult.php?date=20160619&page=01', 'fullresult.php?date=20160616&page=01', 'fullresult.php?date=20160615&page=01',
'fullresult.php?date=20160612&page=01', 'fullresult.php?date=20160609&page=01', 'fullresult.php?date=20160605&page=01',
'fullresult.php?date=20160601&page=01', 'fullresult.php?date=20160529&page=01', 'fullresult.php?date=20160522&page=01',
'fullresult.php?date=20160518&page=01', 'fullresult.php?date=20160514&page=01', 'fullresult.php?date=20160511&page=01',
'fullresult.php?date=20160507&page=01', 'fullresult.php?date=20160504&page=01', 'fullresult.php?date=20160501&page=01',
'fullresult.php?date=20160427&page=01', 'fullresult.php?date=20160424&page=01', 'fullresult.php?date=20160420&page=01',
'fullresult.php?date=20160416&page=01', 'fullresult.php?date=20160413&page=01', 'fullresult.php?date=20160410&page=01',
'fullresult.php?date=20160406&page=01', 'fullresult.php?date=20160403&page=01', 'fullresult.php?date=20160331&page=01',
'fullresult.php?date=20160328&page=01', 'fullresult.php?date=20160323&page=01', 'fullresult.php?date=20160320&page=01',
'fullresult.php?date=20160316&page=01', 'fullresult.php?date=20160313&page=01', 'fullresult.php?date=20160309&page=01',
'fullresult.php?date=20160306&page=01', 'fullresult.php?date=20160302&page=01', 'fullresult.php?date=20160228&page=01',
'fullresult.php?date=20160224&page=01', 'fullresult.php?date=20160221&page=01', 'fullresult.php?date=20160217&page=01',
'fullresult.php?date=20160214&page=01', 'fullresult.php?date=20160210&page=01', 'fullresult.php?date=20160206&page=01',
'fullresult.php?date=20160203&page=01', 'fullresult.php?date=20160131&page=01', 'fullresult.php?date=20160124&page=01',
'fullresult.php?date=20160120&page=01', 'fullresult.php?date=20160117&page=01', 'fullresult.php?date=20160113&page=01',
'fullresult.php?date=20160109&page=01', 'fullresult.php?date=20160106&page=01', 'fullresult.php?date=20160101&page=01',
'fullresult.php?date=20151227&page=01', 'fullresult.php?date=20151223&page=01', 'fullresult.php?date=20151219&page=01',
'fullresult.php?date=20151216&page=01', 'fullresult.php?date=20151213&page=01', 'fullresult.php?date=20151209&page=01',
'fullresult.php?date=20151206&page=01', 'fullresult.php?date=20151202&page=01', 'fullresult.php?date=20151129&page=01',
'fullresult.php?date=20151125&page=01', 'fullresult.php?date=20151121&page=01', 'fullresult.php?date=20151118&page=01',
'fullresult.php?date=20151114&page=01', 'fullresult.php?date=20151111&page=01', 'fullresult.php?date=20151108&page=01',
'fullresult.php?date=20151101&page=01', 'fullresult.php?date=20151025&page=01', 'fullresult.php?date=20151022&page=01',
'fullresult.php?date=20151018&page=01', 'fullresult.php?date=20151014&page=01', 'fullresult.php?date=20151010&page=01',
'fullresult.php?date=20151007&page=01', 'fullresult.php?date=20151004&page=01', 'fullresult.php?date=20151001&page=01',
'fullresult.php?date=20150928&page=01', 'fullresult.php?date=20150923&page=01', 'fullresult.php?date=20150919&page=01',
'fullresult.php?date=20150916&page=01', 'fullresult.php?date=20150913&page=01', 'fullresult.php?date=20150909&page=01',
'fullresult.php?date=20150906&page=01', 'fullresult.php?date=20150712&page=01', 'fullresult.php?date=20150708&page=01',
'fullresult.php?date=20150705&page=01', 'fullresult.php?date=20150701&page=01', 'fullresult.php?date=20150627&page=01',
'fullresult.php?date=20150624&page=01', 'fullresult.php?date=20150621&page=01', 'fullresult.php?date=20150617&page=01',
'fullresult.php?date=20150614&page=01', 'fullresult.php?date=20150610&page=01', 'fullresult.php?date=20150607&page=01',
'fullresult.php?date=20150603&page=01', 'fullresult.php?date=20150531&page=01', 'fullresult.php?date=20150527&page=01',
'fullresult.php?date=20150524&page=01', 'fullresult.php?date=20150520&page=01', 'fullresult.php?date=20150516&page=01',
'fullresult.php?date=20150513&page=01', 'fullresult.php?date=20150509&page=01', 'fullresult.php?date=20150506&page=01',
'fullresult.php?date=20150503&page=01', 'fullresult.php?date=20150429&page=01', 'fullresult.php?date=20150426&page=01',
'fullresult.php?date=20150422&page=01', 'fullresult.php?date=20150419&page=01', 'fullresult.php?date=20150415&page=01',
'fullresult.php?date=20150412&page=01', 'fullresult.php?date=20150407&page=01', 'fullresult.php?date=20150401&page=01',
'fullresult.php?date=20150329&page=01', 'fullresult.php?date=20150325&page=01', 'fullresult.php?date=20150321&page=01',
'fullresult.php?date=20150318&page=01', 'fullresult.php?date=20150315&page=01', 'fullresult.php?date=20150311&page=01',
'fullresult.php?date=20150308&page=01', 'fullresult.php?date=20150304&page=01', 'fullresult.php?date=20150301&page=01',
'fullresult.php?date=20150225&page=01', 'fullresult.php?date=20150221&page=01', 'fullresult.php?date=20150215&page=01',
'fullresult.php?date=20150211&page=01', 'fullresult.php?date=20150207&page=01', 'fullresult.php?date=20150204&page=01',
'fullresult.php?date=20150201&page=01', 'fullresult.php?date=20150128&page=01', 'fullresult.php?date=20150125&page=01',
'fullresult.php?date=20150121&page=01', 'fullresult.php?date=20150118&page=01', 'fullresult.php?date=20150114&page=01',
'fullresult.php?date=20150110&page=01', 'fullresult.php?date=20150107&page=01', 'fullresult.php?date=20150104&page=01',
'fullresult.php?date=20150101&page=01', 'fullresult.php?date=20141228&page=01', 'fullresult.php?date=20141220&page=01',
'fullresult.php?date=20141217&page=01', 'fullresult.php?date=20141214&page=01', 'fullresult.php?date=20141210&page=01',
'fullresult.php?date=20141207&page=01', 'fullresult.php?date=20141203&page=01', 'fullresult.php?date=20141130&page=01',
'fullresult.php?date=20141126&page=01', 'fullresult.php?date=20141123&page=01', 'fullresult.php?date=20141119&page=01',
'fullresult.php?date=20141115&page=01', 'fullresult.php?date=20141112&page=01', 'fullresult.php?date=20141109&page=01',
'fullresult.php?date=20141102&page=01', 'fullresult.php?date=20141029&page=01', 'fullresult.php?date=20141026&page=01',
'fullresult.php?date=20141022&page=01', 'fullresult.php?date=20141019&page=01', 'fullresult.php?date=20141015&page=01',
'fullresult.php?date=20141012&page=01', 'fullresult.php?date=20141008&page=01', 'fullresult.php?date=20141005&page=01',
'fullresult.php?date=20141001&page=01', 'fullresult.php?date=20140927&page=01', 'fullresult.php?date=20140924&page=01',
'fullresult.php?date=20140921&page=01', 'fullresult.php?date=20140917&page=01', 'fullresult.php?date=20140914&page=01',
'fullresult.php?date=20140706&page=01', 'fullresult.php?date=20140701&page=01', 'fullresult.php?date=20140628&page=01',
'fullresult.php?date=20140625&page=01', 'fullresult.php?date=20140622&page=01', 'fullresult.php?date=20140618&page=01',
'fullresult.php?date=20140615&page=01', 'fullresult.php?date=20140611&page=01', 'fullresult.php?date=20140608&page=01',
'fullresult.php?date=20140605&page=01', 'fullresult.php?date=20140601&page=01', 'fullresult.php?date=20140528&page=01',
'fullresult.php?date=20140525&page=01', 'fullresult.php?date=20140521&page=01', 'fullresult.php?date=20140518&page=01',
'fullresult.php?date=20140517&page=01', 'fullresult.php?date=20140514&page=01', 'fullresult.php?date=20140510&page=01',
'fullresult.php?date=20140507&page=01', 'fullresult.php?date=20140504&page=01', 'fullresult.php?date=20140430&page=01',
'fullresult.php?date=20140427&page=01', 'fullresult.php?date=20140421&page=01', 'fullresult.php?date=20140416&page=01',
'fullresult.php?date=20140413&page=01', 'fullresult.php?date=20140409&page=01', 'fullresult.php?date=20140406&page=01',
'fullresult.php?date=20140402&page=01', 'fullresult.php?date=20140330&page=01', 'fullresult.php?date=20140326&page=01',
'fullresult.php?date=20140323&page=01', 'fullresult.php?date=20140319&page=01', 'fullresult.php?date=20140316&page=01',
'fullresult.php?date=20140312&page=01', 'fullresult.php?date=20140309&page=01', 'fullresult.php?date=20140305&page=01',
'fullresult.php?date=20140301&page=01', 'fullresult.php?date=20140226&page=01', 'fullresult.php?date=20140223&page=01',
'fullresult.php?date=20140219&page=01', 'fullresult.php?date=20140216&page=01', 'fullresult.php?date=20140212&page=01',
'fullresult.php?date=20140208&page=01', 'fullresult.php?date=20140205&page=01', 'fullresult.php?date=20140202&page=01',
'fullresult.php?date=20140126&page=01', 'fullresult.php?date=20140122&page=01', 'fullresult.php?date=20140119&page=01',
'fullresult.php?date=20140115&page=01', 'fullresult.php?date=20140111&page=01', 'fullresult.php?date=20140108&page=01',
'fullresult.php?date=20140105&page=01', 'fullresult.php?date=20140101&page=01', 'fullresult.php?date=20131229&page=01',
'fullresult.php?date=20131226&page=01', 'fullresult.php?date=20131221&page=01', 'fullresult.php?date=20131218&page=01',
'fullresult.php?date=20131215&page=01', 'fullresult.php?date=20131211&page=01', 'fullresult.php?date=20131208&page=01',
'fullresult.php?date=20131204&page=01', 'fullresult.php?date=20131201&page=01', 'fullresult.php?date=20131127&page=01',
'fullresult.php?date=20131124&page=01', 'fullresult.php?date=20131120&page=01', 'fullresult.php?date=20131117&page=01',
'fullresult.php?date=20131113&page=01', 'fullresult.php?date=20131109&page=01', 'fullresult.php?date=20131106&page=01',
'fullresult.php?date=20131103&page=01', 'fullresult.php?date=20131030&page=01', 'fullresult.php?date=20131027&page=01',
'fullresult.php?date=20131023&page=01', 'fullresult.php?date=20131020&page=01', 'fullresult.php?date=20131016&page=01',
'fullresult.php?date=20131012&page=01', 'fullresult.php?date=20131009&page=01', 'fullresult.php?date=20131006&page=01',
'fullresult.php?date=20131001&page=01', 'fullresult.php?date=20130925&page=01', 'fullresult.php?date=20130917&page=01',
'fullresult.php?date=20130915&page=01', 'fullresult.php?date=20130911&page=01', 'fullresult.php?date=20130908&page=01',
'fullresult.php?date=20130710&page=01', 'fullresult.php?date=20130707&page=01', 'fullresult.php?date=20130704&page=01',
'fullresult.php?date=20130701&page=01', 'fullresult.php?date=20130626&page=01', 'fullresult.php?date=20130623&page=01',
'fullresult.php?date=20130619&page=01', 'fullresult.php?date=20130616&page=01', 'fullresult.php?date=20130612&page=01',
'fullresult.php?date=20130608&page=01', 'fullresult.php?date=20130605&page=01', 'fullresult.php?date=20130602&page=01',
'fullresult.php?date=20130529&page=01', 'fullresult.php?date=20130526&page=01', 'fullresult.php?date=20130522&page=01',
'fullresult.php?date=20130518&page=01', 'fullresult.php?date=20130515&page=01', 'fullresult.php?date=20130511&page=01',
'fullresult.php?date=20130508&page=01', 'fullresult.php?date=20130505&page=01', 'fullresult.php?date=20130501&page=01',
'fullresult.php?date=20130428&page=01', 'fullresult.php?date=20130424&page=01', 'fullresult.php?date=20130420&page=01',
'fullresult.php?date=20130417&page=01', 'fullresult.php?date=20130414&page=01', 'fullresult.php?date=20130410&page=01',
'fullresult.php?date=20130407&page=01', 'fullresult.php?date=20130401&page=01', 'fullresult.php?date=20130327&page=01',
'fullresult.php?date=20130324&page=01', 'fullresult.php?date=20130320&page=01', 'fullresult.php?date=20130317&page=01',
'fullresult.php?date=20130313&page=01', 'fullresult.php?date=20130310&page=01', 'fullresult.php?date=20130306&page=01',
'fullresult.php?date=20130302&page=01', 'fullresult.php?date=20130227&page=01', 'fullresult.php?date=20130224&page=01',
'fullresult.php?date=20130220&page=01', 'fullresult.php?date=20130217&page=01', 'fullresult.php?date=20130212&page=01',
'fullresult.php?date=20130206&page=01', 'fullresult.php?date=20130202&page=01', 'fullresult.php?date=20130130&page=01',
'fullresult.php?date=20130127&page=01', 'fullresult.php?date=20130123&page=01', 'fullresult.php?date=20130120&page=01',
'fullresult.php?date=20130116&page=01', 'fullresult.php?date=20130112&page=01', 'fullresult.php?date=20130109&page=01',
'fullresult.php?date=20130106&page=01', 'fullresult.php?date=20130101&page=01', 'fullresult.php?date=20121228&page=01',
'fullresult.php?date=20121222&page=01', 'fullresult.php?date=20121219&page=01', 'fullresult.php?date=20121216&page=01',
'fullresult.php?date=20121212&page=01', 'fullresult.php?date=20121209&page=01', 'fullresult.php?date=20121205&page=01',
'fullresult.php?date=20121202&page=01', 'fullresult.php?date=20121128&page=01', 'fullresult.php?date=20121125&page=01',
'fullresult.php?date=20121121&page=01', 'fullresult.php?date=20121118&page=01', 'fullresult.php?date=20121114&page=01',
'fullresult.php?date=20121110&page=01', 'fullresult.php?date=20121107&page=01', 'fullresult.php?date=20121104&page=01',
'fullresult.php?date=20121028&page=01', 'fullresult.php?date=20121024&page=01', 'fullresult.php?date=20121021&page=01',
'fullresult.php?date=20121017&page=01', 'fullresult.php?date=20121014&page=01', 'fullresult.php?date=20121010&page=01',
'fullresult.php?date=20121006&page=01', 'fullresult.php?date=20121001&page=01', 'fullresult.php?date=20120926&page=01',
'fullresult.php?date=20120923&page=01', 'fullresult.php?date=20120919&page=01', 'fullresult.php?date=20120916&page=01',
'fullresult.php?date=20120912&page=01', 'fullresult.php?date=20120908&page=01', 'oldfullresult.php']

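Great! Prepend the domain (http://hk.racing.nextmedia.com/) to each entry in the list above and you get a complete target URL. The list may still contain a few links we don't want (such as oldfullresult.php), so some if checks are needed to filter them out.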
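As a minimal sketch of that filtering step, assuming we only want links that contain fullresult.php?date= (the helper name to_target_urls and the sample list are made up for illustration), urllib.parse.urljoin can add the domain:

```python
from urllib.parse import urljoin

BASE_URL = 'http://hk.racing.nextmedia.com/'


def to_target_urls(hrefs, base_url=BASE_URL):
    """Keep only result-page links and turn them into absolute URLs."""
    targets = []
    for href in hrefs:
        # Skip missing hrefs (None) and anything that is not a dated result page
        if href and 'fullresult.php?date=' in href:
            targets.append(urljoin(base_url, href))
    return targets


# A few entries in the shape of the list above (hypothetical sample)
sample = ['index.php', '#', 'fullresult.php?date=20171105&page=01', 'oldfullresult.php', None]
print(to_target_urls(sample))
```

Checking for the full 'fullresult.php?date=' substring is what keeps oldfullresult.php out, since that link has no date parameter.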

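Next, wrap the whole process in a function.

The plan: open any one page to get the full list of dates, then loop over the dates and collect the links to every page under each date.

Since this is just practice, I cap the crawl so that it stops once at least 300 target URLs have been collected; you can drop the cap. Also, when visiting a site repeatedly in a loop, remember to add time.sleep and be a polite crawler. It doesn't need to sleep long; I pause 0.5 seconds after each date.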

import time
import re
import requests
from bs4 import BeautifulSoup


def get_target_urls(max_url = 300):

    url = 'http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    options_text = str(soup.find_all('select')[0])
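    # Pull every option value out of the date drop-down, then double-check
    # that only result-page links are kept
    date_urls_temp = re.findall('option value="(.*?)">', options_text)
    date_urls = [url for url in date_urls_temp if 'fullresult.php?date=' in url]

    all_target_urls = []
    for url in date_urls:
        r_date = requests.get('http://hk.racing.nextmedia.com/' + url)
        soup = BeautifulSoup(r_date.text, 'html.parser')
        urls_temp = [link.get('href') for link in soup.find_all('a')]
        # link.get('href') is None for <a> tags without an href, so skip those
        all_target_urls += [u for u in urls_temp if u and 'fullresult.php?date' in u]
        # Prints "collected N target URLs so far"
        print('已獲得{}個目標網址'.format(len(all_target_urls)))
        time.sleep(0.5)
        if len(all_target_urls) >= max_url:
            # Prints "collected at least max_url target URLs"
            print('已獲得至少{}個目標網址'.format(max_url))
            return all_target_urls

    # Fewer than max_url links exist: return what was collected instead of None
    return all_target_urls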

all_url = get_target_urls()

已獲得10個目標網址
已獲得21個目標網址
已獲得31個目標網址
已獲得39個目標網址
已獲得49個目標網址
已獲得57個目標網址
已獲得67個目標網址
已獲得75個目標網址
已獲得85個目標網址
已獲得93個目標網址
已獲得103個目標網址
已獲得114個目標網址
已獲得124個目標網址
已獲得132個目標網址
已獲得142個目標網址
已獲得150個目標網址
已獲得161個目標網址
已獲得169個目標網址
已獲得179個目標網址
已獲得188個目標網址
已獲得198個目標網址
已獲得206個目標網址
已獲得216個目標網址
已獲得224個目標網址
已獲得234個目標網址
已獲得242個目標網址
已獲得252個目標網址
已獲得260個目標網址
已獲得270個目標網址
已獲得278個目標網址
已獲得288個目標網址
已獲得296個目標網址
已獲得306個目標網址
已獲得至少300個目標網址
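The cap-and-sleep loop can also be separated from the network calls, which makes it easy to try offline. This is only a sketch: collect_links, fetch_links and fake_fetch are hypothetical names, and fake_fetch stands in for the real request.

```python
import time


def collect_links(date_urls, fetch_links, max_url=300, delay=0.5):
    """Visit each date page in turn and gather links until max_url is reached.

    fetch_links is any callable that takes a date URL and returns a list of links.
    """
    collected = []
    for date_url in date_urls:
        collected.extend(fetch_links(date_url))
        if len(collected) >= max_url:
            break
        time.sleep(delay)  # pause between requests to stay polite
    return collected


# Offline stand-in for the real request: every date "page" yields two links
def fake_fetch(date_url):
    return [date_url + '&page=01', date_url + '&page=02']


links = collect_links(['d1', 'd2', 'd3'], fake_fetch, max_url=3, delay=0)
print(len(links))
```

Because the function always ends with a return, the caller gets a list even when fewer than max_url links exist.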

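Fetching and cleaning the data

Because the data sits inside a table, pandas' read_html can grab it directly. Given a URL, read_html converts every table on that page into a pandas DataFrame and returns them all in a list. We can check how many items the list holds and find which one contains the information we want.

Let's try it on one page first: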

import pandas as pd

data = pd.read_html('http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01')
print(len(data))
print('************')

for d in data:
    print(d.head(3))
    print('***********************')

4
************
0 ... 2
0 2019年2月10日 ... document.write('<option value="#" selected>選擇賽...

[1 rows x 3 columns]
***********************
0 ... 19
0 NaN ... NaN
1 第1場 第五班 1200米 (田) 草地 A跑道評分(0-40) 場地:好地 總場次:405... ... NaN
2 馬號 ... NaN

[3 rows x 20 columns]
***********************
0 1 2 3 4 ... 15 16 17 18 19
0 馬號 馬名 歲 騎師 負磅 ... 分段時間 總時間 勝負距離 NaN NaN
1 隔夜 閘前 最後 NaN NaN ... NaN NaN NaN NaN NaN
2 10 銀城福星 6 巫顯東 112 ... 1 1 1 23.87 22.31 24.06 (1.10.24) 頭馬

[3 rows x 20 columns]
***********************
0
0 document.write('「向前看」出閘緩慢。 「非同凡響」及「勁皇子」出閘均僅屬一般...
***********************

OK

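There are four tables. The venue information is in the second table and the race results are in the third. From here it is basic pandas work: select and tidy the rows and columns of these tables until they form clean, complete data tables. I will produce two tables, one holding the venue information for each race and the other holding the per-horse details, and give each race a game_id.

This is a fiddly process. The venue text is awkward to handle, especially because, after trying several pages, I found that not every race includes the 硬度計指數 (hardness-meter reading), so an extra if is needed to handle that. You can also see that I call re.findall many times with different patterns. Web scraping is often like this: in the end you frequently fall back on plain text search to pull out specific pieces of information.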
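The "optional field" handling can be sketched with a small helper. The name first_match is hypothetical, and the sample string is made up in the style of the page text rather than copied from it:

```python
import re


def first_match(pattern, text, default='缺值'):
    """Return the first captured group for pattern, or default if nothing matches."""
    found = re.findall(pattern, text)
    return found[0] if found else default


# Made-up venue text in the same style as the page, without a 硬度計指數 entry
sample = '第1場 第五班 1200米 (田) 草地 A跑道評分(0-40) 場地:好地 總場次:405 度地儀指數:2.70'

print(first_match(r'總場次:(\d+)', sample))
print(first_match(r'跑道評分\((.*?)\)', sample))
print(first_match(r'硬度計指數:.*?(\d+\.\d+)', sample))
```

A helper like this always yields exactly one value per field, so the resulting row keeps a fixed length even when a field is absent.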

import pandas as pd
import re

url = 'http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01'
data = pd.read_html(url)

game_id = re.findall('date=(.*?)&', url)[0]+'_'+re.findall('page=(.*?)$', url)[0]
# Venue info: one long text cell in the second table
temp0 = data[1].iloc[1,0]
env = temp0.split('\xa0')[0:5] + re.findall(r"([A-Z]\d?)跑道",temp0) + re.findall(r"跑道評分\((.*?)\)",temp0) + re.findall(r"場地:(.*?) ",temp0)
env += re.findall(r"總場次:(\d+)",temp0) + re.findall(r"度地儀指數:.*?(\d+.\d+)",temp0)
# Not every race has a 硬度計指數 value; use '缺值' (missing value) when it is absent
hardness = re.findall(r"硬度計指數:.*?(\d+.\d+)",temp0)
env += hardness if hardness else ['缺值']
env += re.findall(r"標準時間:.*?(\d+.\d+.\d+)",temp0)
env.insert(0,game_id)

# Race details: the three odds columns are overnight (隔夜), pre-race (閘前) and final (最後)
columns = ['馬號','馬名','歲','騎師','負磅','檔','評分','廄','馬匹體重',
        '賠率(隔夜)','賠率(閘前)','賠率(最後)','獨贏票(萬)','位置票(萬)','位置賠率',
        '走位','名次','分段時間','總時間','勝負距離']

detail = data[2].iloc[2:-2,:].reset_index(drop=True)
detail.columns = columns
detail.insert(loc=0, column='game_id', value=[game_id]*len(detail))

print(env)
print(detail.head(3))

['20190123_01', '第1場', '第五班', '2200米', '(谷)', '草地', 'C', '0-40', '好地', '358', '2.70']
game_id 馬號 馬名 歲 騎師 負磅 ... 位置賠率 走位 名次 分段
時間 總時間 勝負距離
0 20190123_01 4 為善聚樂 5 莫雷拉 129 ... 1.4 8 9 9 10 7 1 14.75 23.21 24.73 25.63 24.99 24.09 (2.17.4)
頭馬
1 20190123_01 8 糖黐豆 8 蔡明紹 119 ... 3 10 10 10 6 1 2 15.23 23.41 24.41 24.99 24.51 25.01 (2.17.56)
1
2 20190123_01 2 靈鋒 5 梁家俊 132 ... 2.5 9 8 8 7 6 3 14.75 23.17 24.73 25.47 25.15 24.32 (2.17.59)
1-1/4

[3 rows x 21 columns]

Perfect!
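As an aside, the game_id can also be built without regular expressions. This is a sketch using the standard library's urllib.parse; the function name make_game_id is hypothetical:

```python
from urllib.parse import urlparse, parse_qs


def make_game_id(url):
    """Build '<date>_<page>' from the date and page query parameters."""
    query = parse_qs(urlparse(url).query)
    return query['date'][0] + '_' + query['page'][0]


print(make_game_id('http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01'))
```

Reading the parameters by name means the result does not depend on the order in which date and page appear in the URL.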

Finally, feed all the target URLs collected in the previous step into a loop, fetch and clean each one, and write out two tidy Excel files.

import time
import re
import requests
from bs4 import BeautifulSoup
import pandas as pd


def get_target_urls(max_url = 300):

    url = 'http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    options_text = str(soup.find_all('select')[0])
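    # Pull every option value out of the date drop-down, then double-check
    # that only result-page links are kept
    date_urls_temp = re.findall('option value="(.*?)">', options_text)
    date_urls = [url for url in date_urls_temp if 'fullresult.php?date=' in url]

    all_target_urls = []
    for url in date_urls:
        r_date = requests.get('http://hk.racing.nextmedia.com/' + url)
        soup = BeautifulSoup(r_date.text, 'html.parser')
        urls_temp = [link.get('href') for link in soup.find_all('a')]
        # link.get('href') is None for <a> tags without an href, so skip those
        all_target_urls += [u for u in urls_temp if u and 'fullresult.php?date' in u]
        # Prints "collected N target URLs so far"
        print('已獲得{}個目標網址'.format(len(all_target_urls)))
        time.sleep(0.5)
        if len(all_target_urls) >= max_url:
            # Prints "collected at least max_url target URLs"
            print('已獲得至少{}個目標網址'.format(max_url))
            return all_target_urls

    # Fewer than max_url links exist: return what was collected instead of None
    return all_target_urls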

def parse_url(url):
    url = 'http://hk.racing.nextmedia.com/' + url
    data = pd.read_html(url)
    
    game_id = re.findall('date=(.*?)&', url)[0]+'_'+re.findall('page=(.*?)$', url)[0]
    # Venue info: one long text cell in the second table
    temp0 = data[1].iloc[1,0]
    env = temp0.split('\xa0')[0:5] + re.findall(r"([A-Z]\d?)跑道",temp0) + re.findall(r"跑道評分\((.*?)\)",temp0) + re.findall(r"場地:(.*?) ",temp0)
    env += re.findall(r"總場次:(\d+)",temp0) + re.findall(r"度地儀指數:.*?(\d+.\d+)",temp0)
    # Not every race has a 硬度計指數 value; use '缺值' (missing value) when it is absent
    hardness = re.findall(r"硬度計指數:.*?(\d+.\d+)",temp0)
    env += hardness if hardness else ['缺值']
    env += re.findall(r"標準時間:.*?(\d+.\d+.\d+)",temp0)
    env.insert(0,game_id)
    
    # Race details: the three odds columns are overnight (隔夜), pre-race (閘前) and final (最後)
    columns = ['馬號','馬名','歲','騎師','負磅','檔','評分','廄','馬匹體重',
            '賠率(隔夜)','賠率(閘前)','賠率(最後)','獨贏票(萬)','位置票(萬)','位置賠率',
            '走位','名次','分段時間','總時間','勝負距離']
    
    detail = data[2].iloc[2:-2,:].reset_index(drop=True)
    detail.columns = columns
    detail.insert(loc=0, column='game_id', value=[game_id]*len(detail))
    
    return env, detail


all_urls = get_target_urls()

envs = []
details = []
env_column = ["game_id","場次", "班次", "跑道長度",
              "地形1","地形2","跑道","跑道評分",
              "場地","總場次","度地儀指數","硬度計指數",
              "標準時間"]

print('Start to parse data from {} urls:'.format(len(all_urls)))
n = 0
for url in all_urls:
    env, detail = parse_url(url)
    envs.append(env)
    details.append(detail)
    n += 1
    print('You have parsed data from {} urls!'.format(n))
    time.sleep(0.5)

env_data = pd.DataFrame(envs)
env_data.columns = env_column
env_data.to_excel('horse_gambling_env.xlsx', index=False)
pd.concat(details,axis=0).to_excel('horse_gambling.xlsx', index=False)

print('Congrats! All is well!!')

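Congratulations, you have completed a full web-scraping run!

2019/07/11 update: revised example code

Web scraping is a moving target. Because you are pulling data from someone else's pages, even a small change on their side can easily break old code, especially the part that parses the page source.

Take this tutorial as an example. It uses pandas.read_html to fetch and parse the data, but the site later changed its layout, so the scraped table gained an extra empty column and my table-cleaning code above started to fail.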


Change line 54 of the old code:

detail = data[2].iloc[2:-2,:].reset_index(drop=True)

to:

detail = data[2].iloc[2:-2,:-1].reset_index(drop=True)

which removes one of the problems.
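A possible alternative to slicing off the last column by position is to drop whichever columns are completely empty. This is only a sketch on a toy DataFrame, and it helps only if the extra column really is empty in every row:

```python
import pandas as pd

# Toy stand-in for the scraped table: the last column is entirely empty
raw = pd.DataFrame([[4, '為善聚樂', float('nan')],
                    [8, '糖黐豆', float('nan')]])

# Drop every column whose values are all missing, wherever it sits
cleaned = raw.dropna(axis=1, how='all')
cleaned.columns = ['馬號', '馬名']
print(cleaned.shape)
```

Note that this would also remove a genuine data column if it happened to be empty for a whole race, so the column count still needs checking afterwards.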


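If you parse the page source directly with beautifulsoup or regular expressions, you are even more likely to run into this kind of problem.

Also, when writing a scraper, add plenty of checks using if, try and similar constructs. You cannot guarantee that every one of those many pages fully contains the data you want. You will hit exceptional cases, and these if and try checks are what handle them.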
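The general pattern of "keep going and remember what failed" can be sketched as follows. parse_all and demo_parser are hypothetical names used only for this illustration:

```python
def parse_all(urls, parse_one):
    """Run parse_one on every URL; collect results and the URLs that failed."""
    results, failed = [], []
    for url in urls:
        try:
            results.append(parse_one(url))
        except Exception as exc:  # broad on purpose: any failure is logged, not fatal
            failed.append((url, repr(exc)))
    return results, failed


# Hypothetical parser used only for this demo: it rejects URLs without a page number
def demo_parser(url):
    if 'page=' not in url:
        raise ValueError('no page parameter')
    return url.split('page=')[1]


results, failed = parse_all(['a?page=01', 'broken', 'a?page=02'], demo_parser)
print(results)
print(len(failed))
```

Storing the error message next to each failed URL makes it easier to see later why a page was skipped.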

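In the updated code I added a few checking variables and statements to parse_url to make sure the scraped information is correct, and I record the URLs that fail so they can be handled separately afterwards.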

import time
import re
import requests
from bs4 import BeautifulSoup
import pandas as pd


def get_target_urls(max_url = 300):

    url = 'http://hk.racing.nextmedia.com/fullresult.php?date=20190123&page=01'
    r = requests.get(url)
    soup = BeautifulSoup(r.text, 'html.parser')
    options_text = str(soup.find_all('select')[0])
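    # Pull every option value out of the date drop-down, then double-check
    # that only result-page links are kept
    date_urls_temp = re.findall('option value="(.*?)">', options_text)
    date_urls = [url for url in date_urls_temp if 'fullresult.php?date=' in url]

    all_target_urls = []
    for url in date_urls:
        r_date = requests.get('http://hk.racing.nextmedia.com/' + url)
        soup = BeautifulSoup(r_date.text, 'html.parser')
        urls_temp = [link.get('href') for link in soup.find_all('a')]
        # link.get('href') is None for <a> tags without an href, so skip those
        all_target_urls += [u for u in urls_temp if u and 'fullresult.php?date' in u]
        # Prints "collected N target URLs so far"
        print('已獲得{}個目標網址'.format(len(all_target_urls)))
        time.sleep(0.5)
        if len(all_target_urls) >= max_url:
            # Prints "collected at least max_url target URLs"
            print('已獲得至少{}個目標網址'.format(max_url))
            return all_target_urls

    # Fewer than max_url links exist: return what was collected instead of None
    return all_target_urls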

def parse_url(url):
    failure_url = ''
    success = True
    url = 'http://hk.racing.nextmedia.com/' + url
    data = pd.read_html(url)
    game_id = re.findall('date=(.*?)&', url)[0]+'_'+re.findall('page=(.*?)$', url)[0]
    # Venue info: one long text cell in the second table
    temp0 = data[1].iloc[1,0]
    env = temp0.split('\xa0')[0:5] + re.findall(r"([A-Z]\d?)跑道",temp0) + re.findall(r"跑道評分\((.*?)\)",temp0) + re.findall(r"場地:(.*?) ",temp0)
    env += re.findall(r"總場次:(\d+)",temp0) + re.findall(r"度地儀指數:.*?(\d+.\d+)",temp0)
    # Not every race has a 硬度計指數 value; use '缺值' (missing value) when it is absent
    hardness = re.findall(r"硬度計指數:.*?(\d+.\d+)",temp0)
    env += hardness if hardness else ['缺值']
    env += re.findall(r"標準時間:.*?(\d+.\d+.\d+)",temp0)
    env.insert(0,game_id)
    
    # If the venue info does not yield exactly 13 fields, treat the page as an
    # exceptional case and mark it as failed
    if len(env) != 13:
        success = False
        failure_url = url
    
    # Race details: the three odds columns are overnight (隔夜), pre-race (閘前) and final (最後)
    columns = ['馬號','馬名','歲','騎師','負磅','檔','評分','廄','馬匹體重',
            '賠率(隔夜)','賠率(閘前)','賠率(最後)','獨贏票(萬)','位置票(萬)','位置賠率',
            '走位','名次','分段時間','總時間','勝負距離']
    
    detail = data[2].iloc[2:-2,:-1].reset_index(drop=True)
    
    # Check the number of columns: if it is wrong, assigning detail.columns
    # raises ValueError and the except branch marks the page as failed
    try:
        detail.columns = columns
    except ValueError:
        success = False
        failure_url = url
        
        
    detail.insert(loc=0, column='game_id', value=[game_id]*len(detail))
    
    return env, detail, success, failure_url

if __name__ == '__main__':
    
    N = 300  # aim for at least 300 pages
    all_urls = get_target_urls(max_url = N)
    
    envs = []
    details = []
    failed_urls = []
    env_column = ["game_id","場次", "班次", "跑道長度",
                  "地形1","地形2","跑道","跑道評分",
                  "場地","總場次","度地儀指數","硬度計指數",
                  "標準時間"]
    
    
    for url in all_urls:
        env, detail, success, failure_url = parse_url(url)
        if success:
            envs.append(env)
            details.append(detail)
        else:
            failed_urls.append(failure_url)
            
        time.sleep(1)
    
    print(f'嘗試抓取{len(all_urls)}個分頁的資料:')  # "Tried to fetch N pages:"
    print(f'    成功: {len(all_urls) - len(failed_urls)}筆')
    print(f'    失敗: {len(failed_urls)}筆')
    
    env_data = pd.DataFrame(envs)
    env_data.columns = env_column
    env_data.to_excel('horse_gambling_env.xlsx', index=False)
    pd.concat(details,axis=0).to_excel('horse_gambling.xlsx', index=False)