Example of Reading Web Pages using IDL's SOCKET Routine
Anonym
IDL's SOCKET routine establishes a client-side TCP/IP connection. With it, IDL can act as a simple HTTP client that downloads information from a Web server.
Discussion
The following example code lets IDL retrieve data from a Web page. Note that this example works only with Web servers that listen on the standard HTTP port 80. The request is sent using the HTTP/1.0 protocol. If the server expects a different protocol, such as secure HTTP (HTTPS), the program will fail.
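To make the request format concrete, here is a minimal sketch of how a URL can be split into host and path and turned into an HTTP/1.0 GET request. The procedure name show_http_request and the example.com URL are hypothetical and are not part of read_http.pro. Note one deliberate variation: this sketch sends only the path plus a Host header, whereas read_http.pro sends the complete URL in the request line.

```idl
pro show_http_request, url
  compile_opt idl2
  ; Work on a copy so the caller's variable is not modified.
  u = url
  if strmatch(u, 'http://*') eq 0 then u = 'http://' + u
  ; The host name starts right after the 7-character 'http://' prefix
  ; and ends at the next '/'.
  pos = strpos(u, '/', 7)
  if (pos eq -1) then begin
    u = u + '/'
    pos = strlen(u) - 1
  endif
  host = strmid(u, 7, pos - 7)
  path = strmid(u, pos)
  ; Request line, one header, and the empty line that ends the header block.
  msg = ['GET ' + path + ' HTTP/1.0', 'Host: ' + host, '']
  print, msg, format='(a)'
end
```

Calling show_http_request, 'www.example.com/index.html' prints the request line GET /index.html HTTP/1.0, then Host: www.example.com, then an empty line. These are the same kind of lines that read_http.pro writes to the socket with PRINTF.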
The source code of read_http.pro follows.
;+
; READ_HTTP
;
; Purpose: Simple way to read data from a website.
; Syntax: READ_HTTP, URL [,DATA ] [,FILENAME=string ]
; [,HEADER=variable ] [,/POST ]
;
; Arguments
; URL - String that contains the complete URL. Can
; optionally include the http:// prefix.
;
; DATA - This argument should be a named variable which will
; contain the content of the web page.
;
; Keywords:
; FILENAME - Specify a local filename where the content of the web
; page should be stored. Do not use the DATA argument
; when this keyword is used. Using this keyword will in
; most cases require much less memory than using the DATA
; argument.
;
; HEADER - A named variable that will contain the header information
; for the specified web page.
;
; POST - Set this keyword to use the POST method instead of the GET
; method. Some websites require using the POST method when
; submitting information. If this keyword is not present then
; the GET method is used.
;
;
;
; Examples of use.
; 1) Display the image from RSI's homepage:
; IDL> read_http, 'www.rsinc.com/img/home/RSI_HomepageSept20.jpg', $
; filename='homepage.jpg'
; IDL> read_jpeg, 'homepage.jpg', image
; IDL> device, decomposed=1 & tv, image, true=1
;
; 2) Download and display recent minimum temperatures recorded in the US:
; IDL> read_http,'http://www.ems.psu.edu/wx/usstats/mint.lis', data
; IDL> print, string(data)
; 07/30/2003-15Z 53.1 F at Arcata CA (ACV)
; 07/30/2003-14Z 48.0 F at STANLEY ID (SNT)
; 07/30/2003-13Z 39.2 F at Leadville CO (LXV)
; 07/30/2003-12Z 39.2 F at Leadville CO (LXV)
; 07/30/2003-11Z 37.4 F at Leadville CO (LXV)
; 07/30/2003-10Z 42.8 F at Leadville CO (LXV)
; 07/30/2003-09Z 42.1 F at Leadville CO (LXV)
; 07/30/2003-08Z 44.6 F at Leadville CO (LXV)
; 07/30/2003-07Z 44.6 F at Leadville CO (LXV)
; ...
;
;
; 3) Download and display a plot of the exchange rate for NOK/USD in
; the last 91 days:
;
; pro test_read_http
; url='http://fx.sauder.ubc.ca/cgi/fxplot' + $
; '?rd=91&f=png&q=volume&y=daily&b=USD&c=NOK&mavg=0'
; read_http, url, data
; str=string(data)
; p1=strpos(str,'SRC="',strpos(str,'<IMG'))+5
; p2=strpos(str,'"',p1)
; url=strmid(str,p1,p2-p1)
; read_http, url, filename='currency.png'
; data=read_png('currency.png', r,g,b)
; tvlct, r,g,b
; device, decomposed=0
; dim=size(data,/dim)
; window, xs=dim[0], ys=dim[1] & tv, data
; end
;-
; basic readf implementation for rawio
pro http_readf, u, line, timeout=timeout
compile_opt idl2
if size(timeout,/type) eq 0 then timeout=1
buffer=bytarr(512)
pos=0
char=0b
t0=systime(1)
tc=1
while (tc or (systime(1)-t0 lt timeout)) do begin
readu, u, char, transfer_count=tc
if (char eq 10b) then break
if tc and (char ne 13b) then buffer[pos++]=char
if pos eq n_elements(buffer) then buffer=[buffer,buffer]
endwhile
if (pos gt 0) then line=string(buffer[0:pos-1]) else line=''
end
;
; request page using GET method
pro http_get, u, addr, res, head=head, file=file
msg=['GET '+addr+' HTTP/1.0','User-Agent: IDL6.0','']
printf, u, msg, format='(a)'
http_read,u,res, head=head, file=file
end
;
; request page using POST method
pro http_post, u, addr, res, head=head, file=file
pos=strpos(addr,'?')
if (pos eq -1) then pos=strlen(addr)
ad=strmid(addr,0,pos)
; skip the '?' itself so it is neither sent in the body nor counted in Content-Length
ques=strmid(addr,pos+1)
msg=['POST '+ad+' HTTP/1.0','User-Agent: IDL6.0',$
'Content-type: application/x-www-form-urlencoded',$
'Content-Length:'+strtrim(strlen(ques),2),$
'',$
ques,$
'']
printf, u, msg, format='(a)'
http_read, u, res, head=head, file=file
end
;
; read the response: header lines first, then the body
pro http_read, u, data, head=head, file=file
compile_opt idl2
line='...'
sz=-1
count=0
while (line ne '') do begin
http_readf, u, line
head=(count eq 0)?line:[head,line]
count++
if stregex(line,'Content-Length',/fold_case,/boolean) then begin
sz=long(strmid(line,strpos(line,':')+1))
endif
endwhile
; known size
if (sz ne -1) then begin
bufsize=512
buffer=bytarr(bufsize)
pos=0
if keyword_set(file) then begin
openw,v,file,/get_lun
endif else data=bytarr(sz)
while (pos lt sz) do begin
readu, u, buffer, transfer_count=tc
if (tc eq 0) then continue
if keyword_set(file) then $
writeu,v,(tc eq bufsize)?buffer:buffer[0:tc-1] $
else data[pos]=(tc eq bufsize)?buffer:buffer[0:tc-1]
pos+=tc
endwhile
if keyword_set(file) then free_lun,v
endif else begin
; unknown size
bufsize=512
buffer=bytarr(bufsize)
pos=0
if keyword_set(file) then begin
openw,v,file,/get_lun
endif else data=bytarr(4*bufsize)
while 1 do begin
readu, u, buffer, transfer_count=tc
if (tc eq 0) then break
if keyword_set(file) then $
writeu,v,(tc eq bufsize)?buffer:buffer[0:tc-1] $
else begin
if pos+tc gt n_elements(data) then data=[data,data]
data[pos]=(tc eq bufsize)?buffer:buffer[0:tc-1]
endelse
pos+=tc
endwhile
if keyword_set(file) then free_lun,v $
else data=data[0:pos-1]
endelse
end
;
; main routine: parse the URL, open the socket, and send the request
pro read_http, url, res, filename=file, post=post, header=head
compile_opt idl2
; parse url
if strmatch(url,'http://*') eq 0 then url='http://'+url
pos=strpos(url,'/',7)
if (pos eq -1) then pos=strlen((url+='/'))-1
host=strmid(url,7,pos-7)
; retrieve data
socket, u, host, 80, connect_timeout=5, read_timeout=5, $
/get_lun, /rawio
case keyword_set(post) of
0: http_get, u, url, res, file=file, head=head
1: http_post, u, url, res, file=file, head=head
endcase
free_lun, u
end
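
The HEADER keyword is documented above but not shown in the examples. The following is a usage sketch that checks the HTTP status code before using the data. The procedure name test_read_http_header and the example.com URL are hypothetical, and the parsing assumes that the server answers with a standard status line of the form "HTTP/1.x code reason".

```idl
pro test_read_http_header
  compile_opt idl2
  ; Placeholder URL (an assumption for illustration); any plain-HTTP
  ; page served on port 80 can be substituted.
  url = 'http://www.example.com/'
  read_http, url, data, header=head
  ; head[0] is the status line; the status code is its second token.
  parts = strsplit(head[0], ' ', /extract)
  status = (n_elements(parts) ge 2) ? fix(parts[1]) : -1
  if (status ne 200) then begin
    print, 'Request failed: ' + head[0]
    return
  endif
  ; Print the response header lines, then the page content as text.
  print, head, format='(a)'
  print, string(data)
end
```

Checking the status line first avoids treating an error page or a redirect notice as if it were the requested content.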