bphlpt, on 21 February 2012 - 04:18 AM, said:
That's a good point. Here's the VBScript again for those it might help at some point:
Option Explicit
Dim oXmlHttp, oRegExp, oMatch, adoStr, sChildPages(), i, url
Set oXmlHttp = createobject ("Msxml2.ServerXMLHTTP.6.0")
oXmlHttp.Open "GET", "http://www.slv.dk/Dokumenter/dsweb/View/Collection-357", False
oXmlHttp.Send
Set oRegExp = New RegExp
oRegExp.IgnoreCase = True
oRegExp.Global = True
oRegExp.Pattern = "<a\shref=""(/Dokumenter/dsweb/View/Collection-\d*)"">"
Set oMatch = oRegExp.Execute(oXmlHttp.ResponseText)
If oMatch.Count = 0 Then WScript.Quit
'really ugly hack where we skip the first child page found (itself)
ReDim sChildPages(oMatch.Count-2)
For i = 1 to oMatch.Count-1
sChildPages(i-1) = "http://www.slv.dk" & oMatch.Item(i).Submatches(0)
Next
oRegExp.Pattern = "<a\shref=""(/Dokumenter/dsweb/Get/Document.*pdf)""\sclass=""uline""><b>(.*?)</b>"
For Each url in sChildPages
oXmlHttp.Open "GET", url, False
oXmlHttp.Send
Set oMatch = oRegExp.Execute(oXmlHttp.ResponseText)
For i = 0 to oMatch.Count-1
DownloadBinaryFile "http://www.slv.dk" & oMatch.Item(i).Submatches(0), oMatch.Item(i).Submatches(1) & ".pdf"
Next
Next
Function DownloadBinaryFile(sUrl, sFileName)
oXmlHttp.Open "GET", sUrl, False
oXmlHttp.Send
Set adoStr = CreateObject("ADODB.Stream")
adoStr.Type = 1 'adTypeBinary
adoStr.Open
adoStr.Write oXmlHttp.ResponseBody
adoStr.SaveToFile sFileName, 2 'adSaveCreateOverWrite
adoStr.Close
End Function
It's pretty ugly, there's no error handling of any kind and all that but it gets the job done. Writing essentially the same thing in other languages should be pretty straightforward too (most of the work here is getting the regular expressions right). And in most cases it would be nicer/better/simpler too (VBScript data structures suck hard, downloading binary files here is a bit of a hack, error handling is beyond awful, etc).



Help

Back to top










