(PowerBuilder) Fetch robots.txt for a Site
The Chilkat Spider library is robots.txt compliant. It automatically fetches a site's robots.txt file and adheres to it. It will not download pages denied by robots.txt. Pages excluded by robots.txt will not appear in the Spider's "unspidered" list. This example shows how to explicitly download and review the robots.txt for a given site.
integer li_rc
oleobject loo_Spider
string ls_RobotsText
loo_Spider = create oleobject
// Use "Chilkat_9_5_0.Spider" for versions of Chilkat < 10.0.0
li_rc = loo_Spider.ConnectToNewObject("Chilkat.Spider")
if li_rc < 0 then
    destroy loo_Spider
    MessageBox("Error","Connecting to COM object failed")
    return
end if
loo_Spider.Initialize("www.chilkatsoft.com")
ls_RobotsText = loo_Spider.FetchRobotsText()
// Display the fetched robots.txt content
MessageBox("robots.txt", ls_RobotsText)
destroy loo_Spider
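The example above does not check whether the fetch succeeded, for instance when the site is unreachable. The following is a minimal sketch of such a check. It assumes the Spider object exposes the LastMethodSuccess and LastErrorText properties that other Chilkat ActiveX objects provide, and it uses MessageBox for output purely for illustration.

```
integer li_rc
oleobject loo_Spider
string ls_RobotsText

loo_Spider = create oleobject
// Use "Chilkat_9_5_0.Spider" for versions of Chilkat < 10.0.0
li_rc = loo_Spider.ConnectToNewObject("Chilkat.Spider")
if li_rc < 0 then
    destroy loo_Spider
    MessageBox("Error","Connecting to COM object failed")
    return
end if

loo_Spider.Initialize("www.chilkatsoft.com")

ls_RobotsText = loo_Spider.FetchRobotsText()
// Assumption: LastMethodSuccess is 1 when the previous method call succeeded
if loo_Spider.LastMethodSuccess <> 1 then
    // Assumption: LastErrorText describes why the fetch failed
    MessageBox("Error", loo_Spider.LastErrorText)
    destroy loo_Spider
    return
end if

MessageBox("robots.txt", ls_RobotsText)

destroy loo_Spider
```

Releasing the object on the failure path as well as the success path avoids leaving the COM connection open when the function returns early.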