(SQL Server) Fetch robots.txt for a Site
The Chilkat Spider library complies with robots.txt: it automatically fetches a site's robots.txt file and does not download pages that the file disallows. Pages excluded by robots.txt also do not appear in the Spider's "unspidered" list. This example shows how to explicitly download and review the robots.txt for a given site.
-- Important: See this note about string length limitations for strings returned by sp_OAMethod calls.
--
CREATE PROCEDURE ChilkatSample
AS
BEGIN
    DECLARE @hr int
    DECLARE @spider int

    -- Use "Chilkat_9_5_0.Spider" for versions of Chilkat < 10.0.0
    EXEC @hr = sp_OACreate 'Chilkat.Spider', @spider OUT
    IF @hr <> 0
    BEGIN
        PRINT 'Failed to create ActiveX component'
        RETURN
    END

    -- Set the domain whose robots.txt will be fetched.
    EXEC sp_OAMethod @spider, 'Initialize', NULL, 'www.chilkatsoft.com'

    -- The output variable holds at most 4000 characters (see the note above).
    DECLARE @robotsText nvarchar(4000)
    EXEC @hr = sp_OAMethod @spider, 'FetchRobotsText', @robotsText OUT
    IF @hr <> 0 OR @robotsText IS NULL
        PRINT 'No robots.txt text was returned'
    ELSE
        PRINT @robotsText

    -- Release the COM object.
    EXEC @hr = sp_OADestroy @spider
END
GO
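The procedure above relies on the sp_OA* stored procedures, which SQL Server disables by default. The following is a minimal sketch of enabling them and running the sample. It assumes the current login is allowed to change server configuration and that the Chilkat ActiveX component is already registered on the server; neither is stated in the example itself.

```sql
-- Assumption: the current login may change server-level configuration.
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'Ole Automation Procedures', 1;
RECONFIGURE;
GO

-- Run the sample; the robots.txt text is written to the Messages output via PRINT.
EXEC ChilkatSample;
GO
```

Enabling OLE Automation affects the whole server instance, so it may be worth checking with the database administrator before changing it.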