Chilkat2-Python

Extract all HTML Objects from a Web Page

Demonstrates how to download a web page from a URL and extract all of its HTML objects, e.g. images, links, CSS files, and JavaScript files.


import sys
import chilkat2

success = False

# This example assumes the Chilkat API has already been unlocked.
# See Global Unlock Sample for sample code.

mht = chilkat2.Mht()

# Download a URL into an in-memory MHT web archive contained
# in a string variable.
# The following URL was picked at random and was valid at the time this example was written:
mhtDoc = mht.GetMHT("https://www.tetonlodge.com/")
if not mht.LastMethodSuccess:
    print(mht.LastErrorText)
    sys.exit()

# Extract the HTML and embedded objects:
unpackDir = "C:/AAWorkarea/mhtTesting/"
htmlFilename = "lodge.html"
partsSubdir = "objects"

# Extract to C:/AAWorkarea/mhtTesting/lodge.html.
# Images and other embedded objects are placed in
# C:/AAWorkarea/mhtTesting/objects. Directories are created
# automatically if they don't already exist.
success = mht.UnpackMHTString(mhtDoc, unpackDir, htmlFilename, partsSubdir)
if not success:
    print(mht.LastErrorText)
else:
    print("Unpacked!")
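
Once the unpack has succeeded, it can be handy to check what was actually extracted. The following is a minimal sketch using only the Python standard library, not part of the Chilkat API. The helper name summarize_unpacked is hypothetical, and the sketch assumes the embedded objects were written as ordinary files under the parts subdirectory ("objects" in the example above).

```python
from collections import Counter
from pathlib import Path


def summarize_unpacked(unpack_dir, parts_subdir="objects"):
    """Count extracted files by lowercase extension (hypothetical helper).

    Assumes the embedded objects are plain files somewhere under
    unpack_dir/parts_subdir. Returns an empty Counter if that
    directory does not exist.
    """
    parts_path = Path(unpack_dir) / parts_subdir
    counts = Counter()
    if not parts_path.is_dir():
        return counts
    # Walk the parts directory recursively in case objects are nested.
    for entry in parts_path.rglob("*"):
        if entry.is_file():
            counts[entry.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Same directory layout as the example above.
    summary = summarize_unpacked("C:/AAWorkarea/mhtTesting/", "objects")
    for ext, n in sorted(summary.items()):
        print(ext, n)
```

Grouping by extension gives a quick view of how many images, stylesheets, and scripts the page referenced, without relying on anything beyond the files on disk.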