Scrape All URLs from a Website and Write Them to a CSV File

In this article I will show how to crawl a website and collect the URLs of all the links it contains. Registered users can also copy the whole script, which includes an additional feature: writing the results to CSV and TXT files.

We will use the requests and BeautifulSoup libraries:

import requests
from bs4 import BeautifulSoup

Then we specify the website we would like to walk through:

SITE = "https://www.excelfiles.space"
URLS = []

Next, define a recursive function that collects the href attribute of every link on every page of the website.

def scrape(url):
    # Download the page and parse it with the lxml parser
    r = requests.get(url)
    s = BeautifulSoup(r.text, "lxml")

    # Walk through every <a> tag on the page
    for i in s.find_all("a"):
        href = i.get("href")  # returns None if the tag has no href attribute
        # Follow only internal (relative) links
        if href and href.startswith("/"):
            url = SITE + href
            if url not in URLS:
                URLS.append(url)
                scrape(url)  # recurse into the newly found page
        

All that is left is to run the script:

scrape(SITE)
for URL in URLS:
    print(URL) 

And get the result: the full list of internal URLs printed to the console.

Additional functionality for registered users: the full script, which does everything shown above and can also write the result to TXT and CSV files.
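
As a minimal sketch of that export step (assuming the URLS list collected above; the file names urls.txt and urls.csv are just placeholders), the results could be written with Python's built-in csv module:

import csv

# Write the collected URLs to a plain text file, one URL per line
# (file name is a placeholder)
with open("urls.txt", "w") as f:
    for url in URLS:
        f.write(url + "\n")

# Write the same list to a CSV file with a single "url" column
# (file name is a placeholder)
with open("urls.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["url"])  # header row
    for url in URLS:
        writer.writerow([url])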