Scraping all the jobs you have applied for on LinkedIn using BeautifulSoup & Python

Even though LinkedIn tries to protect itself from web scrapers, there are ways to extract information using Python. In this example we will gather information about all the positions we have applied for, going back as far as LinkedIn lets us see (for me it is 2 years).

We will create an Excel file with the following columns:

  • Job Title
  • URL of the job offer page
  • Company Name
  • URL of the Company page
  • Job Location
  • Time you applied for the job (... ago)
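
The last column holds a relative time rather than a date. If an approximate date is more useful, it can be derived afterwards. The snippet below is a minimal sketch that assumes the text looks like "Applied 2 weeks ago"; the exact wording LinkedIn uses may differ, and the helper name approx_applied_date is made up for this illustration.

```python
import re
from datetime import datetime, timedelta

# Approximate length of each unit in days (months and years are rough guesses)
UNIT_DAYS = {"day": 1, "week": 7, "month": 30, "year": 365}


def approx_applied_date(text, now):
    """Return an approximate date for text such as 'Applied 2 weeks ago', else None."""
    # Assumed format: a number, a unit (optionally plural), then the word "ago"
    match = re.search(r"(\d+)\s*(day|week|month|year)s?\s+ago", text.lower())
    if match is None:
        return None
    count, unit = int(match.group(1)), match.group(2)
    return (now - timedelta(days=count * UNIT_DAYS[unit])).date()


# Fixed reference date so the example is reproducible
reference = datetime(2024, 3, 15)
print(approx_applied_date("Applied 2 weeks ago", reference))
print(approx_applied_date("Applied yesterday", reference))
```

Strings that do not fit the assumed pattern return None, so they can be kept as plain text in the spreadsheet.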

Getting HTML

First of all, we need to open the page https://www.linkedin.com/jobs/tracker/applied/ and scroll down as far as LinkedIn allows. If we now try "Save page as HTML" or CTRL+U, we get a mess that does not contain the job information. Instead, we right-click on the page, open the element inspector, copy the outer HTML of the element that contains the job list, and paste it into a new text file.
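
Before parsing, it can be worth checking that the saved text really contains the job cards. The snippet below is a minimal sketch that counts occurrences of the title class the script searches for later; the helper name count_job_cards and the sample strings are made up for illustration.

```python
def count_job_cards(html_text, marker="jobs-job-card-content__title"):
    """Count how often the job title class appears in the saved HTML text."""
    return html_text.count(marker)


# Hand-written samples standing in for the contents of the saved text file
good_sample = (
    '<a class="jobs-job-card-content__title ember-view" href="/jobs/view/1/">A</a>'
    '<a class="jobs-job-card-content__title ember-view" href="/jobs/view/2/">B</a>'
)
bad_sample = "<html><body>Loading...</body></html>"

print(count_job_cards(good_sample))
print(count_job_cards(bad_sample))
```

A count of zero suggests that the copied element did not include the job list, or that LinkedIn has changed its class names since this was written.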

HTML to XLSX

All that is left is to run the following code (it needs the beautifulsoup4, pandas, numpy and openpyxl packages):

from bs4 import BeautifulSoup
import pandas as pd
import numpy as np

# Path to input file
in_file = "C:/input.txt"

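# Read the saved HTML; the file is closed automatically when the block ends
with open(in_file, "r", encoding="utf8") as f:
    # Name the parser explicitly so BeautifulSoup does not have to guess one
    soup = BeautifulSoup(f.read(), "html.parser")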

JobPostings = soup.find_all('div',class_="display-flex p0 flex-wrap ember-view")
print("Postings "+str(len(JobPostings)))

# One list per output column; each gets exactly one entry per posting
jobs, jobURLs = [], []
companyNames, companyURLs, locs, app_times = [], [], [], []
df = pd.DataFrame()

for JobPosting in JobPostings:
   
    job_titles = JobPosting.find_all("a", class_="jobs-job-card-content__title ember-view")
    jobs.append(job_titles[0].string)
    jobURLs.append('https://linkedin.com'+job_titles[0]['href'])
    
    # The company link may be missing; fall back to "NA" so the columns stay aligned
    company_names = JobPosting.find("a", class_="t-black jobs-job-card-content__company-name t-14 t-normal ember-view")
    if company_names is not None:
        companyNames.append(company_names.string)
        companyURLs.append('https://linkedin.com' + company_names['href'])
    else:
        companyNames.append("NA")
        companyURLs.append("NA")
        
        
    # Job location, or "NA" when the span is not present
    loc = JobPosting.find('span', class_="t-12 t-black--light")
    if loc is not None:
        locs.append(loc.string)
    else:
        locs.append("NA")
    
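    # Time of application ("... ago"); the span's text pieces are joined with spaces
    app_time = JobPosting.find('span', class_=["t-bold t-black t-black--light", "jobs-job-card-content__bullet"])
    # A missing span or a child without text would otherwise raise a TypeError
    app_times.append(app_time.get_text(" ", strip=True) if app_time is not None else "NA")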
    
print(len(app_times))    
df['Job Title'] = np.array(jobs)
df['Job URL'] = np.array(jobURLs)
df['Company Name'] = np.array(companyNames)
df['Company URL'] = np.array(companyURLs)
df['Locations'] = np.array(locs)
df['AppTime'] = np.array(app_times)

print(df)

# Path to the output Excel file (writing .xlsx requires the openpyxl package)
df.to_excel("C:/output.xlsx")
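
A note on the class_ lookups: the script passes the whole class attribute as one string. BeautifulSoup then matches only that exact value, so the lookup stops working if LinkedIn adds, removes or reorders a class. Searching for a single class name is less brittle. The snippet below is a small sketch on a hand-written fragment; the markup and the href value are invented for illustration and are not real LinkedIn output.

```python
from bs4 import BeautifulSoup

# Hand-written fragment for illustration only; real LinkedIn markup is much larger
html = (
    '<div class="display-flex p0 flex-wrap ember-view">'
    '<a class="jobs-job-card-content__title ember-view" href="/jobs/view/1/">Data Engineer</a>'
    '</div>'
)
soup = BeautifulSoup(html, "html.parser")

# Full attribute string, as in the script: matches only this exact value
exact = soup.find("a", class_="jobs-job-card-content__title ember-view")
# Same classes in a different order: no match
reordered = soup.find("a", class_="ember-view jobs-job-card-content__title")
# A single class name: matches whatever other classes the tag carries
single = soup.find("a", class_="jobs-job-card-content__title")

print(exact is not None, reordered is None, single is not None)
print(single.string, "https://linkedin.com" + single["href"])
```

If the script prints "Postings 0", checking the class names against the saved HTML and switching to a single class name is a reasonable first step.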