Look up a LinkedIn profile

People Data Enrichment: Find a LinkedIn Profile by Email Address with Airtop

Nov 21, 2024

Introduction: Why do you need data enrichment?

Data enrichment is both critical and challenging. Having up-to-date and accurate information can make or break your professional networking and business intelligence efforts. Unfortunately, most data enrichment tools rely on stale or incomplete datasets. This problem becomes particularly glaring when trying to find LinkedIn profiles, a vital resource for recruiters, sales professionals, and researchers.

But what if there was a way to avoid these pitfalls? With Airtop, you can harness the power of fresh data and AI-driven automation to build tools that work with precision. In this tutorial, I’ll show you how to create a robust LinkedIn profile scraper using TypeScript, Node.js, and the Airtop SDK. By the end of this guide, you'll have a professional-grade application that can:

  • Read professional profiles from a CSV file.

  • Automatically search Google for LinkedIn profiles using AI to ensure accurate, up-to-date results.

  • Save enriched data to a new CSV file.

Prerequisites

Before we begin, ensure you have the following:

No time to go through the entire setup? I got you! You can download the entire project here and make the necessary changes in the README to have it all up and running quickly.

Step 1: Project Setup

Let's start by creating our project structure. We'll build our entire application incrementally, with each section building upon the last.

Initialize the Project

In your terminal of choice, run the following commands to create a directory, create a new Node.js project, and install TypeScript. 

These commands also install the Airtop SDK for AI scraping and Dotenv to ensure we keep our credentials safe.

mkdir linkedin-data-enrichment
cd linkedin-data-enrichment
npm init -y
npm install typescript ts-node @types/node @airtop/sdk dotenv
npx tsc --init

Project Structure

Now, create the following directory structure:

linkedin-data-enrichment/
│
├── src/
│   └── index.ts
├── data/
│   └── profiles.csv
├── .env
├── package.json
└── tsconfig.json

Configuration and Type Definitions

We'll start by creating our complete interface and config implementation in src/index.ts. This approach allows us to develop the entire application in a single, cohesive file.

import { AirtopClient } from "@airtop/sdk";
import type { 
  ExternalSessionWithConnectionInfo, 
  SessionResponse, 
  WindowId, 
  WindowIdResponse 
} from "@airtop/sdk/api";
import * as fs from 'fs/promises';
import * as path from 'path';
import dotenv from 'dotenv';
// Configuration Constants
const CONFIG = {
  /**
   * Batch processing configuration
   * Number of profiles per batch. Each batch gets its own session and
   * processes its profiles one at a time; the batches run in parallel.
   */
  BATCH_SIZE: 1,
  
  /**
   * Retry mechanism for resilient profile searching
   */
  MAX_RETRIES: 3,
  RETRY_DELAY_MS: 1000,
  
  /**
   * File paths for input and output
   */
  PATHS: {
    INPUT_FILE: 'data/profiles.csv',
    OUTPUT_DIR: 'output',
    OUTPUT_FILE: 'profiles_with_linked_in_profiles.csv'
  }
} as const;
// Type Definitions
interface UserProfile {
  firstName: string;
  lastName: string;
  email: string;
}
interface ProfileWithQuery extends UserProfile {
  query: string;
}
interface ProfileWithLinkedInProfile extends ProfileWithQuery {
  linkedInProfile: string;
}

Utility Functions

Next, we'll add two utility functions: one builds the Google search URL for a profile, and the other loads the profiles from our CSV file:

/**
 * Generates a Google search query to find LinkedIn profile
 * @param userProfile Profile to search for
 * @returns Encoded search URL
 */
const generateGoogleSearchQuery = (userProfile: UserProfile): string => {
  const query = `${userProfile.firstName} ${userProfile.lastName} ${userProfile.email} linkedin`;
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
}
/**
 * Reads profiles from CSV file
 * @returns Array of user profiles
 */
const fetchProfilesFromFile = async (): Promise<UserProfile[]> => {
  const projectRoot = path.resolve(__dirname, '..');
  const filePath = path.join(projectRoot, CONFIG.PATHS.INPUT_FILE);
  const data = await fs.readFile(filePath, 'utf8');
  const lines = data.split('\n');

  // Skip header and filter empty lines
  return lines
    .slice(1)
    .filter(line => line.trim())
    .map(line => {
      const [email, firstName, lastName] = line.split(',');
      return {
        email: email.trim(),
        firstName: firstName.trim(),
        lastName: lastName.trim()
      };
    });
}
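To make the query format concrete, here is a minimal standalone sketch of the URL that the search-query helper builds. The `UserProfile` shape mirrors the interface defined earlier; the name `buildSearchUrl` and the sample person are made up for illustration.

```typescript
// Standalone sketch mirroring generateGoogleSearchQuery from the tutorial.
interface UserProfile {
  firstName: string;
  lastName: string;
  email: string;
}

// Hypothetical name, same logic as the tutorial's helper
const buildSearchUrl = (profile: UserProfile): string => {
  const query = `${profile.firstName} ${profile.lastName} ${profile.email} linkedin`;
  // encodeURIComponent escapes spaces as %20 and "@" as %40
  return `https://www.google.com/search?q=${encodeURIComponent(query)}`;
};

// Made-up sample input
const sampleUrl = buildSearchUrl({
  firstName: "Jane",
  lastName: "Smith",
  email: "jane.smith@company.com",
});

console.log(sampleUrl);
// https://www.google.com/search?q=Jane%20Smith%20jane.smith%40company.com%20linkedin
```

Including the email address in the query gives the search engine the company domain to match against, which is what the AI prompt in the next step relies on when several people share a name.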

LinkedIn Profile Search Function

Now, we'll implement our core search function with retry logic. This function is the heart of our application: it loads the Google search in an Airtop browser window and uses AI to pick the LinkedIn profile URL out of the results, retrying with an increasing delay if a request fails:

/**
 * Searches for a LinkedIn profile with intelligent retry mechanism
 */
const searchForLinkedInProfile = async (
  session: ExternalSessionWithConnectionInfo, 
  window: WindowId, 
  client: AirtopClient, 
  profile: ProfileWithQuery
): Promise<string | null> => {
  const delay = (ms: number) => new Promise(resolve => setTimeout(resolve, ms));
  
  for (let attempt = 1; attempt <= CONFIG.MAX_RETRIES; attempt++) {
    try {
      await client.windows.loadUrl(session.id, window.windowId, {
        url: profile.query,
      });
      
      console.log(`Searching for ${profile.firstName} ${profile.lastName} on LinkedIn`);
      
      const result = await client.windows.pageQuery(session.id, window.windowId, {
        prompt: `You are tasked with retrieving a person's LinkedIn profile URL. Please locate the LinkedIn profile for the specified individual and return only the URL. 
        LinkedIn profile URLs begin with https://www.linkedin.com/in/ so use that to identify the profile. There may be profiles with country based subdomains like https://nl.linkedin.com/in/ that you should also use.
        If there are multiple links, return the one that most closely matches the profile based on the email domain and the name. 
        Do not return any other text than the URL.
        Do not return any urls corresponding to posts that may begin with https://www.linkedin.com/posts/
        If you are unable to find the profile, return 'Error'`
      });
      
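      const response = result.data.modelResponse.trim();
      // The prompt asks the model to answer 'Error' when no profile is found,
      // so treat that answer as "no match" instead of saving it as a URL
      return response === 'Error' ? null : response;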
    } catch (error) {
      if (attempt === CONFIG.MAX_RETRIES) {
        console.error(`Failed to find profile after ${CONFIG.MAX_RETRIES} attempts:`, profile.email, error);
        return null;
      }
      
      console.warn(`Attempt ${attempt} failed, retrying...`);
      await delay(CONFIG.RETRY_DELAY_MS * attempt);
    }
  }
  return null;
}
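Because the URL comes back from a language model, it may be worth checking its shape before saving it. The sketch below is one possible post-check, not part of the Airtop SDK: the function name `isLinkedInProfileUrl` is hypothetical, and the rules (an `/in/` path, an optional `www` or short country subdomain, no `/posts/` links) are taken from the prompt above. Treating country subdomains as two or three letters is an assumption.

```typescript
// Hypothetical helper: checks that a model response looks like a LinkedIn profile URL.
const isLinkedInProfileUrl = (value: string): boolean => {
  let url: URL;
  try {
    url = new URL(value.trim());
  } catch {
    // Not a URL at all, for example the literal string "Error"
    return false;
  }
  // Assumption: hosts are linkedin.com, www.linkedin.com, or a short country subdomain
  const hostOk = /^([a-z]{2,3}\.)?linkedin\.com$/i.test(url.hostname);
  // Profile paths look like /in/<slug>, which rules out /posts/ links
  const pathOk = /^\/in\/[^/]+\/?$/.test(url.pathname);
  return url.protocol === "https:" && hostOk && pathOk;
};

console.log(isLinkedInProfileUrl("https://www.linkedin.com/in/jane-smith"));        // true
console.log(isLinkedInProfileUrl("https://www.linkedin.com/posts/jane-smith_abc")); // false
```

A check like this could run on the returned string so that malformed answers are counted as failed matches rather than written to the output file.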

Batch Processing Functions

Next, we'll add functions to handle batch processing. Profiles are split into batches of CONFIG.BATCH_SIZE; each batch gets its own Airtop session and window and works through its profiles one at a time, while the batches themselves run in parallel.

/**
 * Processes a single batch of profiles
 */
const runSequentialBatch = async (client: AirtopClient, profiles: ProfileWithQuery[], batchIndex: number) => {
    console.log(`Running batch ${batchIndex}`);
    let session: SessionResponse;
    let window: WindowIdResponse;
    try {
      session = await client.sessions.create();
    } catch (error) {
      console.error("Error creating session", error);
      return [];
    }
  
    try {
      window = await client.windows.create(session.data.id);
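    } catch (error) {
      console.error("Error creating window", error);
      // Terminate the session so it isn't left running without a window
      await client.sessions.terminate(session.data.id).catch(() => undefined);
      return [];
    }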
  
    console.log("Created session and window for batch", batchIndex);
  
    const profilesWithLinkedInProfiles: ProfileWithLinkedInProfile[] = [];
    for (const profile of profiles) {
      const linkedInProfile = await searchForLinkedInProfile(session.data, window.data, client, profile);
      if (linkedInProfile) {
        const result = {
          ...profile,
          linkedInProfile
        }
        profilesWithLinkedInProfiles.push(result);
      }
    }
  
    await client.sessions.terminate(session.data.id);
  
    return profilesWithLinkedInProfiles;
  }
/**
 * Runs batches in parallel
 */
const runBatchesInParallel = async (
  client: AirtopClient, 
  batches: ProfileWithQuery[][]
): Promise<ProfileWithLinkedInProfile[]> => {
  const promises = batches.map((batch, index) => 
    runSequentialBatch(client, batch, index)
  );
  
  const results = await Promise.all(promises);
  return results.flat();
}
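The split into batches is a plain array-chunking step. Here is a minimal sketch of it as a standalone helper; the name `chunk` and the sample values are illustrative and not taken from the tutorial code.

```typescript
// Hypothetical helper: splits a list into consecutive batches of at most `size` items.
const chunk = <T>(items: readonly T[], size: number): T[][] => {
  if (!Number.isInteger(size) || size < 1) {
    throw new Error("size must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
};

// Five items in batches of two: the last batch holds the remainder
const demoBatches = chunk(["a", "b", "c", "d", "e"], 2);
console.log(JSON.stringify(demoBatches)); // [["a","b"],["c","d"],["e"]]
```

Note the trade-off this implies for the tutorial's config: with BATCH_SIZE set to 1, every profile becomes its own batch, so one session is created per profile and all of them run at once. A larger batch size means fewer parallel sessions and more sequential work inside each one.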

Results Saving Function

Next, we write the results collected from each browser session to a new CSV file, creating the output directory first if it doesn't exist.

/**
 * Saves enriched profiles to CSV
 */
const saveProfilesToFile = async (
  profiles: ProfileWithLinkedInProfile[]
): Promise<void> => {
  const projectRoot = path.resolve(__dirname, '..');
  const outputDir = path.join(projectRoot, CONFIG.PATHS.OUTPUT_DIR);
  const filePath = path.join(outputDir, CONFIG.PATHS.OUTPUT_FILE);
  
  await fs.mkdir(outputDir, { recursive: true });
  
  const csvHeaders = ['email', 'firstName', 'lastName', 'linkedInProfile'];
  const csvRows = profiles.map(profile => [
    profile.email,
    profile.firstName,
    profile.lastName,
    profile.linkedInProfile,
  ]);
  
  const csvContent = [
    csvHeaders.join(','),
    ...csvRows.map(row => row.join(','))
  ].join('\n');
  
  await fs.writeFile(filePath, csvContent);
  
  console.log(`Saved ${profiles.length} profiles to ${filePath}`);
}
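One caveat: joining fields with a bare comma produces a broken row if a value itself contains a comma, a quote, or a line break (a last name like "Smith, Jr.", for instance). A minimal sketch of the usual CSV quoting rule follows; `escapeCsvField` and `toCsvRow` are hypothetical names, not part of the code above.

```typescript
// Hypothetical helper: quotes a field only when it contains a special character.
const escapeCsvField = (value: string): string => {
  if (/[",\r\n]/.test(value)) {
    // Wrap in double quotes and double any embedded quotes
    return `"${value.replace(/"/g, '""')}"`;
  }
  return value;
};

// Hypothetical helper: builds one CSV row from a list of fields
const toCsvRow = (fields: string[]): string =>
  fields.map(escapeCsvField).join(",");

console.log(toCsvRow(["a@b.com", "Jane", "Smith, Jr."]));
// a@b.com,Jane,"Smith, Jr."
```

Swapping the row join for a helper like `toCsvRow` would keep the output readable by spreadsheet tools even when names contain punctuation.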

Main Execution Function

Finally, we'll create our main execution function to orchestrate the above code and call the functions based on our config.

/**
 * Main application entry point
 */
const main = async () => {
  console.time('Total Execution Time');
  
  try {
    // Load API key from environment
    const apiKey = process.env.AIRTOP_API_KEY;
    if (!apiKey) {
      throw new Error("AIRTOP_API_KEY is not set");
    }
    // Initialize Airtop Client
    const client = new AirtopClient({ apiKey });
    
    // Load and prepare profiles
    const profiles = await fetchProfilesFromFile();
    console.log(`Loaded ${profiles.length} profiles`);
    
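    // Attach the Google search URL to each profile
    const profilesWithQueries: ProfileWithQuery[] = profiles.map(profile => ({
      ...profile,
      query: generateGoogleSearchQuery(profile)
    }));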
    
    // Create batches
    const batches: ProfileWithQuery[][] = [];
    for (let i = 0; i < profilesWithQueries.length; i += CONFIG.BATCH_SIZE) {
      batches.push(profilesWithQueries.slice(i, i + CONFIG.BATCH_SIZE));
    }
    
    // Process batches
    const profilesWithLinkedInProfiles = await runBatchesInParallel(client, batches);
    
    // Save results
    await saveProfilesToFile(profilesWithLinkedInProfiles);
    
    // Print summary
    console.log("\n=== Execution Summary ===");
    console.log(`Total profiles processed: ${profiles.length}`);
    console.log(`Successful matches: ${profilesWithLinkedInProfiles.length}`);
    console.log(`Failed matches: ${profiles.length - profilesWithLinkedInProfiles.length}`);
  } catch (error) {
    console.error("Application failed:", error);
    process.exit(1);
  } finally {
    console.timeEnd('Total Execution Time');
  }
}
// Kick off the application
dotenv.config();
main().catch(console.error);

Environment Setup

Update the .env file in the project root with the following content:

AIRTOP_API_KEY=your_airtop_api_key_here

You can go here to get your API key for free.

Preparing Input Data

Create data/profiles.csv with the details of the people whose LinkedIn profiles you want to find. Check out our example here if you need a little help:

email,firstName,lastName
john.doe@example.com,John,Doe
jane.smith@company.com,Jane,Smith
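The column order matters, because the loader reads each row positionally as email, firstName, lastName and skips the header line. The sketch below shows that mapping on an in-memory string instead of a file; `parseProfiles` is a hypothetical name, and the empty-string defaults for missing columns are an addition for illustration.

```typescript
interface UserProfile {
  email: string;
  firstName: string;
  lastName: string;
}

// Hypothetical helper: parses CSV text in the email,firstName,lastName layout.
const parseProfiles = (csv: string): UserProfile[] =>
  csv
    .split(/\r?\n/)                          // handle both Unix and Windows line endings
    .slice(1)                                // skip the header row
    .filter(line => line.trim() !== "")      // ignore blank lines
    .map(line => {
      // Assumption: missing columns fall back to an empty string
      const [email = "", firstName = "", lastName = ""] = line.split(",");
      return {
        email: email.trim(),
        firstName: firstName.trim(),
        lastName: lastName.trim(),
      };
    });

const sampleCsv =
  "email,firstName,lastName\n" +
  "john.doe@example.com,John,Doe\n" +
  "jane.smith@company.com,Jane,Smith\n";

const parsedProfiles = parseProfiles(sampleCsv);
console.log(parsedProfiles.length); // 2
```

If your own export uses a different column order, reorder the columns in the file (or the destructuring in the loader) so that names and emails don't end up swapped in the search query.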

Running the Application

Back in your terminal, it’s time to get running.

# Install dependencies
npm install
# Run the application with ts-node (installed during project setup)
npx ts-node src/index.ts

A word on Ethical Technology and Responsible Innovation

The true power of this tool lies not just in its technical capabilities but in its responsible application. Technologists and professionals have a critical responsibility to use automation technologies with integrity. This means respecting platform terms of service, protecting individual privacy, and ensuring that our innovations serve human needs without compromising ethical standards.

Conclusion

Building a LinkedIn data enrichment tool is more than just writing code: it's about solving real-world professional networking challenges with intelligent, ethical technology. By leveraging TypeScript, Node.js, and the Airtop SDK, we've created a sophisticated solution that transforms manual profile searching into an automated, efficient process. Let us know how you use it and which improvements you make.

Happy coding! 🚀👩‍💻👨‍💻

ⓒ2024

