If you are one of the many people who, like me, are coming to Python for data analysis after having spent a lot of time working with Microsoft Excel, you will at some point find yourself saying, “How do I do a vLookup in Python?” (Or, if you’re really like me, you’ll throw in a few expletives.)

Despite many search results and stackoverflow questions to the same effect, I have not found a resource that gives a straightforward, practical explanation of how to do vLookup (or it’s sexier alternative, index/match). This post is my attempt to do that.

What is vLookup? For those of you who may be unfamiliar, vLookup is an Excel formula that allows you to populate the cells in a given column based on a column-wise (vertical) match in another spreadsheet or table. For example, let’s say I had a table with Employee ID and Employee Name, and I also had a table with a log of keycard usage over time that included the Employee ID for that keycard, but not the name. If I wanted to update the keycard log with the employee name, I would use vLookup to match the Employee ID in the keycard table with the Employee ID in the employees table, and populate a new column in the keycard table with the Employee name matched in the Employee table. It’s hugely helpful and very commonly used.

I will describe two methods to achieve the same result in Python, using the pandas library. (I am sure there are other wasy to do it, but I find that these are good starting points.)

Method 1: .map() with a Dictionary

In this method, you can use the .map() method in pandas to fill a dataframe column based on matched values in a Python dictionary.

Sticking with the example above:

# First, create a dictionary with employee IDs and names
# where the key is the ID and the value is the name.
employees = {
12345: "Jean-Luc",
98766: "Deanna",
29384: "Geordi"
}

Then, let’s say I have a dataframe of keycard logs called logs_df that has a timestamp and EmployeeID, but no employee name. It might look something like this:

Time EmployeeID
07:03:52 98766
09:23:02 29384
09:52:23 98766
10:01:33 12345
13:43:43 29384

Notice that I have more rows than employee names. This is unsurprising for a keycard log, but it also means I am comparing different-shaped data.

To accomplish the mapping with the dictionary, .map() is very useful:

# Match the EmployeeID field to employee name and put
# the employee name in a new column called EmployeeName
logs_df['EmployeeName'] = logs_df.EmployeeID.map(employees)

The output will look like this:

Time EmployeeID EmployeeName
07:03:52 98766 Deanna
09:23:02 29384 Geordi
09:52:23 98766 Deanna
10:01:33 12345 Jean-Luc
13:43:43 29384 Geordi

Method 2: .merge() with a Left Join

What if your lookup values are in another dataframe, rather than a dictinoary? You could certainly create a dictionary form that dataframe, but it creates unnecessary extra work. Instead, you can use the pandas .merge() method with a left join to accomplish your match.

Let’s say my employee IDs were in a dataframe, called employees_df that looked like this:

EmployeeID EmployeeName
98766 Deanna
29384 Geordi
12345 Jean-Luc

In this case, I can populate logs_df with the employee names like this:

# Using the pandas .merge() method with the how='left'
# argument:
logs_df = pd.merge(logs_df, employees_df, how='left',
        left_on='EmployeeID', right_on='EmployeeID')

And voila, we have our desired result:

Time EmployeeID EmployeeName
07:03:52 98766 Deanna
09:23:02 29384 Geordi
09:52:23 98766 Deanna
10:01:33 12345 Jean-Luc
13:43:43 29384 Geordi