Your Email, Your Story: Exploring Data Magic with AI and Python

Written by

Gino Silva-Payne

July 5, 2023

AI has been dominating headlines for the last year, and its growth only seems to be getting faster. ChatGPT launched in late November 2022 and was widely reported as the fastest-growing consumer internet app ever at the time. It reached an estimated 100 million users in just 2 months, cementing 2023 as a big year for AI.

Whether you’re an early adopter of AI, a skeptic or somewhere in between, there are some undoubtedly amazing things anyone can achieve with it. Which leads me to today’s post:

You’re sitting on a ton of data right now, from multiple platforms. Use a Fitbit? You can download that data. Use social media? You can download your raw data for nearly every interaction you’ve had there. Use Email Meter? Raw data exports of all your email interactions are ready to be downloaded.

Despite having access to all this raw data (and the things we can discover from it), learning anything about yourself from it used to be incredibly time-consuming, and basically impossible for anyone who wasn’t an engineer. Gigantic spreadsheets or JSON files aren’t something most of us can tackle, unless we’re data scientists. That’s where AI comes in.

By asking ChatGPT (or any other AI chatbot) to write you some basic Python scripts, you can now find out all kinds of interesting things about yourself from your own data! I was able to generate a ton of interesting scripts to dive deeper into our email data, with zero coding experience. I'll be sharing some today, which I hope will inspire you to create your own!

But wait, what the hell is Python?

Python is a very popular programming language, used behind the scenes by companies like Netflix, Reddit, Spotify and…Email Meter 🙂 It’s powerful but simple enough for an AI to write competently. To run the scripts in this post, we’ll be setting up a basic Python ‘environment’.

Setting up your Python Environment

I’ve opted for Jupyter as my Python environment, as it’s simple to set up. There are many other ways to run these scripts, but we don’t need to worry about that right now. Follow these steps to get your environment up and running:

Install Python: Visit the official Python website (https://www.python.org/) and download the latest version of Python for your operating system. Run the installer and follow the on-screen instructions to complete the installation.

Install Jupyter Notebook and pandas: Open a command prompt (Windows) or terminal (macOS/Linux) and type the following command to install Jupyter Notebook using the Python package manager, pip. It also installs pandas, the data library that every script in this post imports:

pip install jupyter pandas

Launch Jupyter Notebook: After the installation is complete, you can launch Jupyter Notebook by typing the following command in the command prompt or terminal:

jupyter notebook

This will open Jupyter Notebook in your default web browser.

Create a new notebook: In the Jupyter Notebook interface, click on the "New" button and select "Python 3" (or any other available Python kernel) to create a new notebook.

You can now start writing and executing Python code in individual cells within the notebook. To run a cell, press Shift + Enter or click the "Run" button in the toolbar.
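As a quick first cell, something like the sketch below can confirm that the notebook is running the Python you just installed. It only uses the standard library, and the variable names are my own. The pandas check is there because the scripts later in this post import it.

```python
import importlib.util
import sys

# Which Python is this notebook running?
python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
print(f"Python version: {python_version}")

# Is pandas installed? (The scripts in this post need it.)
has_pandas = importlib.util.find_spec("pandas") is not None
if has_pandas:
    print("pandas is installed, you're good to go!")
else:
    print("pandas is missing: run 'pip install pandas' in your terminal.")
```

If the second message tells you pandas is missing, install it from the terminal and restart the notebook before moving on.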

Now that you’ve got your environment set up, you’re ready to run some scripts!

Pulling your email data

To access your data, follow these steps:

Open Email Meter: If you haven't signed up yet, you can do so here, and you’ll get a free trial of Pro which will give you access to Raw Data Exports.

Apply the filters you want: I’d suggest using a Custom Report to pull a relatively long period, at least 1 month. You can remove automated emails in the Settings page, so you can start with the most accurate data possible.

Export your data: Once you’ve applied any filters you need, simply click the CSV Export button in the top left, and you’ll soon receive an email with a link to download your export CSV file.

Opening your CSV with a Python script

You’ll need to paste the filepath of your CSV export file into the script before running it, so that it accesses the correct file. Here’s how you can do that in Windows and Mac:

For Windows:

Locate the CSV file: Open the folder where you saved the CSV export file.

Copy the file path: Hold down the Shift key on your keyboard and right-click on the CSV file. In the context menu that appears, select "Copy as path". This will copy the full file path to your clipboard.

For macOS:

Locate the CSV file: Open the folder where you saved the CSV export file.

Copy the file path: Right-click (or Control-click) on the CSV file while holding down the Option key. In the context menu, choose "Copy <filename> as Pathname". This will copy the full file path to your clipboard.

So this part:

email_data = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

Needs to end up looking something like this:

email_data = pd.read_csv('/Users/YourName/Downloads/name-emailmeter00000.com.csv')

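Don’t forget to keep the quotation marks (' ') on either side of the path! On Windows, also delete the extra double quotes that "Copy as path" adds, and put an r right before the opening quotation mark, like r'C:\Users\YourName\Downloads\name-emailmeter00000.com.csv', so that Python reads the backslashes literally.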
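If pasted paths keep tripping you up, a tiny helper like this one can tidy them before you hand them to pd.read_csv. This is only a sketch: clean_path is a name I made up, and the path is the same made-up example used above.

```python
from pathlib import Path


def clean_path(pasted: str) -> str:
    """Strip stray spaces and surrounding quotes from a pasted file path."""
    return pasted.strip().strip('"').strip("'")


# The r prefix makes this a raw string, so the backslashes are kept as-is.
# This is what Windows "Copy as path" typically gives you, double quotes included.
pasted = r'"C:\Users\YourName\Downloads\name-emailmeter00000.com.csv"'
cleaned = clean_path(pasted)

print(cleaned)
# True only if a file really exists at that path on your computer
print(Path(cleaned).exists())
```

If the last line prints False with your own path, the file has probably been moved or renamed since you copied the path.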

Exploring your data

Email Meter offers a ton of metrics on your report, without any need to run further scripts, but I wanted to explore where else we could take our Raw Data Exports. From discovering your busiest day of the year to finding out which day of the week you work overtime the most, here are the most interesting insights I could squeeze from my data:

1. Who do I CC the most in my emails?

Outside of just being a curiosity, this can help you identify someone who you frequently CC but who does not need to be included, so you can streamline your communication and stop cluttering their inbox.

To get started, copy the script below and paste it into the Python environment you set up.
Copy the file path to your Email Meter export as explained above, and then paste it between the quotation marks to replace PASTE PATH TO YOUR CSV EXPORT HERE. You'll need to do this for every script, so it can access your data.
Run the script to see your result!

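import pandas as pd

email_data = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Replace with your own email address so you don't count yourself
my_address = 'PASTE YOUR EMAIL ADDRESS HERE'

# Split each CC field into individual addresses (empty when nobody was CC'ed)
cced_people = email_data['recipients_cc'].fillna('').str.split(', ')

# Flatten into one list, skipping blanks and your own address
cced_people = [email for sublist in cced_people for email in sublist if email != '' and email != my_address]

# Count how often each address appears and keep the 10 most frequent
cced_counts = pd.Series(cced_people).value_counts()

top_cced_people = cced_counts[:10]

print("The top 10 most frequently CC'ed people in your emails are:")
# .items() replaces .iteritems(), which was removed in pandas 2.0
for person, count in top_cced_people.items():
    print(f"{person}: {count} times")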
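If you're curious what the counting step is doing, here's a minimal sketch of the same idea in plain Python, with no pandas needed. The addresses are made up, and I'm assuming a comma-and-space separator like the script does.

```python
from collections import Counter

# Made-up CC fields, one per email; an empty string means nobody was CC'ed
cc_fields = [
    "ana@example.com, ben@example.com",
    "",
    "ana@example.com",
    "me@example.com, cleo@example.com",
]
my_address = "me@example.com"

# Split every field into addresses, skip blanks and yourself, then count
cc_counts = Counter(
    address
    for field in cc_fields
    for address in field.split(", ")
    if address and address != my_address
)

for person, count in cc_counts.most_common(10):
    print(f"{person}: {count} times")
```

Counter.most_common(10) plays the same role as taking the first 10 rows of value_counts() in the pandas version.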

2. When do I normally wake up, have lunch and go to sleep?

Here I wanted to see how closely my email data could reflect my real life. I would normally send an email soon after waking up, would take a break from sending emails in the middle of the day to eat, and would obviously not send any emails when I’m asleep! I would say it did a pretty good job accurately predicting this:

import pandas as pd

# Read data from CSV file
email_data = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Convert date to datetime
email_data['date'] = pd.to_datetime(email_data['date'])

# Convert timestamps to local time zone if necessary
# email_data['date'] = email_data['date'].dt.tz_convert('YOUR_TIMEZONE')

# Keep only the emails you sent, sorted by date
sent = email_data[email_data['type'] == 'SENT'].sort_values('date')

# Time of day of each email as hours since midnight (13.5 means 1:30 PM)
hours = sent['date'].dt.hour + sent['date'].dt.minute / 60
days = sent['date'].dt.date

# Filter emails sent between 11:00 AM and 2:00 PM
lunch_mask = (hours >= 11) & (hours < 14)
lunchtime_hours = hours[lunch_mask]
lunchtime_days = days[lunch_mask]

# For each day, lunchtime is the midpoint between the first and last lunchtime emails
first_lunchtime = lunchtime_hours.groupby(lunchtime_days).min()
last_lunchtime = lunchtime_hours.groupby(lunchtime_days).max()
lunchtime = ((first_lunchtime + last_lunchtime) / 2).median()

# Waking and bed times: the typical (median) first and last email of the day
waking = hours.groupby(days).min().median()
bed = hours.groupby(days).max().median()

def to_clock(h):
    # Turn hours since midnight into HH:MM
    total_minutes = int(round(h * 60))
    return f"{total_minutes // 60:02d}:{total_minutes % 60:02d}"

print(f"Your estimated waking time is {to_clock(waking)}, lunch time is {to_clock(lunchtime)}, and bed time is {to_clock(bed)}.")
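The commented-out time zone line in that script matters more than it looks. If the timestamps in your export aren't in your local time (I'm assuming UTC here, so check your own file), every estimate shifts by your offset. A small sketch with a made-up timestamp and a hypothetical zone two hours ahead of UTC:

```python
from datetime import datetime, timedelta, timezone

# A made-up timestamp stored in UTC
utc_time = datetime(2023, 6, 1, 6, 30, tzinfo=timezone.utc)

# Hypothetical local zone: two hours ahead of UTC
local_zone = timezone(timedelta(hours=2))
local_time = utc_time.astimezone(local_zone)

print(f"UTC hour: {utc_time.hour}, local hour: {local_time.hour}")
```

An email that looks like a 6:30 AM start is really an 8:30 AM one for this imaginary person, which is the difference between "early bird" and "perfectly normal".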

3. Which of my contacts are Early Birds, Night Owls or Day…Birds?

By looking at when each contact is most likely to email you, we can deduce when other people are most active. This can help you manage your expectations when waiting for a reply. The script lists up to 10 contacts for each group, with a percentage next to each one that tells you what share of their emails arrived during that part of the day (morning, daytime or night). The times are based on your own time zone, so someone who shows up as a night owl for you might actually be an early bird in their own country.

import pandas as pd

# Read data from CSV file
email_data = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Convert date to datetime and get the hour part only
email_data['date'] = pd.to_datetime(email_data['date'])
email_data['hour'] = email_data['date'].dt.hour

# Label each email with a time range based on its hour
def time_range(hour):
    if 6 <= hour < 12:
        return 'Morning'
    if 12 <= hour < 18:
        return 'Daytime'
    return 'Night'

email_data['time_range'] = email_data['hour'].apply(time_range)

# Filter out contacts you have never replied to
replied_contacts = email_data[email_data['is_replied'] == True]['from'].unique()
replied_email_data = email_data[email_data['from'].isin(replied_contacts)]

# Count each contact's emails per time range
contact_email_counts = replied_email_data.groupby(['from', 'time_range']).size().unstack(fill_value=0)

# Calculate the percentage of emails sent during each time range for each contact
contact_email_percentages = contact_email_counts.divide(contact_email_counts.sum(axis=1), axis=0)

# Assign each contact to the time range where they send most of their emails
dominant_range = contact_email_percentages.idxmax(axis=1)
dominant_share = contact_email_percentages.max(axis=1)

# Print up to 10 contacts per time range, highest percentage first
titles = {'Morning': 'Early Birds (morning)', 'Daytime': 'Day Birds (daytime)', 'Night': 'Night Owls (night)'}
for label, title in titles.items():
    top = dominant_share[dominant_range == label].sort_values(ascending=False).head(10)
    print(f"\n{title}:")
    for contact, percentage in top.items():
        print(f"{contact} - {percentage:.2%}")
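To make the percentages less mysterious, here's the same arithmetic for a single imaginary contact, in plain Python. The hours are made up, and the 6-12-18 boundaries are simply the ones used in this post, so feel free to move them.

```python
from collections import Counter


def time_range(hour: int) -> str:
    """Map an hour (0-23) to Morning, Daytime or Night."""
    if 6 <= hour < 12:
        return "Morning"
    if 12 <= hour < 18:
        return "Daytime"
    return "Night"


# Made-up hours at which one contact emailed you
hours = [7, 8, 9, 9, 13, 22]

# Count emails per time range and pick the biggest group
range_counts = Counter(time_range(h) for h in hours)
dominant, count = range_counts.most_common(1)[0]
share = count / len(hours)

print(f"{dominant}: {share:.0%} of their emails")
```

Four of the six emails land in the morning, so this contact would be filed under the early birds with roughly two thirds of their emails.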

4. What day did I send/receive the most emails EVER?

For this one, you’ll want a long period of time, at least a year or more, for it to be relevant. We’re trying to find that ONE day where we were either buried in emails or super busy sending them out. It will print the date of the busiest day for both. When you’ve found the date, try going to Email Meter and looking at it in a report—see anything interesting? What was special about that day?

import pandas as pd

# Read data from CSV file
email_data = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Convert date to datetime and get the date part only
email_data['date'] = pd.to_datetime(email_data['date']).dt.date

# Task 1: Day with the most emails received
most_received_day = email_data[email_data['type'] == 'RECEIVED']['date'].value_counts().idxmax()
print(f"The day you received the most emails ever is: {most_received_day}")

# Task 2: Day with the most emails sent
most_sent_day = email_data[email_data['type'] == 'SENT']['date'].value_counts().idxmax()
print(f"The day you sent the most emails ever is: {most_sent_day}")
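If one date feels like too little, a short ranking is an easy variation. Here's a sketch of the idea in plain Python with made-up dates. In the pandas script, the rough equivalent would be swapping idxmax() for head(3) on the value_counts() result.

```python
from collections import Counter
from datetime import date

# Made-up dates, one entry per email received
received_dates = [
    date(2023, 3, 1), date(2023, 3, 1), date(2023, 3, 1),
    date(2023, 3, 2),
    date(2023, 3, 7), date(2023, 3, 7),
]

# The three busiest days, most emails first
top_days = Counter(received_dates).most_common(3)
for day, count in top_days:
    print(f"{day}: {count} emails")
```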

5. What is other people’s response time to ME?

Who’s that slowcoach who you’re always waiting on? It can be frustrating to wait for replies, so here’s a way to quantify that and see who could be holding you up. But don’t forget, unless your response time is in top shape, you’re probably someone else's slowcoach.

import pandas as pd
import numpy as np

# Read data from CSV file
df = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Replace with your own email address so your own messages aren't counted as replies
my_address = 'PASTE YOUR EMAIL ADDRESS HERE'

# Convert the date to datetime
df['date'] = pd.to_datetime(df['date'])

# Treat emails without a "to" field as having no recipients
df['recipients_to'] = df['recipients_to'].fillna('')

# Sort values by thread and date
df.sort_values(['gmail_thread_id', 'date'], inplace=True)

# Initialize a dictionary to store reply times for each recipient
reply_times = {}

# Go through each thread and measure how long replies to your sent emails took
for thread_id, group in df.groupby('gmail_thread_id'):
    sent_rows = group[group['type'] == 'SENT']
    received_rows = group[group['type'] == 'RECEIVED']

    for sent_index, sent_row in sent_rows.iterrows():
        # Get the individual recipients of the sent email
        recipients = [r.strip() for r in sent_row['recipients_to'].split(';') if r.strip()]

        # Find the time of the next received email from someone else
        next_received_rows = received_rows[(received_rows['date'] > sent_row['date']) &
                                           (received_rows['from'] != my_address)]

        if not next_received_rows.empty:
            next_received_row = next_received_rows.iloc[0]
            # Calculate the reply time in minutes
            reply_time = (next_received_row['date'] - sent_row['date']).total_seconds() / 60

            for recipient in recipients:
                reply_times.setdefault(recipient, []).append(reply_time)

# Count how many emails you've sent to each individual contact
sent_to = df[df['type'] == 'SENT']['recipients_to'].str.split(';').explode().str.strip()
email_counts = sent_to[sent_to != ''].value_counts()

# Get the top 20 contacts you've emailed the most
top_contacts = email_counts.nlargest(20).index

# Calculate and print the average reply time for each of the top contacts
for contact in top_contacts:
    if contact in reply_times and reply_times[contact]:
        avg_reply_time = np.mean(reply_times[contact])  # in minutes
        hours = avg_reply_time // 60
        minutes = avg_reply_time % 60
        print(f'Average reply time for {contact}: {int(hours)} hours and {int(minutes)} minutes')
    else:
        print(f'No reply times found for {contact}.')

This script gives you the response time for your top 20 contacts. If you want to see more contacts, change the number here and run it again:

top_contacts = email_counts.nlargest(20).index
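One thing to keep in mind when reading the results: an average can be dragged way up by a single slow reply. Here's a small sketch with made-up reply times showing how different the average and the median can look. If you prefer the median, swapping np.mean for np.median in the script should be all it takes.

```python
import statistics

# Made-up reply times in minutes; the last one sat unanswered for days
reply_minutes = [10, 20, 30, 4000]

mean_minutes = statistics.mean(reply_minutes)
median_minutes = statistics.median(reply_minutes)


def as_hours_and_minutes(total_minutes: float) -> str:
    """Format a number of minutes the same way the script prints them."""
    hours, minutes = divmod(int(total_minutes), 60)
    return f"{hours} hours and {minutes} minutes"


print("Average:", as_hours_and_minutes(mean_minutes))
print("Median: ", as_hours_and_minutes(median_minutes))
```

Three quick replies and one very slow one give an average of almost 17 hours, while the median stays at 25 minutes, which is probably closer to how this contact usually behaves.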

6. What day do I send the most emails OUTSIDE of working hours?

Sadly, we often blur the lines between work time and personal time by sending emails outside of our set hours. This isn’t a huge deal if it’s only a few emails, but if it’s becoming overwhelming, you’ll want to do something about it. This script will tell you which day is the one where you send the most emails outside work hours, so you can try to address the root cause.

import pandas as pd

# Read data from CSV file
df = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Convert the date to datetime
df['date'] = pd.to_datetime(df['date'])

# Filter for sent emails only (copy so we can add columns safely)
df = df[df['type'] == 'SENT'].copy()

# Create a new column for the hour of the day the email was sent
df['hour'] = df['date'].dt.hour

# Create a new column for the day of the week the email was sent
df['day_of_week'] = df['date'].dt.day_name()

# Create a new column for the calendar date the email was sent
df['day'] = df['date'].dt.date

# Filter for emails sent outside of working hours
outside_working_hours = df[(df['hour'] < 9) | (df['hour'] >= 17)]

# Find the day of the week where most emails are sent outside of working hours
most_emails_day = outside_working_hours['day_of_week'].value_counts().idxmax()

# Count out-of-hours emails for each calendar date that has at least one
daily_counts = outside_working_hours.groupby(['day_of_week', 'day']).size()
weekdays = daily_counts.index.get_level_values('day_of_week')

# Average number of out-of-hours emails per date on the busiest day of the week
average_emails_busiest_day = daily_counts[weekdays == most_emails_day].mean()

# Average number of out-of-hours emails per date on the other days of the week
average_emails_other_days = daily_counts[weekdays != most_emails_day].mean()

print(f"The day of the week you send the most emails outside of working hours is: {most_emails_day}")
print(f"Average number of emails sent outside of working hours on {most_emails_day}: {average_emails_busiest_day:.1f}")
print(f"Average number of emails sent outside of working hours on other days: {average_emails_other_days:.1f}")

We’re assuming that your work hours are 9 AM to 5 PM here. You can adjust them by editing this line and rerunning the script:

outside_working_hours = df[(df['hour'] < 9) | (df['hour'] >= 17)]

This uses the 24-hour clock (military time), which is why 5 PM is written as 17.
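The hour check on its own treats a Saturday at 10 AM as working time. If weekends are off the clock for you, a rule along these lines might be closer to reality. This is a sketch with made-up send times; the function name and its default hours are my own choices.

```python
from datetime import datetime


def is_outside_working_hours(sent_at: datetime, start_hour: int = 9, end_hour: int = 17) -> bool:
    """True if the email was sent on a weekend or outside start_hour to end_hour."""
    # Monday is 0, so 5 and 6 are Saturday and Sunday
    is_weekend = sent_at.weekday() >= 5
    return is_weekend or sent_at.hour < start_hour or sent_at.hour >= end_hour


# Made-up send times
wednesday_morning = is_outside_working_hours(datetime(2023, 7, 5, 10, 0))
wednesday_evening = is_outside_working_hours(datetime(2023, 7, 5, 17, 30))
saturday_morning = is_outside_working_hours(datetime(2023, 7, 1, 10, 0))

print(wednesday_morning, wednesday_evening, saturday_morning)
```

You could ask your AI chatbot to fold the same weekend rule into the pandas script, for example by also checking the day_of_week column.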

7. What is my star sign based on my email data?

Finally, a silly one! It actually got it completely wrong for me. It guessed Cancer, Scorpio or Pisces, but I'm an Aquarius. However, it was still funny to see what ChatGPT considered traits for each sign. I asked it for an explanation and it told me:

“The chosen time ranges for each star sign in the script are based on general astrological interpretations and assumptions. Here's a summary of why each time range was associated with specific star signs:

Night: Aries, Leo, and Sagittarius are active and dynamic signs, and the night time range reflects their energetic nature.

Morning: Gemini, Libra, and Aquarius are sociable signs, and the morning time range aligns with their talkative and curious nature.

Afternoon: Cancer, Scorpio, and Pisces are introspective signs, and the afternoon time range represents a time of productivity and depth.

Evening: Taurus, Virgo, and Capricorn are practical signs, and the evening time range reflects their focus, organization, and goal-oriented nature”.

See if it gets yours right with this script:

import pandas as pd

# Read data from CSV file
email_data = pd.read_csv('PASTE PATH TO YOUR CSV EXPORT HERE')

# Define time ranges and associated star signs
time_ranges = {
    'night': ['Aries', 'Leo', 'Sagittarius'],
    'morning': ['Gemini', 'Libra', 'Aquarius'],
    'afternoon': ['Cancer', 'Scorpio', 'Pisces'],
    'evening': ['Taurus', 'Virgo', 'Capricorn']
}

# Label each email with the time range it was sent in
# right=False makes the bins 0-5, 6-11, 12-17 and 18-23, so midnight emails are not dropped
email_data['timestamp'] = pd.to_datetime(email_data['date'])
email_data['time_range'] = pd.cut(email_data['timestamp'].dt.hour,
                                  bins=[0, 6, 12, 18, 24],
                                  right=False,
                                  labels=['night', 'morning', 'afternoon', 'evening'])

# Calculate the frequency of emails sent during each time range for each individual
email_counts = email_data.groupby(['address', 'time_range'], observed=True).size().reset_index(name='email_count')

# Determine the dominant time range for each individual
dominant_time_ranges = email_counts.groupby('address')['email_count'].idxmax()
dominant_time_ranges = email_counts.loc[dominant_time_ranges, ['address', 'time_range']]

# Make assumptions about star signs based on dominant time ranges
star_signs = []
for address, time_range in dominant_time_ranges.values:
    potential_star_signs = time_ranges[time_range]
    star_signs.append((address, potential_star_signs))

# Print the star sign assumptions
for address, potential_star_signs in star_signs:
    print(f"Address: {address}, Potential Star Signs: {', '.join(potential_star_signs)}")
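A small gotcha hiding in the pd.cut step: by default it includes the right edge of each bin and excludes the left one, so unless right=False is passed, an email sent during the midnight hour gets no label and 6 AM counts as night. Here's a plain-Python sketch of the labelling the script is aiming for, so you can check the boundaries yourself (the function name is mine):

```python
from bisect import bisect_right

edges = [6, 12, 18]  # boundaries between the four time ranges
labels = ["night", "morning", "afternoon", "evening"]


def time_range(hour: int) -> str:
    """Label an hour (0-23): 0-5 night, 6-11 morning, 12-17 afternoon, 18-23 evening."""
    return labels[bisect_right(edges, hour)]


for hour in [0, 5, 6, 12, 18, 23]:
    print(hour, time_range(hour))
```

bisect_right counts how many boundaries the hour has already reached, which is exactly the position of its label in the list.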

Don’t forget that you can edit any of these scripts, simply by pasting them into an AI chatbot and asking it for the changes you want. It might take some trial and error, but it usually gets it working in the end.

We’d love to hear of any other scripts you were able to create with your Email Meter data, or any other data for that matter!
