Extracting Replies in Enron Email Dataset


I am working on a project to extract email replies from the Enron dataset, but the many different message formats make this difficult. I have parsed the entire dataset and pulled metadata fields such as From, To, Subject, and Body, as well as X-From and X-To. I want to determine whether an employee replied to another person’s email and, if so, what that person’s original email was. Here is an example of an easy-to-parse email:

Message-ID: <18883955.1075841907648.JavaMail[email protected]>
Date: Mon, 23 Apr 2001 06:12:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Re: 588194
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Kate Symes
X-To: Sharen Cason
X-cc: 
X-bcc: 
X-Folder: \kate symes 6-27-02\Notes Folders\Sent
X-Origin: SYMES-K
X-FileName: kate symes 6-27-02.nsf

It's been changed to CAISO energy.

Thanks,
Kate


   
    
    
    From:  Sharen Cason                           04/23/2001 01:18 PM
    

To: Kate Symes/PDX/[email protected]
cc: Kimberly Hundl/Corp/[email protected], Amy Smith/[email protected] 

Subject: 588194

This deal is in as firm energy with an SP delivery point.

Thanks!
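
For a message like this one, with a single quoted email beneath the reply, one way to separate the two parts is to cut the body at the line where the X-To name first appears. The following is a minimal sketch using Python's standard `email` module. The function name `split_two_party` and the abbreviated `RAW` sample (addresses omitted) are made up for illustration, and the sketch assumes the reply text itself never mentions the other person's full name.

```python
from email import message_from_string

# Abbreviated copy of the two-person example; addresses are omitted.
RAW = """\
Subject: Re: 588194
X-From: Kate Symes
X-To: Sharen Cason

It's been changed to CAISO energy.

Thanks,
Kate


    From:  Sharen Cason                           04/23/2001 01:18 PM

To: Kate Symes
Subject: 588194

This deal is in as firm energy with an SP delivery point.

Thanks!
"""


def split_two_party(raw):
    """Return (reply_text, quoted_text) for a thread with one quoted email."""
    msg = message_from_string(raw)
    body = msg.get_payload()             # plain-text, non-multipart message
    other = (msg["X-To"] or "").strip()  # display name of the other person
    idx = body.find(other) if other else -1
    if idx == -1:
        return body.strip(), ""          # no quoted message found
    # Cut at the start of the line that mentions the other person.
    line_start = body.rfind("\n", 0, idx) + 1
    return body[:line_start].strip(), body[line_start:].strip()


reply, quoted = split_two_party(RAW)
print(reply)
print("---")
print(quoted)
```

The quoted part still contains its own To and Subject lines, so a further pass would be needed to isolate just the text of the original message.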

Because only two people are involved in this thread, and I can get their names from X-From and X-To, I can split the body into two parts by finding "Sharen Cason" in the string. I can then easily extract the body of Sharen’s original email. However, things become much trickier when more people are involved. For example, take a look at this chain:

Message-ID: <[email protected]>
Date: Mon, 30 Apr 2001 02:33:00 -0700 (PDT)
From: [email protected]
To: [email protected]
Subject: Re: New Broker: Automated Power Exchange
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
X-From: Kate Symes
X-To: Rhonda L Denton
X-cc: 
X-bcc: 
X-Folder: \kate symes 6-27-02\Notes Folders\Sent
X-Origin: SYMES-K
X-FileName: kate symes 6-27-02.nsf

Did Chris Foster ever get back to you on this? I had asked him to let you 
know when he found out, but I wanted to make sure the information actually 
got passed along. Let me know when you get a moment.

Thanks,
Kate




Rhonda L Denton
04/20/2001 06:16 AM
To: Kate Symes/PDX/[email protected], Melissa Ann Murphy/HOU/[email protected]
cc:  

Subject: Re: New Broker: Automated Power Exchange

FYI.  Please let us know if we will be receiving broker paper in Houston or 
will it come to Portland?
---------------------- Forwarded by Rhonda L Denton/HOU/ECT on 04/20/2001 
08:15 AM ---------------------------


Jason Moore
04/20/2001 08:10 AM
To: Rhonda L Denton/HOU/[email protected]
cc:  
Subject: Re: New Broker: Automated Power Exchange  

The counterparty has been linked to the Power Broker - Houston Group.

Jason Moore





Rhonda L Denton
04/19/2001 04:16 PM
To: Jason Moore/HOU/[email protected]
cc:  
Subject: New Broker: Automated Power Exchange

Please link this CP to Enpower with group link of Power Broker-Houston.  
Please let me know when completed.  Thanks.
---------------------- Forwarded by Rhonda L Denton/HOU/ECT on 04/19/2001 
04:15 PM ---------------------------
   Kate Symes                04/19/2001 03:25 PM

To: Melissa Ann Murphy/HOU/[email protected], Rhonda L Denton/HOU/[email protected]
cc:  
Subject: New Broker: Automated Power Exchange

P.S. I mentioned this to you, Melissa. But just a reminder that we need the 
default broker fee on this to be $.03.

Thank you!
---------------------- Forwarded by Kate Symes/PDX/ECT on 04/19/2001 01:32 PM 
---------------------------
   Kate Symes                04/19/2001 01:20 PM

To: Melissa Ann Murphy/HOU/[email protected], Rhonda L Denton/HOU/[email protected]
cc:  

Subject: New Broker: Automated Power Exchange

Chris Foster has requested that we set up a new broker - the Automated Power 
Exchange, in EnPower. It is the same counterparty we already have in EnPower, 
with the same name and contact information. Melissa, I believe you said we 
could pretty easily add the name in the broker list if it was the same as in 
the counterparty list, which Chris has verified it is. Please let me know if 
you need anymore information.

Thanks,
Kate

The people of interest here are Kate Symes and Rhonda L Denton, but I am not sure how to filter out all the forwarded material and pull only their respective emails. If anyone could propose some logic for this task, I would appreciate it.
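
One possible line of attack, offered as a rough sketch rather than a complete solution: in the chain above, every quoted message starts with a sender name and a timestamp of the form `MM/DD/YYYY HH:MM AM/PM`, either on one line or on two consecutive lines. Splitting the body at those markers yields one segment per author, and the "Forwarded by" separator lines can be dropped. All names below (`split_chain`, the regex constants, the abbreviated `BODY` sample) are hypothetical, and the patterns only cover the layouts shown in this post.

```python
import re

TS = r"\d{2}/\d{2}/\d{4} \d{2}:\d{2} [AP]M"
# "   Kate Symes      04/19/2001 03:25 PM", optionally prefixed with "From:"
SAME_LINE = re.compile(
    r"^\s*(?:From:\s*)?(?P<name>[A-Za-z][A-Za-z .'-]*?)\s{2,}(?P<ts>" + TS + r")\s*$"
)
# A timestamp alone on a line; the sender name is on the line before it.
TS_ONLY = re.compile(r"^\s*(?P<ts>" + TS + r")\s*$")
FIELD = re.compile(r"^\s*(To|cc|bcc|Subject):", re.IGNORECASE)
SEPARATOR = re.compile(r"-{5,}")  # both halves of a "Forwarded by" banner


def split_chain(body, top_author):
    """Split a body into segments, newest first, one per quoted message."""
    segments = [{"author": top_author, "sent": None, "lines": []}]
    lines = body.splitlines()
    for i, line in enumerate(lines):
        kept = segments[-1]["lines"]
        prev = kept[-1] if kept else ""
        same = SAME_LINE.match(line)
        ts_only = TS_ONLY.match(line)
        if same:
            segments.append(
                {"author": same["name"].strip(), "sent": same["ts"], "lines": []}
            )
        elif ts_only and prev.strip() and prev == lines[i - 1]:
            name = kept.pop().strip()  # the previous line was the sender name
            segments.append({"author": name, "sent": ts_only["ts"], "lines": []})
        elif FIELD.match(line) or SEPARATOR.search(line):
            continue  # drop quoted header fields and forward banners
        else:
            kept.append(line)
    for seg in segments:
        seg["text"] = "\n".join(seg.pop("lines")).strip()
    return segments


# Abbreviated version of the chain above; addresses are omitted.
BODY = """\
Did Chris Foster ever get back to you on this?

Thanks,
Kate


Rhonda L Denton
04/20/2001 06:16 AM
To: Kate Symes, Melissa Ann Murphy
cc:

Subject: Re: New Broker: Automated Power Exchange

FYI.  Please let us know if we will be receiving broker paper in Houston.
---------------------- Forwarded by Rhonda L Denton/HOU/ECT on 04/20/2001
08:15 AM ---------------------------


Jason Moore
04/20/2001 08:10 AM
To: Rhonda L Denton
cc:
Subject: Re: New Broker: Automated Power Exchange

The counterparty has been linked to the Power Broker - Houston Group.

Jason Moore
---------------------- Forwarded by Rhonda L Denton/HOU/ECT on 04/19/2001
04:15 PM ---------------------------
   Kate Symes                04/19/2001 03:25 PM

To: Melissa Ann Murphy, Rhonda L Denton
Subject: New Broker: Automated Power Exchange

P.S. Just a reminder that we need the default broker fee to be $.03.
"""

segments = split_chain(BODY, top_author="Kate Symes")  # author from X-From
for seg in segments:
    if seg["author"] in {"Kate Symes", "Rhonda L Denton"}:
        print(seg["author"], seg["sent"], repr(seg["text"][:40]))
```

With this layout, the first segment is the reply itself and the second segment is the message it most directly answers, so checking whether the second segment's author matches X-To gives the reply pair. Header fields that wrap onto a continuation line, and other client formats in the dataset, would need additional patterns.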