1. Overview
1.1. Enron company
In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. In the resulting federal investigation, a significant amount of typically confidential information entered the public record, including tens of thousands of emails and detailed financial data for top executives.
1.2. Objective
I will play detective and build a classification algorithm to predict a person of interest (POI) identifier based on the email and financial features in the combined dataset. A POI is anyone who was indicted, settled without admitting guilt, or testified in exchange for immunity. I will check the predicted POIs against the actual POIs in the dataset to evaluate the predictions.
The project is divided into several parts:
* Dataset exploration
* Feature selection and optimization
* Algorithm selection and tuning
* Validation and evaluation of the results
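Checking predicted POIs against actual POIs can be sketched in plain Python. This is a minimal illustration, not the project's evaluation code: it assumes precision and recall as the metrics (this section does not name any), and the function name `precision_recall` and the label lists are hypothetical, made up for the example.

```python
def precision_recall(actual, predicted):
    """Compare predicted POI labels with actual ones (1 = POI, 0 = non-POI)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a == 1 and p == 1)
    false_pos = sum(1 for a, p in pairs if a == 0 and p == 1)
    false_neg = sum(1 for a, p in pairs if a == 1 and p == 0)
    # Guard against division by zero when nothing is predicted or nothing is a POI.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    return precision, recall


# Hypothetical labels for six people, invented for illustration only.
actual_poi = [1, 0, 0, 1, 0, 1]
predicted_poi = [1, 0, 1, 0, 0, 1]

precision, recall = precision_recall(actual_poi, predicted_poi)
print(f"precision={precision:.2f}, recall={recall:.2f}")
# prints: precision=0.67, recall=0.67
```

Precision answers "of the people flagged as POIs, how many really are?", while recall answers "of the real POIs, how many were flagged?". Both are likely more informative than plain accuracy when POIs are a small minority of the dataset.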
1.3. Dependencies
Python packages used for this project (pickle is part of the standard library; the others are installed separately):
pandas
sklearn
matplotlib
numpy
pickle
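A quick way to confirm that these packages are importable before running the notebook is sketched below. It uses only the standard library; the helper name `missing_packages` is hypothetical, and the names checked are the import names from the list above (`sklearn` is the import name of scikit-learn).

```python
import importlib.util

# Import names taken from the dependency list above.
REQUIRED = ["pandas", "sklearn", "matplotlib", "numpy", "pickle"]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are available.")
```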
1.4. Resources
Website used to implement this project: