Reading CSVs With Pandas – Real Python
This lesson covers a couple of different ways to import CSV data into the third-party pandas library. In this video, you’ll learn how to install pandas using pip and how to use it to read CSV files.
Here’s the example CSV file you’ll be using (hrdata.csv):
The following example shows how to read a CSV file and print out its contents using pandas:
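Here is a minimal sketch of reading and printing a CSV, assuming the four columns named in the transcript (Name, Hire Date, Salary, Sick Days Remaining). The rows are made-up placeholders, not the real contents of hrdata.csv. They’re read from an io.StringIO buffer, since read_csv() accepts file-like objects as well as filenames. With the real file on disk, you’d pass 'hrdata.csv' instead.

```python
import io

import pandas as pd

# Made-up placeholder rows using the column names from the transcript;
# these are NOT the actual contents of hrdata.csv.
csv_data = io.StringIO(
    "Name,Hire Date,Salary,Sick Days Remaining\n"
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
    "Cy Placeholder,05/12/2012,45000,10\n"
)

# With the real file you would write: df = pd.read_csv("hrdata.csv")
df = pd.read_csv(csv_data)
print(df)
```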
In addition to reading CSV files and printing their contents, you’ll see how to use pandas to change the index of the data you read, parse dates, and add headers to CSV files that don’t have them.
00:00 One way to work with CSVs in Python is to use the data analysis library pandas, whose name comes from the term “panel data.” It’s sometimes called the Excel of Python, since it stores data in DataFrames, which you can think of as Excel spreadsheets.
00:13 Before you can do anything with pandas, you need to install it, so go to your terminal and run pip install pandas,
00:21 and if you don’t already have it, that should download and install it for you.
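A quick way to confirm the installation worked (a small extra check, not part of the video) is to import pandas and print its version string. The version you see depends on your environment.

```python
import pandas as pd

# If this import succeeds, pandas is installed; __version__ shows which release.
print(pd.__version__)
```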
00:26 So now in your editor, you can import pandas as pd. Let’s take a look at the data we’re going to be working with. Over here, I have a file called hrdata.csv, which has the names, hire dates, salaries, and sick days remaining for a number of employees. To load this into pandas, go back to your script and create a DataFrame called df by setting it equal to pd.read_csv() with the filename 'hrdata.csv' passed in. Then you can print it out by calling print() on the DataFrame.
01:05 Try running that, and there you go! You can see that the data imported, but it looks like there was an issue with one row. Let’s go back to the CSV: it looks like I put a period instead of a comma there. So we’ll fix that and save it. And just to be safe, let’s rerun it.
01:23 There we go. So, pandas used the first row as the column headers, and it converted all of the numerical data into numbers.
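To see what pandas inferred, you can inspect df.dtypes. Here is a minimal sketch on made-up placeholder rows (not the real hrdata.csv) read from an io.StringIO buffer. Salary and Sick Days Remaining should come back as integer columns, while Hire Date stays as text. The exact dtype label shown for the text column (object, or a dedicated string dtype) varies by pandas version.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for hrdata.csv.
csv_data = io.StringIO(
    "Name,Hire Date,Salary,Sick Days Remaining\n"
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
)

df = pd.read_csv(csv_data)

# Show the dtype pandas inferred for each column.
print(df.dtypes)
```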
01:32 There are two things we could improve here. First, the index is just the default integer index starting at 0, which is pretty arbitrary. Maybe we’d rather use the Name column as the index for this data.
01:44 So, this is pretty straightforward to fix. Just go back here when you read the CSV and add in a parameter called index_col and just set this equal to 'Name', just like that. Alrighty. Let’s rerun that.
02:00 Cool. Now you can see that Name has moved over to become the index on the left, and the data columns start with Hire Date, Salary, and Sick Days Remaining.
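Here is a sketch of the index_col change on made-up placeholder rows. Once Name is the index, it’s no longer one of the data columns, and you can look rows up by name with .loc.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for hrdata.csv.
csv_data = io.StringIO(
    "Name,Hire Date,Salary,Sick Days Remaining\n"
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
)

# Use the Name column as the row index instead of the default 0, 1, 2, ...
df = pd.read_csv(csv_data, index_col="Name")
print(df)

# Rows can now be looked up by employee name.
print(df.loc["Ben Sample", "Salary"])
```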
02:08 The other problem is that even though the Salary and Sick Days Remaining were converted to numerical data, the Hire Date is still stored as a string here.
02:17 You can see this if you call print(type()) on one of its values: we’ll pull out the 'Hire Date' column
02:26 and index the first item from it.
02:30 And I typed 'Data' instead of 'Date', so I’ll fix that and save. There we go. You can see that the date is stored as a string.
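Here is a sketch of that type check on made-up rows. It uses .iloc[0] to take the first value by position. With Name as the index, a plain df['Hire Date'][0] only works because pandas falls back to positional lookup when an integer key isn’t in a non-integer index, and recent pandas releases deprecate that fallback in favor of .iloc.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for hrdata.csv.
csv_data = io.StringIO(
    "Name,Hire Date,Salary,Sick Days Remaining\n"
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
)
df = pd.read_csv(csv_data, index_col="Name")

# Without parse_dates, the first hire date is plain text.
first_hire = df["Hire Date"].iloc[0]
print(type(first_hire))  # <class 'str'>
```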
02:40 Fortunately, pandas has us covered here as well. We just have to pass in another parameter, parse_dates. Because you may have several date columns, it takes a list, so we’ll pass in ['Hire Date']. Okay, let’s try to rerun that.
03:00 And you can see the formatting change: the dates now use dashes instead of slashes and read year-month-day, so that’s good. And the type here is now a pandas Timestamp.
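Here is the same sketch with parse_dates added, still on made-up rows written month/day/year. The first hire date now comes back as a pandas Timestamp rather than a string.

```python
import io

import pandas as pd

# Made-up placeholder rows standing in for hrdata.csv.
csv_data = io.StringIO(
    "Name,Hire Date,Salary,Sick Days Remaining\n"
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
)

# parse_dates takes a list because several columns may hold dates.
df = pd.read_csv(csv_data, index_col="Name", parse_dates=["Hire Date"])

first_hire = df["Hire Date"].iloc[0]
print(type(first_hire))
print(first_hire.date())  # 2014-03-15
```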
03:10 Now, let’s say the CSV did not have that header row. You can add headers when you read in the CSV; you just have to make a couple of changes here. I’ll actually put these arguments on separate lines
03:22 to make this a little easier to read. Because this file already has a header row, you can pass in header=0 so pandas knows to replace it with our own names. Then pass in names with a list of the new headers: ['Employee', 'Hired', 'Salary', 'Sick Days'].
03:53 Because these will become the new headers, we need to update the other arguments to match. So index_col changes from 'Name' to 'Employee', and parse_dates changes from 'Hire Date' to 'Hired'.
04:03 So, save that. Oh, we’d need to change the type() check too, since it still refers to 'Hire Date'. Actually, let’s just get rid of that line. Save that and rerun it. Now you can see the new header information. And that’s it!
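Here is a sketch of replacing the header row, on made-up rows. The second read_csv() call covers the case the transcript opens with: a file that has no header row at all. When names is passed and header isn’t, pandas treats every line as data, as if header=None had been given. So header=0 is only needed when there’s an existing header to replace.

```python
import io

import pandas as pd

new_names = ["Employee", "Hired", "Salary", "Sick Days"]

# Made-up file that HAS a header row we want to replace.
with_header = io.StringIO(
    "Name,Hire Date,Salary,Sick Days Remaining\n"
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
)

# header=0 marks row 0 as the header, so `names` replaces it instead of
# it being read as data. index_col and parse_dates use the NEW names.
df = pd.read_csv(
    with_header,
    index_col="Employee",
    parse_dates=["Hired"],
    header=0,
    names=new_names,
)
print(df)

# Made-up file with NO header row: passing names alone is enough.
no_header = io.StringIO(
    "Ada Example,03/15/2014,50000,10\n"
    "Ben Sample,06/01/2015,65000,8\n"
)
df_no_header = pd.read_csv(
    no_header,
    index_col="Employee",
    parse_dates=["Hired"],
    names=new_names,
)
print(df_no_header)
```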
04:17 Those are a couple of different ways to import CSV data into pandas. In the next video, we’ll talk about how to write CSVs using pandas.