Earlier in the series, I covered how to load data from a csv file into a pandas DataFrame. In this post I’ll focus on the building block for all other DataFrame queries: SELECT * FROM DataFrame
Also in this series:
- Creating a DataFrame (CREATE TABLE)
- Populating a DataFrame (INSERT)
- How to load a CSV file into a Pandas DataFrame (BULK INSERT)
- Handling Nulls read from CSV
- Join statements
- Select all rows and columns (SELECT * FROM table)
- Select multiple columns (SELECT col1,col2 FROM table)
So you have your data loaded into a Pandas DataFrame and now you want to run a query against the DataFrame.
There are different methods we can use depending on what we want to do. Let’s start with loc and iloc
- loc allows you to specify which columns you want by providing the column name
- iloc allows you to specify which columns you want be providing the column position
Imagine you have a DataFrame called airports with the following values:
index | airport_code | city | state | capacity |
---|---|---|---|---|
0 | SEA | Seattle | WA | 45000000 |
1 | BOS | Boston | MA | 36000000 |
2 | HOU | Houston | TX | 55000000 |
3 | YOW | Ottawa | ON | 25000000 |
4 | YYZ | Toronto | ON | 65000000 |
The first query you need is the same one you learn when you first learn SQL
SELECT * FROM airports
All the commands below will return the same result
Just want all the rows and columns? If you are using a Jupyter notebook just type the DataFrame name
airports
To display the values as output from any Python code
print(airports)
loc, and iloc accept parameters to specify which rows and columns to return. A value of : indicates all rows, or all columns.
airports.loc[:,:]
airports.iloc[:,:]
If you do not specify a value for the columns parameter, loc and iloc return all columns.
airports.loc[:]
airports.iloc[:]
If you know how many rows are in the DataFrame, you could specify you want rows 0 through to the last row in the DataFrame.
airports.loc[0:4,:]
airports.loc[0:4]
airports.iloc[0:4,:]
airports.iloc[0:4]
If you don’t know the number of rows in the DataFrame or the number of rows might change, use len to return the number of rows and make your code more robust.
airports.loc[0:len(airports),:]
airports.loc[0:len(airports)]
airports.iloc[0:len(airports),:]
airports.iloc[0:len(airports)
Every command above returns the exact same output:
As is so often the case when writing code, there are lots of different ways to accomplish the same task. Getting familiar with these different techniques will give you a good base to build on as me to more advanced queries.
Looking for similar topics and technical posts? See the full list
Posted by Pandas for SQL Lovers – SELECT col1,col2 FROM Table | HockeyGeekGirl on January 13, 2020 at 4:46 PM
[…] Select all rows and columns (SELECT * FROM table) […]
Posted by Pandas for SQL lovers – JOIN statements | HockeyGeekGirl on January 13, 2020 at 4:52 PM
[…] Select all rows and columns (SELECT * FROM table) […]
Posted by Pandas for SQL Lovers: Handling Nulls read from CSV | HockeyGeekGirl on January 13, 2020 at 4:53 PM
[…] Select all rows and columns (SELECT * FROM table) […]
Posted by Pandas for SQL lovers Reading a CSV file / BULK INSERT | HockeyGeekGirl on January 13, 2020 at 4:54 PM
[…] Select all rows and columns (SELECT * FROM table) […]
Posted by Pandas for SQL Lovers INSERT / Populating a DataFrame | HockeyGeekGirl on January 13, 2020 at 4:54 PM
[…] Select all rows and columns (SELECT * FROM table) […]
Posted by Python Pandas for SQL fans: Creating DataFrames | HockeyGeekGirl on January 13, 2020 at 4:56 PM
[…] Select all rows and columns (SELECT * FROM table) […]