Pandas for SQL Lovers – SELECT col1,col2 FROM Table

In the last post we learned a variety of ways to stwo pandas and a heart with the word SQLelect all rows and all columns from a Pandas DataFrame. In this post we will learn how to return a specified list or range of columns from a DataFrame.

Also in this series:

Once again we will rely heavily on

  • loc return specified rows and columns from a DataFrame by specifying column names
  • iloc return specified rows and columns from a DataFrame by specifying column positions

Let’s use a DataFrame called airports with the following values:

index airport_code city state capacity
0 SEA Seattle WA 45000000
1 BOS Boston MA 36000000
2 HOU Houston TX 55000000
3 YOW Ottawa ON 25000000
4 YYZ Toronto ON 65000000

Let’s start by selecting a single column

SELECT city FROM airports

All the commands below will return the same result

airports['city']
airports.city

loc, and iloc accept parameters to specify which rows and columns to return. A value of : indicates al rows or all columns. You can specify : to return all rows, and the column name or position to specify the column you want to return.

airports.loc[:,'city']

If you use iloc, you specify the index position of the column instead of the column name. The index column is not counted as a column and the first column is column 0.

index airport_code city state capacity
Column index 0 1 2 3
0 SEA Seattle WA 45000000
1 BOS Boston MA 36000000
2 HOU Houston TX 55000000
3 YOW Ottawa ON 25000000
4 YYZ Toronto ON 65000000

If we want to display the city we need to request column 1.

airports.iloc[:,1]

iloc also allows you to use negative numbers to count from the end

index airport_code city state capacity
Reverse Column index -4 -3 -2 -1
0 SEA Seattle WA 45000000
1 BOS Boston MA 36000000
2 HOU Houston TX 55000000
3 YOW Ottawa ON 25000000
4 YYZ Toronto ON 65000000

We can return the city column by requesting column -3

airports.iloc[:,-3]

Any of the commands above return the same output

SELECTcityFROMairports

What if you want to select multiple columns?

SELECT airport_code, capacity FROM airports

You can specify the list of columns to return

airports[['airport_code','capacity']]
airports.loc[:,['airport_code','capacity']]
airports.iloc[:,[0,3]]
airports.iloc[:,[-4,-1]]

Any of the commands above return the following output:

SELECTairportcodecapacityFROMairports

SELECT RangeOfColumns FROM airports

iloc provides the ability to return a range of columns

You can specify a range using the syntax lowerBound: upperBound.

airports.iloc[:,0:2]

returns

SELECTfirsttwocolumnsFROMAirports

Wait a second, let’s take another look at our original DataFrame. We said column 0 was the airport code and column 2 was the state. Shouldn’t a range of 0:2 return airport_code, city, and state?

index airport_code city state capacity
Column index 0 1 2 3
0 SEA Seattle WA 45000000
1 BOS Boston MA 36000000
2 HOU Houston TX 55000000
3 YOW Ottawa ON 25000000
4 YYZ Toronto ON 65000000

When you specify a range the lower bound value is inclusive, the upper bound value is exclusive so it returns rows where:

lowerBound<= Column Index position < upperBound

Which means a range of 0 : 2 will return columns with an index of 0 and 1. For our DataFrame that returns airport_code, and city.

If you specify

airports.iloc[:,1:3]

You get back columns 1 & 2: city and state

SELECTColumsn1To3

If you want to return all the columns you would request columns 0 through 4 (even though there is no column 4 in our DataFrame)

airports.iloc[:,0:4]

SELECTColumns1to8

As a matter of fact you could specify a range of 0 to 8 and it would not give you an error, you would see the same output, even though our DataFrame only contains four columns

airports.iloc[:,0:8]

If this really annoys your inner coder (I want my code to be robust! I want to know how many columns are in the DataFrame, I should never specify a column index that does not exist, that should blow up my code!) You can look up the number of columns in your DataFrame and use that as a parameter for the Upper Bound

len(airports.columns)

Returns the number of columns in the DataFrame

airports.shape

Returns the number of rows and columns

Since we only want the number of columns, we would request the second parameter, since parameter indexing starts at 0. Parameter 0 is the number of rows, Parameter 1 is the number of columns.

airports.shape(1)

Returns the number of columns in the airports DataFrame

We can use these expressions to dynamically return the upper boundary.

airports.iloc[:,0:airports.shape[1]]
airports.iloc[:,0:len(airports.columns)]

Either of these two commands will return all the columns in our DataFrame

SELECTColumns1to8

You can also use negative numbers to specify a range. I have added the corresponding negative column numbers starting from the end of the DataFrame to try and help:

index airport_code city state capacity
Reverse Column index -4 -3 -2 -1
0 SEA Seattle WA 45000000
1 BOS Boston MA 36000000
2 HOU Houston TX 55000000
3 YOW Ottawa ON 25000000
4 YYZ Toronto ON 65000000
airports.iloc[:,-3:-1]

follows the same upperBound and lowerBound rules for exclusivity

LowerBound <= column index < UpperBound

-3 <= columnIndex < -1

So we return columns -3 and -2: city and state.

SELECTNegativeRange

 

Selecting multiple column ranges

Are you lost yet? If so, don’t panic, I gave you a lot of options, find the one that works for you! For those who are still keeping up here’s another cool trick you can do with iloc.

Try this!

airports.iloc[:,[i for i in range (1,3)]]

It might help if you look at what is returned for

for i in range (1,3):
print(i)

This returns

1

2

Therefore

airports.iloc[:,[i for i in range (1,3)]]

will return columns 1 and 2: city and state

SELECTNegativeRange

Why is this useful? Why didn’t I just ask for columns 1 and 2?

Imagine our airports DataFrame had 50 columns and you want columns 45-49 you could do this:

airports.iloc[:,[i for i in range (45,50)]]

But wait, we just saw that we can specify a range of columns so isn’t this simpler?

airports.iloc[:,45:50]

Yes, it is, and that will absolutely work, but what if you want column 0, column 1 AND columns 45-59! Using range you can do this:

airports.iloc[:,[0,1]+[i for i in range (45,49)]]

OR if you want columns 10 through 20 and columns 45-49

airports.iloc[:,[i for i in range(10,21)]+[i for i in range (45,50)]]

Using the range allows you to select two or more ranges of columns from a DataFrame.

airports.iloc[:,[i for i in range(10,21)]+[i for i in range (31,39)]+[i for i in range (45,50)]]

Would return columns 10 through 20, columns 31- 38 and columns 45-49!

You may not need this trick today, but now you know where to find it when you need it!

Advertisement

8 responses to this post.

  1. […] Select multiple columns (SELECT col1,col2 FROM table) […]

    Reply

  2. […] Select multiple columns (SELECT col1,col2 FROM table) […]

    Reply

  3. […] Select multiple columns (SELECT col1,col2 FROM table) […]

    Reply

  4. […] Select multiple columns (SELECT col1,col2 FROM table) […]

    Reply

  5. […] Select multiple columns (SELECT col1,col2 FROM table) […]

    Reply

  6. […] Select multiple columns (SELECT col1,col2 FROM table) […]

    Reply

  7. Posted by Aamir Ali on January 14, 2020 at 10:52 PM

    Hi thanks for getting back to me, I saw your lecturs on Alison in python, Please tell me much about python,where I learn python advance in real data,

    Regards, Aamir Ali 03003281853 WhatsApp: +923003281853

    Reply

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: