A Pandas DataFrame is a two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). In other words, the data is aligned in a tabular fashion in rows and columns. A DataFrame consists of three principal components: the data, the rows, and the columns.
We will take a brief look at the basic operations that can be performed on a Pandas DataFrame:
Creating a DataFrame
Dealing with Rows and Columns
Indexing and Selecting Data
Working with Missing Data
Iterating over rows and columns
In practice, a Pandas DataFrame is usually created by loading a dataset from existing storage, such as a SQL database, a CSV file, or an Excel file. A DataFrame can also be created from lists, a dictionary, a list of dictionaries, and so on. Here are some of the ways to create a DataFrame:
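As a minimal sketch of loading from storage, the snippet below reads CSV data with pd.read_csv. To keep it self-contained, the CSV content is a made-up in-memory string (csv_text is a hypothetical stand-in for a real file); with a file on disk you would pass its path instead.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = "Name,Age\nTom,20\nNick,21\n"

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```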
Creating a DataFrame using a list: A DataFrame can be created from a single list or a list of lists.
# import pandas as pd
import pandas as pd
# list of strings
lst = ['Geeks', 'For', 'Geeks', 'is',
       'portal', 'for', 'Geeks']
# Calling DataFrame constructor on list
df = pd.DataFrame(lst)
print(df)
Output: a DataFrame with a single column labeled 0 and the default index 0 to 6.
Creating a DataFrame from a dict of ndarrays/lists: To create a DataFrame from a dict of ndarrays or lists, all the arrays must be of the same length. If an index is passed, its length must equal the length of the arrays. If no index is passed, the index defaults to range(n), where n is the array length.
# Python code demonstrating how to create a
# DataFrame from a dict of ndarrays / lists
# with the default index.
import pandas as pd
# initialise data of lists.
data = {'Name':['Tom', 'nick', 'krish', 'jack'],
        'Age':[20, 21, 19, 18]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
Output: a DataFrame with the columns Name and Age and the default index 0 to 3.
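The index rule described above can be sketched as follows. The labels rank1 to rank4 are illustrative names chosen for this example, not part of any dataset; the second call shows that an index of the wrong length is rejected.

```python
import pandas as pd

data = {"Name": ["Tom", "nick", "krish", "jack"],
        "Age": [20, 21, 19, 18]}

# The index must have the same length as the column lists (4 here).
df = pd.DataFrame(data, index=["rank1", "rank2", "rank3", "rank4"])
print(df)

# An index with a different length raises ValueError.
try:
    pd.DataFrame(data, index=["rank1", "rank2"])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```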
We can perform basic operations on the rows and columns of a DataFrame, such as selecting, deleting, adding, and renaming.
Column Selection: To select a column in a Pandas DataFrame, we access it by its column name.
# Import pandas package
import pandas as pd
# Define a dictionary containing employee data
data = {'Name':['Jai', 'Princi', 'Gaurav', 'Anuj'],
        'Age':[27, 24, 22, 32],
        'Address':['Delhi', 'Kanpur', 'Allahabad', 'Kannauj'],
        'Qualification':['Msc', 'MA', 'MCA', 'Phd']}
# Convert the dictionary into DataFrame
df = pd.DataFrame(data)
# select two columns
print(df[['Name', 'Qualification']])
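Adding, renaming, and deleting columns can be sketched in the same way. This is one possible approach using assignment, rename(), and drop(); the new column name City is an arbitrary choice for illustration.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Jai", "Princi", "Gaurav", "Anuj"],
                   "Age": [27, 24, 22, 32]})

# Add a column by assigning a list with one value per row.
df["Address"] = ["Delhi", "Kanpur", "Allahabad", "Kannauj"]

# Rename a column; rename() returns a new DataFrame by default.
df = df.rename(columns={"Address": "City"})

# Delete a column; drop() also returns a new DataFrame by default.
df = df.drop(columns=["Age"])

print(df)
```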
Row Selection: Pandas provides the DataFrame.loc[] indexer to retrieve rows from a DataFrame by label. Rows can also be selected by passing an integer position to the iloc[] indexer.
Note: We'll be using the nba.csv file in the examples below.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
Output: two Series, one for each row, because a single row label was passed both times.
Indexing in pandas simply means selecting particular rows and columns of data from a DataFrame. Indexing could mean selecting all the rows and some of the columns, some of the rows and all of the columns, or some of each. Indexing is also known as subset selection.
Indexing a DataFrame using the indexing operator [] :
The term "indexing operator" refers to the square brackets following an object. The .loc and .iloc indexers also use square brackets to make selections. Here, the indexing operator refers to df[].
To select a single column, we simply put the name of the column between the brackets.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving columns by indexing operator
first = data["Age"]
print(first)
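The difference between passing one column name and a list of names can be checked on a small in-memory frame. The frame below is a hypothetical stand-in for nba.csv: the player names come from the examples in this article, while "Player Three" and the Team and Age values are placeholders invented for this sketch.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; values are placeholders.
data = pd.DataFrame(
    {"Team": ["Team A", "Team A", "Team B"], "Age": [25, 22, 30]},
    index=pd.Index(["Avery Bradley", "R.J. Hunter", "Player Three"], name="Name"),
)

ages = data["Age"]              # a single name returns a Series
subset = data[["Team", "Age"]]  # a list of names returns a DataFrame

print(type(ages).__name__)
print(type(subset).__name__)
```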
Indexing a DataFrame using .loc[ ] :
The .loc indexer selects data by the labels of the rows and columns. It selects data in a different way from the plain indexing operator: it can select subsets of rows or columns, and it can also select subsets of rows and columns simultaneously.
Selecting a single row
To select a single row using .loc[], we put a single row label inside the brackets.
# importing pandas package
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving row by loc method
first = data.loc["Avery Bradley"]
second = data.loc["R.J. Hunter"]
print(first, "\n\n\n", second)
Output: two Series, one for each row, because a single row label was passed both times.
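Selecting several rows, or rows and columns at the same time, with .loc can be sketched on a small in-memory frame. The frame is a hypothetical stand-in for nba.csv: "Player Three" and all Team and Age values are placeholders made up for this example.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; values are placeholders.
data = pd.DataFrame(
    {"Team": ["Team A", "Team A", "Team B"], "Age": [25, 22, 30]},
    index=pd.Index(["Avery Bradley", "R.J. Hunter", "Player Three"], name="Name"),
)

# A list of labels returns a DataFrame with those rows.
two_rows = data.loc[["Avery Bradley", "R.J. Hunter"]]

# A row label plus a column label returns a single value.
age = data.loc["Avery Bradley", "Age"]

# Lists for both axes select a sub-table.
sub = data.loc[["R.J. Hunter", "Player Three"], ["Age"]]

# Label slices include both endpoints.
sliced = data.loc["Avery Bradley":"R.J. Hunter"]

print(two_rows)
print(age)
print(sub)
print(sliced)
```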
Indexing a DataFrame using .iloc[ ] :
The .iloc indexer allows us to retrieve rows and columns by position. To do that, we specify the positions of the rows that we want and, optionally, the positions of the columns as well. The df.iloc indexer is very similar to df.loc but uses only integer locations to make its selections.
Selecting a single row
To select a single row using .iloc[], we pass a single integer to .iloc[].
import pandas as pd
# making data frame from csv file
data = pd.read_csv("nba.csv", index_col="Name")
# retrieving the row at position 3 (the fourth row) by iloc method
row2 = data.iloc[3]
print(row2)
Output: a Series containing the values of the row at position 3 (the fourth row).
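Positional selection of rows and columns together can be sketched the same way. The frame below is again a hypothetical stand-in for nba.csv with placeholder names and values; only the positions matter here.

```python
import pandas as pd

# Hypothetical stand-in for nba.csv; values are placeholders.
data = pd.DataFrame(
    {"Team": ["Team A", "Team A", "Team B", "Team B"],
     "Age": [25, 22, 30, 28]},
    index=pd.Index(
        ["Avery Bradley", "R.J. Hunter", "Player Three", "Player Four"],
        name="Name",
    ),
)

# A single integer returns the row at that position as a Series.
fourth = data.iloc[3]

# Integer slices exclude the end position, like Python lists.
block = data.iloc[0:2, 0:2]

# Lists of positions pick specific rows and columns.
picks = data.iloc[[0, 2], [1]]

print(fourth)
print(block)
print(picks)
```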
Missing data occurs when no information is provided for one or more items or for a whole unit. Missing data is a very big problem in real-life scenarios. In pandas, missing values are also referred to as NA (Not Available) values.
Checking for missing values using isnull() and notnull() :
To check for missing values in a Pandas DataFrame, we use the isnull() and notnull() functions. Both help in checking whether a value is NaN or not. These functions can also be used on a Pandas Series to find null values.
# importing pandas as pd
import pandas as pd
# importing numpy as np
import numpy as np
# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score':[100, 90, np.nan, 95],
          'Second Score': [30, 45, 56, np.nan],
          'Third Score':[np.nan, 40, 80, 98]}
# creating a dataframe from the dictionary
df = pd.DataFrame(scores)
# using isnull() function
print(df.isnull())
Output: a DataFrame of booleans with the same shape as df, containing True wherever a value is NaN.
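The complementary notnull() check, and the use of both functions on a single Series, can be sketched as follows. Counting missing values with sum() and filtering rows with a boolean mask are common follow-up steps, shown here as one possible usage.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"First Score": [100, 90, np.nan, 95],
                   "Second Score": [30, 45, 56, np.nan],
                   "Third Score": [np.nan, 40, 80, 98]})

# notnull() is the inverse of isnull(): True where a value is present.
print(df.notnull())

# True counts as 1, so sum() gives the number of missing values per column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# On a Series, notnull() yields a boolean mask that can filter rows.
rows_with_first = df[df["First Score"].notnull()]
print(rows_with_first)
```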
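Converting lists to a pandas table:
# job_title, job_location, company_name, and exp_Reqd are lists of equal length defined elsewhere
df = pd.DataFrame({'Job_title': job_title, 'Job_location': job_location, 'Company_name': company_name, 'Experience': exp_Reqd})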
import pandas as pd
# Create a list of elements
data_list = [
["Alice", 25],
["Bob", 30],
["Charlie", 35],
["David", 40]
]
# Convert the list into a Pandas DataFrame
df = pd.DataFrame(data_list, columns=["Name", "Age"])
# Display the DataFrame
print(df)
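The overview at the top also lists iterating over rows and columns. A minimal sketch, assuming a small frame like the one above, uses iterrows() for rows and items() for columns; the lists rows and columns are helper names introduced only for this example.

```python
import pandas as pd

df = pd.DataFrame([["Alice", 25], ["Bob", 30]], columns=["Name", "Age"])

# iterrows() yields (index label, row as a Series) pairs.
rows = []
for idx, row in df.iterrows():
    rows.append((idx, row["Name"], row["Age"]))
    print(idx, row["Name"], row["Age"])

# items() yields (column name, column as a Series) pairs.
columns = []
for col_name, col in df.items():
    columns.append((col_name, col.tolist()))
    print(col_name, col.tolist())
```

Row-by-row iteration is convenient for inspection, but column-wise operations are usually faster on large frames.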