Null correction is a major challenge when developing any model in data science. Null values not only alter your model's predictions and affect their accuracy, but also cost a lot of time during exploratory data analysis (EDA).
pybaseanal eases this task with just one line of code. Follow the steps below to use pybaseanal for basic null analysis.
1. Install the pybaseanal library
pip install pybaseanal
2. Import the library
from pybaseanal import *
Output >>
Imported pandas as pd, numpy as np, seaborn as sns, matplotlib.pyplot as plt
If you import pybaseanal, note that you do not need to import pandas, numpy, seaborn, and matplotlib separately.
3. Define the dataframe
df = pd.read_csv("your_df_url")
4. Perform the Null analysis
null_Analysis(df, null_Cuttoff)
Null analysis takes two parameters. The first is the dataframe and the second is null_Cuttoff, the percentage of null values used as the threshold for separating the columns into two lists: those above it and those below it. For example, if I want to see the columns above and below 25% null values, my null_Cuttoff will be 25.
By default, null_Cuttoff is 25.
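To make the cutoff idea concrete, here is a minimal sketch of the same kind of split written in plain pandas. It is not pybaseanal's actual implementation: the helper name split_by_null_cutoff and the toy dataframe are hypothetical, and the choices to skip columns without nulls and to put a column sitting exactly on the cutoff in the "below" list are assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def split_by_null_cutoff(df, cutoff=25.0):
    """Hypothetical helper (not part of pybaseanal).

    Returns two lists of column names: those whose percentage of nulls
    is above `cutoff`, and those at or below it. Columns without any
    null value are left out of both lists (an assumption).
    """
    # Share of missing values per column, expressed as a percentage
    null_pct = df.isna().mean() * 100
    # Keep only columns that contain at least one null, highest first
    null_pct = null_pct[null_pct > 0].sort_values(ascending=False)
    above = null_pct[null_pct > cutoff].index.tolist()
    below = null_pct[null_pct <= cutoff].index.tolist()
    return above, below


# Toy data, invented for this example
toy = pd.DataFrame({
    "a": [1.0, np.nan, np.nan, np.nan, 5.0],  # 60% null
    "b": [1.0, 2.0, 3.0, np.nan, 5.0],        # 20% null
    "c": [1, 2, 3, 4, 5],                     # no nulls
})

above, below = split_by_null_cutoff(toy, cutoff=25)
print("Above the cutoff:", above)
print("At or below the cutoff:", below)
```

The library's own output for the call in step 4 follows.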
Output >>
Total Data Size 9240 and total number of columns are 37
Columns with more than 25% of null value : ['Lead Quality', 'Asymmetrique Activity Index', 'Asymmetrique Profile Score', 'Asymmetrique Activity Score', 'Asymmetrique Profile Index', 'Tags', 'Lead Profile', 'What matters most to you in choosing a course', 'What is your current occupation', 'Country']
Columns with less than 25% of null value : ['How did you hear about X Education', 'Specialization', 'City', 'Page Views Per Visit', 'TotalVisits', 'Last Activity', 'Lead Source']
*************************
*** Dataframe Summary ***
*************************
Feature : Lead Quality || Data Type : object
No. of Null Value : 4767 || Total Non-Null Value : 4473 || % NUll Value : 51.590909090909086
===========
Feature : Asymmetrique Activity Index || Data Type : object
No. of Null Value : 4218 || Total Non-Null Value : 5022 || % NUll Value : 45.64935064935065
===========
Feature : Asymmetrique Profile Score || Data Type : float64
No. of Null Value : 4218 || Total Non-Null Value : 5022 || % NUll Value : 45.64935064935065
===========
Feature : Asymmetrique Activity Score || Data Type : float64
No. of Null Value : 4218 || Total Non-Null Value : 5022 || % NUll Value : 45.64935064935065
===========
Feature : Asymmetrique Profile Index || Data Type : object
No. of Null Value : 4218 || Total Non-Null Value : 5022 || % NUll Value : 45.64935064935065
===========
Feature : Tags || Data Type : object
No. of Null Value : 3353 || Total Non-Null Value : 5887 || % NUll Value : 36.28787878787879
===========
Feature : Lead Profile || Data Type : object
No. of Null Value : 2709 || Total Non-Null Value : 6531 || % NUll Value : 29.318181818181817
===========
Feature : What matters most to you in choosing a course || Data Type : object
No. of Null Value : 2709 || Total Non-Null Value : 6531 || % NUll Value : 29.318181818181817
===========
Feature : What is your current occupation || Data Type : object
No. of Null Value : 2690 || Total Non-Null Value : 6550 || % NUll Value : 29.11255411255411
===========
Feature : Country || Data Type : object
No. of Null Value : 2461 || Total Non-Null Value : 6779 || % NUll Value : 26.634199134199132
===========
Feature : How did you hear about X Education || Data Type : object
No. of Null Value : 2207 || Total Non-Null Value : 7033 || % NUll Value : 23.885281385281385
===========
Feature : Specialization || Data Type : object
No. of Null Value : 1438 || Total Non-Null Value : 7802 || % NUll Value : 15.562770562770561
===========
Feature : City || Data Type : object
No. of Null Value : 1420 || Total Non-Null Value : 7820 || % NUll Value : 15.367965367965366
===========
Feature : Page Views Per Visit || Data Type : float64
No. of Null Value : 137 || Total Non-Null Value : 9103 || % NUll Value : 1.4826839826839828
===========
Feature : TotalVisits || Data Type : float64
No. of Null Value : 137 || Total Non-Null Value : 9103 || % NUll Value : 1.4826839826839828
===========
Feature : Last Activity || Data Type : object
No. of Null Value : 103 || Total Non-Null Value : 9137 || % NUll Value : 1.1147186147186148
===========
Feature : Lead Source || Data Type : object
No. of Null Value : 36 || Total Non-Null Value : 9204 || % NUll Value : 0.38961038961038963
===========
Feature : Receive More Updates About Our Courses || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : I agree to pay the amount through cheque || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Get updates on DM Content || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Update me on Supply Chain Content || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : A free copy of Mastering The Interview || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Prospect ID || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Newspaper Article || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Through Recommendations || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Digital Advertisement || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Newspaper || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : X Education Forums || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Lead Number || Data Type : int64
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Magazine || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Search || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Total Time Spent on Website || Data Type : int64
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Converted || Data Type : int64
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Do Not Call || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Do Not Email || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Lead Origin || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
Feature : Last Notable Activity || Data Type : object
No. of Null Value : 0 || Total Non-Null Value : 9240 || % NUll Value : 0.0
===========
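The figures in the summary above are consistent with a simple formula: the percentage of nulls is the null count divided by the total number of rows, times 100 (for example, 4767 / 9240 × 100 ≈ 51.59 for Lead Quality). As a rough sketch, a similar per-feature table can be built in plain pandas. The helper name null_summary and the toy dataframe are hypothetical; this is an illustration under those assumptions, not pybaseanal's own code.

```python
import numpy as np
import pandas as pd


def null_summary(df):
    """Hypothetical helper (not part of pybaseanal).

    Builds one row per column with its data type, null count,
    non-null count, and percentage of nulls, sorted so the columns
    with the most nulls come first.
    """
    summary = pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_count": df.isna().sum(),
        "non_null_count": df.notna().sum(),
    })
    # Percentage of nulls relative to the total number of rows
    summary["pct_null"] = summary["null_count"] / len(df) * 100
    return summary.sort_values("pct_null", ascending=False)


# Toy data, invented for this example
toy = pd.DataFrame({
    "score": [1.0, np.nan, np.nan, 4.0],
    "city": ["Pune", None, "Delhi", "Mumbai"],
    "id": [1, 2, 3, 4],
})

summary = null_summary(toy)
print(summary)
```

Returning a dataframe instead of printing text makes it easy to filter or sort the result further, for instance to pick the columns to drop or impute.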
Please drop a comment if you would like to see more features added or have any suggestions. You can also write to pybaseanal@phaf.in