Top 5 Libraries Used in Python for Machine Learning

Akrit Awasthi
6 min read · Oct 6, 2021

Introduction

In this blog I am going to discuss the top 5 machine learning libraries in Python: NumPy, Pandas, Matplotlib, SciKit-Learn, and NLTK.

A Python library is a collection of functions and methods that allows you to perform many actions without writing your own code.

1) NumPy: NumPy stands for Numerical Python and is the core library for numeric and scientific computing. It consists of a multidimensional array object and a collection of routines for processing those arrays.

Anatomy of an Array:

Array: an array is a collection of elements of the same data type.

Single dimensional array

An array that has one dimension. Ex: array([23, 45, 56, 67, 78])

Multidimensional array

An array of arrays, for example a 2D array.

Ex: array([[20, 30, 40, 50], [34, 45, 56, 67]])

Creating an array in Python


First we import NumPy, then create an array using np.array() and store it in a variable named num1.

If you want to know the type of an object, you can simply use type(variableName).

To know the dimension of the array, use variableName.ndim.

To know the data type, use variableName.dtype.
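A minimal sketch of these steps, reusing the sample values from the array example above:

import numpy as np

num1 = np.array([23, 45, 56, 67, 78])  # create the array and store it in num1

print(type(num1))   # <class 'numpy.ndarray'>
print(num1.ndim)    # 1, so this is a single-dimensional array
print(num1.dtype)   # e.g. int64, the data type of the elements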

Accessing an element in an array

To access an element in an array, use its index number.


In our array, 67 is at index 3, so num1[3] returns it.

Similarly, if I want to access 23 in my array, I use num1[0].
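A minimal sketch of indexing the same array (indexing starts at 0):

import numpy as np

num1 = np.array([23, 45, 56, 67, 78])

print(num1[3])  # 67 sits at index 3
print(num1[0])  # 23 sits at index 0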

2) Pandas: Pandas stands for panel data and is the core library for data manipulation and data analysis.

It provides single-dimensional and multidimensional data structures for data manipulation.

The single-dimensional structure is the Series object.

The multidimensional structure is the DataFrame.

Pandas Series object

A pandas Series is basically a one-dimensional labelled object.

First we have to import the pandas library.

To create a Series we need some built-in data, so we can use a list, and we can also use a dictionary.


If we want to check the type of the Series, we use type().

We can also change the index of a Series.

When a dictionary is used, all of its keys are taken as the index and its values are taken as the values of the Series object.
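A minimal sketch of creating Series objects; the values below are illustrative, not taken from the original screenshots:

import pandas as pd

# Series from a list, with a custom index
s1 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(type(s1))  # <class 'pandas.core.series.Series'>

# Series from a dictionary: keys become the index, values become the values
s2 = pd.Series({'a': 10, 'b': 20, 'c': 30})
print(s2)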

Creating a DataFrame: a multidimensional, labelled data structure.

Here each key becomes an attribute (column) and its values become the records.
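A minimal sketch of building a DataFrame from a dictionary; the column names and values are made up for illustration:

import pandas as pd

df = pd.DataFrame({
    'Name': ['Bob', 'Sam', 'Anne'],   # key 'Name' becomes a column
    'Marks': [76, 25, 92],            # key 'Marks' becomes another column
})
print(df)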

Importing a CSV using pandas


First we import the pandas library, then use .read_csv("file path") and store our dataset in a variable named "ds".

DataFrame inbuilt functions

head(): we use this to get the top 5 records of the dataset.

shape: if we want to know how many rows and columns are present in the DataFrame, we use the shape attribute (note that it is used without parentheses).

describe(): if we want a basic statistical description of the DataFrame, we use describe().

tail(): we use this to get records from the bottom of the dataset (the last 5 by default).
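A minimal sketch of loading a CSV and inspecting it; "data.csv" is a placeholder path, not a file from the original post:

import pandas as pd

ds = pd.read_csv("data.csv")

print(ds.head())      # first 5 records
print(ds.tail())      # last 5 records
print(ds.shape)       # (rows, columns), an attribute so no parentheses
print(ds.describe())  # basic statistical summary of the numeric columns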

Removing a column and sorting data in the dataset

The drop() function deletes a column by name. To delete a column by position, first retrieve its name by indexing dataframe.columns (which lists all the column names) with the column's position, then pass that name to drop().

To sort the DataFrame based on the values in a single column, you’ll use .sort_values(). By default, this returns the DataFrame in ascending order.
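A minimal sketch of both operations; the file path, column position, and column name are assumptions for illustration:

import pandas as pd

ds = pd.read_csv("data.csv")

# Look up the name of the column at position 2, then drop that column
col_name = ds.columns[2]
ds = ds.drop(col_name, axis=1)

# Sort the DataFrame by the values of a single column (ascending by default)
ds = ds.sort_values("Marks")
print(ds.head())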

3) SciKit-Learn-

It provides a range of supervised and unsupervised learning algorithms via a consistent interface in Python.

This library is focused on modelling data, not on loading, manipulating, and summarizing it; for those tasks we use NumPy and pandas.

Some ML applications built with scikit-learn

financial cybersecurity analytics

product development

neuroimaging

barcode scanner development

medical modeling and inventory management (for example, handling Shopify inventory issues). This wide range of decision-modeling features is what makes scikit-learn useful across so many domains.

Steps in Machine Learning

Define a problem

Prepare data

Evaluate algorithms

Improve results

Present results

Every model object inside SciKit-Learn is called an estimator and exposes the same basic interface (fit and predict).

An end-to-end model

Creating a Model

First, import datasets, metrics, and SVC from sklearn.

SVC stands for Support Vector Classifier, scikit-learn's support vector machine for classification.

ds = datasets.load_iris()

This loads the iris dataset and stores it in the ds variable.

After that, fit an SVC model to the data.

Calling print(model) gives us additional information about the model's parameters.

Then make predictions and summarize the fit of the model.

It is great that with just a few lines of code we can build a model that classifies this data to the tune of 90% accuracy.
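A minimal sketch following the steps listed above; as in the original walkthrough, the model is fitted and then evaluated on the same data:

from sklearn import datasets, metrics
from sklearn.svm import SVC

# Load the iris dataset and store it in the ds variable
ds = datasets.load_iris()

# Fit an SVC model to the data
model = SVC()
model.fit(ds.data, ds.target)
print(model)  # additional information about the model's parameters

# Make predictions and summarize the fit of the model
expected = ds.target
predicted = model.predict(ds.data)
print(metrics.classification_report(expected, predicted))
print(metrics.confusion_matrix(expected, predicted))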

4) Matplotlib-

The Matplotlib library is used for data visualisation.

We can import matplotlib using: from matplotlib import pyplot as plt

Scatter plot

A scatter plot is a diagram where each value in the dataset is represented by a dot. Matplotlib has a method for drawing scatter plots; it needs two arrays of the same length, one with the values for the x-axis and one with the values for the y-axis.

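A minimal sketch of a scatter plot; the two arrays below are made-up values of equal length:

from matplotlib import pyplot as plt

x = [5, 7, 8, 7, 2, 17, 2, 9, 4, 11]            # values for the x-axis
y = [99, 86, 87, 88, 111, 86, 103, 87, 94, 78]  # values for the y-axis

plt.scatter(x, y)
plt.show()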

Histogram

A histogram is the most commonly used graph to show frequency distributions.

pylab is a module that gets installed alongside Matplotlib and bundles pyplot together with NumPy functions in a single namespace.


First we import NumPy and pylab, generate some random numbers, and then create a histogram using hist().
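A minimal sketch of those steps, assuming normally distributed random numbers as the sample data:

import numpy as np
import pylab as plt

data = np.random.normal(5.0, 1.0, 1000)  # 1000 random values, mean 5, std dev 1

plt.hist(data, bins=20)  # group the values into 20 bins
plt.show()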

5) NLTK-

What are NLP and NLTK?

The idea of NLP is to do some analysis, or processing, where the machine can understand, at least to some level, what the text means.

NLTK is a Python library whose name stands for Natural Language Toolkit.

The process of breaking down text into smaller pieces is called Tokenization.

This library contains some useful tools for text preprocessing and corpus analysis. You do not need to create your own stop-words list or frequency function for every NLP project. NLTK saves you time so that you can focus on your NLP tasks instead of rewriting functions.

Word and sentence tokenization

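A minimal sketch of word and sentence tokenization with NLTK; the sample sentence is made up, and recent NLTK versions may also require the 'punkt_tab' resource:

import nltk
from nltk.tokenize import word_tokenize, sent_tokenize

nltk.download('punkt')  # tokenizer models, downloaded on first use

text = "NLTK is a library in Python. It makes text preprocessing easy."

print(sent_tokenize(text))  # split the text into sentences
print(word_tokenize(text))  # split the text into individual words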

Conclusion

In this blog I have discussed the Python libraries that are used in machine learning. NumPy stands for Numerical Python; it consists of a multidimensional array object and a collection of routines for processing those arrays. Pandas provides single-dimensional and multidimensional data structures for data manipulation. SciKit-Learn gives us supervised and unsupervised learning algorithms through a consistent interface. We use Matplotlib for data visualization. NLTK, the Natural Language Toolkit, provides tools for text processing, and we have seen some examples of tokenization along with their implementation.
