We can connect to relational databases for analysing data using the pandas library as well as another additional library for implementing database connectivity. This package is named as sqlalchemy which provides full SQL language functionality to be used in python.

Table of Contents

Installing SQLAlchemy

The installation is very straight forward using Anaconda which we have discussed in the chapter Data Science Environment. Assuming you have installed Anaconda as described in this chapter, run the following command in the Anaconda Prompt Window to install the SQLAlchemy package.
conda install sqlalchemy

Reading Relational Tables

We will use Sqlite3 as our relational database as it is very light weight and easy to use. Though the SQLAlchemy library can connect to a variety of relational sources including MySql, Oracle and Postgresql and Mssql. We first create a database engine and then connect to the database engine using the to_sql function of the SQLAlchemy library.
In the below example we create the relational table by using the to_sql function from a dataframe already created by reading a csv file. Then we use the read_sql_query function from pandas to execute and capture the results from various SQL queries.
from sqlalchemy import create_engine

import pandas as pd

data = pd.read_csv('/path/input.csv')

# Create the db engine
engine = create_engine('sqlite:///:memory:')

# Store the dataframe as a table
data.to_sql('data_table', engine)

# Query 1 on the relational table
res1 = pd.read_sql_query('SELECT * FROM data_table', engine)
print('Result 1')
print(res1)
print('')

# Query 2 on the relational table
res2 = pd.read_sql_query('SELECT dept,sum(salary) FROM data_table group by dept', engine)
print('Result 2')
print(res2)

When we execute the above code, it produces the following result.
Result 1
%0

Result 1
index id name salary start_date dept
0 0 1 Rick 623.30 2012-01-01 IT
1 1 2 Dan 515.20 2013-09-23 Operations
2 2 3 Tusar 611.00 2014-11-15 IT
3 3 4 Ryan 729.00 2014-05-11 HR
4 4 5 Gary 843.25 2015-03-27 Finance
5 5 6 Rasmi 578.00 2013-05-21 IT
6 6 7 Pranab 632.80 2013-07-30 Operations
7 7 8 Guru 722.50 2014-06-17 Finance

Result 2
dept sum(salary)
0 Finance 1565.75
1 HR 729.00
2 IT 1812.30
3 Operations 1148.00

Inserting Data to Relational Tables

We can also insert data into relational tables using sql.execute function available in pandas. In the below code we previous csv file as input data set, store it in a relational table and then insert another record using sql.execute.
from sqlalchemy import create_engine
from pandas.io import sql

import pandas as pd

data = pd.read_csv('C:/Users/Rasmi/Documents/pydatasci/input.csv')
engine = create_engine('sqlite:///:memory:')

# Store the Data in a relational table
data.to_sql('data_table', engine)

# Insert another row
sql.execute('INSERT INTO data_table VALUES(?,?,?,?,?,?)', engine, params=[('id',9,'Ruby',711.20,'2015-03-27','IT')])

# Read from the relational table
res = pd.read_sql_query('SELECT ID,Dept,Name,Salary,start_date FROM data_table', engine)
print(res)

When we execute the above code, it produces the following result.
id dept name salary start_date
0 1 IT Rick 623.30 2012-01-01
1 2 Operations Dan 515.20 2013-09-23
2 3 IT Tusar 611.00 2014-11-15
3 4 HR Ryan 729.00 2014-05-11
4 5 Finance Gary 843.25 2015-03-27
5 6 IT Rasmi 578.00 2013-05-21
6 7 Operations Pranab 632.80 2013-07-30
7 8 Finance Guru 722.50 2014-06-17
8 9 IT Ruby 711.20 2015-03-27

Deleting Data from Relational Tables

We can also delete data into relational tables using sql.execute function available in pandas. The below code deletes a row based on the input condition given.
from sqlalchemy import create_engine
from pandas.io import sql

import pandas as pd

data = pd.read_csv('C:/Users/Rasmi/Documents/pydatasci/input.csv')
engine = create_engine('sqlite:///:memory:')
data.to_sql('data_table', engine)

sql.execute('Delete from data_table where name = (?) ', engine, params=[('Gary')])

res = pd.read_sql_query('SELECT ID,Dept,Name,Salary,start_date FROM data_table', engine)
print(res)

When we execute the above code, it produces the following result.
id dept name salary start_date
0 1 IT Rick 623.3 2012-01-01
1 2 Operations Dan 515.2 2013-09-23
2 3 IT Tusar 611.0 2014-11-15
3 4 HR Ryan 729.0 2014-05-11
4 6 IT Rasmi 578.0 2013-05-21
5 7 Operations Pranab 632.8 2013-07-30
6 8 Finance Guru 722.5 2014-06-17

Python – NoSQL Databases

As more and more data become available as unstructured or semi-structured, the need of managing them through NoSql database increases. Python can also interact with NoSQL databases in a similar way as is interacts with Relational databases. In this chapter we will use python to interact with MongoDB as a NoSQL database.
In order to connect to MongoDB, python uses a library known as pymongo. You can add this library to your python environment, using the below command from the Anaconda environment.
conda install pymongo
This library enables python to connect to MOngoDB using a db client. Once connected we select the db name to be used for various operations.

Inserting Data

To insert data into MongoDB we use the insert() method which is available in the database environment. First we connect to the db using python code shown below and then we provide the document details in form of a series of key-value pairs.

# Import the python libraries
from pymongo import MongoClient
from pprint import pprint

# Choose the appropriate client
client = MongoClient()

# Connect to the test db 
db=client.test

# Use the employee collection
employee = db.employee
employee_details = {
'Name': 'Raj Kumar',
'Address': 'Sears Streer, NZ',
'Age': '42'
}

# Use the insert method
result = employee.insert_one(employee_details)

# Query for the inserted document.
Queryresult = employee.find_one({'Age': '42'})
pprint(Queryresult)

When we execute the above code, it produces the following result.
{u’Address’: u’Sears Streer, NZ’,
u’Age’: u’42’,
u’Name’: u’Raj Kumar’,
u’_id’: ObjectId(‘5adc5a9f84e7cd3940399f93’)}

Updating Data

Updating an existing MongoDB data is similar to inserting. We use the update() method which is native to mongoDB. In the below code we are replacing the existing record with new key-value pairs. Please note how we are using the condition criteria to decide which record to update.

# Import the python libraries
from pymongo import MongoClient
from pprint import pprint

# Choose the appropriate client
client = MongoClient()

# Connect to db
db=client.test
employee = db.employee

# Use the condition to choose the record
# and use the update method
db.employee.update_one(
{"Age":'42'},
{
"$set": {
"Name":"Srinidhi",
"Age":'35',
"Address":"New Omsk, WC"
}
}
)

Queryresult = employee.find_one({'Age':'35'})

pprint(Queryresult)

When we execute the above code, it produces the following result.

{u'Address': u'New Omsk, WC',
u'Age': u'35',
u'Name': u'Srinidhi',
u'_id': ObjectId('5adc5a9f84e7cd3940399f93')}

Deleting Data

Deleting a record is also straight forward where we use the delete method. Here also we mention the condition which is used to choose the record to be deleted.

# Import the python libraries
from pymongo import MongoClient
from pprint import pprint

# Choose the appropriate client
client = MongoClient()

# Connect to db
db=client.test
employee = db.employee

# Use the condition to choose the record
# and use the delete method
db.employee.delete_one({"Age":'35'})

Queryresult = employee.find_one({'Age':'35'})

pprint(Queryresult)

When we execute the above code, it produces the following result.
None
So we see the particular record does not exist in the db any more.

So, this brings us to the end of blog. This Tecklearn ‘Handling Relational and NoSQL Databases in Python’ blog helps you with commonly asked questions if you are looking out for a job in Python Programming. If you wish to learn Python and build a career in Python Programming domain, then check out our interactive, Python with Data Science Training, that comes with 24*7 support to guide you throughout your learning period.

Python with Data Science Training

About the Course

Python with Data Science training lets you master the concepts of the widely used and powerful programming language, Python. This Python Course will also help you master important Python programming concepts such as data operations, file operations, object-oriented programming and various Python libraries such as Pandas, NumPy, Matplotlib which are essential for Data Science. You will work on real-world projects in the domain of Python and apply it for various domains of Big Data, Data Science and Machine Learning.

Why Should you take Python with Data Science Training?

• Python is the preferred language for new technologies such as Data Science and Machine Learning.
• Average salary of Python Certified Developer is $123,656 per annum – Indeed.com
• Python is by far the most popular language for data science. Python held 65.6% of the data science market.

What you will Learn in this Course?

Introduction to Python

• Define Python
• Understand the need for Programming
• Know why to choose Python over other languages
• Setup Python environment
• Understand Various Python concepts – Variables, Data Types Operators, Conditional Statements and Loops
• Illustrate String formatting
• Understand Command Line Parameters and Flow control

Python Environment Setup and Essentials

• Python installation
• Windows, Mac & Linux distribution for Anaconda Python
• Deploying Python IDE
• Basic Python commands, data types, variables, keywords and more

Python language Basic Constructs

• Looping in Python
• Data Structures: List, Tuple, Dictionary, Set
• First Python program
• Write a Python Function (with and without parameters)
• Create a member function and a variable
• Tuple
• Dictionary
• Set and Frozen Set
• Lambda function

OOP (Object Oriented Programming) in Python

• Object-Oriented Concepts

Working with Modules, Handling Exceptions and File Handling

• Standard Libraries
• Modules Used in Python (OS, Sys, Date and Time etc.)
• The Import statements
• Module search path
• Package installation ways
• Errors and Exception Handling
• Handling multiple exceptions

Introduction to NumPy

• Introduction to arrays and matrices
• Indexing of array, datatypes, broadcasting of array math
• Standard deviation, Conditional probability
• Correlation and covariance
• NumPy Exercise Solution

Introduction to Pandas

• Pandas for data analysis and machine learning
• Pandas for data analysis and machine learning Continued
• Time series analysis
• Linear regression
• Logistic Regression
• ROC Curve
• Neural Network Implementation
• K Means Clustering Method

Data Visualisation

• Matplotlib library
• Grids, axes, plots
• Markers, colours, fonts and styling
• Types of plots – bar graphs, pie charts, histograms
• Contour plots

Data Manipulation

• Perform function manipulations on Data objects
• Perform Concatenation, Merging and Joining on DataFrames
• Iterate through DataFrames
• Explore Datasets and extract insights from it

Scikit-Learn for Natural Language Processing

• What is natural language processing, working with NLP on text data
• Scikit-Learn for Natural Language Processing
• The Scikit-Learn machine learning algorithms
• Sentimental Analysis – Twitter

Introduction to Python for Hadoop

• Deploying Python coding for MapReduce jobs on Hadoop framework.
• Python for Apache Spark coding
• Deploying Spark code with Python
• Machine learning library of Spark MLlib
• Deploying Spark MLlib for Classification, Clustering and Regression

Got a question for us? Please mention it in the comments section and we will get back to you.

277