Data Science Introduction and How to set up python

Last updated on Jan 20 2023
Prabhas Ramanathan

Data science is the process of deriving knowledge and insights from large and diverse data sets by organizing, processing and analysing the data. It draws on several disciplines, including mathematical and statistical modelling, extracting data from its source and applying data visualization techniques. It often also involves big data technologies for gathering both structured and unstructured data. Below are some example scenarios where data science is used.

Recommendation systems

As online shopping becomes more prevalent, e-commerce platforms are able to capture users' shopping preferences as well as the performance of various products in the market. This has led to recommendation systems, which build models that predict a shopper's needs and show the products the shopper is most likely to buy.

Financial Risk management

The financial risk involved in loans and credit is better analysed by using a customer's past spending habits, past defaults, other financial commitments and many socio-economic indicators. This data is gathered from various sources in different formats. Organising it and gaining insight into a customer's profile requires data science. The outcome is reduced losses for the financial organization through the avoidance of bad debt.

Improvement in Health Care services

The health care industry deals with a variety of data, which can be classified into technical data, financial data, patient information, drug information and legal rules. All this data needs to be analysed in a coordinated manner to produce insights that save costs for both the health care provider and the care receiver while remaining legally compliant.

Computer Vision

Advances in image recognition by computers rely on processing large sets of image data covering many objects of the same category, for example in face recognition. These data sets are modelled, and algorithms are created to apply the model to new images until the results are satisfactory. Processing these huge data sets and creating the models requires various data science tools.

Efficient Management of Energy

As demand for energy soars, energy-producing companies need to manage the various phases of energy production and distribution more efficiently. This involves optimizing the production methods and the storage and distribution mechanisms, as well as studying customers' consumption patterns. Linking the data from all these sources and deriving insight from it is a daunting task, which the tools of data science make easier.

Python in Data Science

The programming requirements of data science demand a versatile yet flexible language in which code is simple to write but which can handle highly complex mathematical processing. Python is well suited to these requirements, as it has established itself as a language for both general-purpose and scientific computing. Moreover, it is continuously extended through new additions to its large collection of libraries aimed at different programming requirements. Below are the features of Python that make it a preferred language for data science.
• It is a simple, easy-to-learn language that often achieves results in fewer lines of code than comparable languages such as R. Its simplicity also helps it handle complex scenarios with minimal code and less confusion about the general flow of the program.
• It is cross-platform, so the same code generally works in multiple environments without changes. That makes it well suited to a multi-environment setup.
• Its performance is competitive with other languages used for data analysis, such as R and MATLAB, particularly when the heavy numeric work is delegated to compiled libraries.
• Its automatic memory management, including garbage collection, helps it handle large volumes of data transformation, slicing, dicing and visualization gracefully.
• Most importantly, Python has a very large collection of libraries that serve as special-purpose analysis tools. For example, the NumPy package deals with scientific computing, and its array needs much less memory than a conventional Python list for managing numeric data. The number of such packages is continuously growing.
• Python has packages that can directly use code from other languages such as Java or C. This helps optimize performance by reusing existing code in another language whenever it gives a better result.
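
The memory point about NumPy arrays can be illustrated with a rough sketch. To keep it runnable without installing anything, it uses the standard library's array module as a stand-in: like a NumPy array, it stores numbers in one packed buffer instead of as separate Python objects. The helper names list_bytes and packed_bytes are made up for this illustration, and the list figure is only an approximation.

```python
import sys
from array import array


def list_bytes(values):
    """Approximate bytes used by a list of floats: the list plus each float object."""
    return sys.getsizeof(values) + sum(sys.getsizeof(v) for v in values)


def packed_bytes(values):
    """Bytes needed to hold the same numbers as packed C doubles."""
    packed = array("d", values)  # stand-in for a NumPy float64 array (assumption)
    return packed.itemsize * len(packed)


numbers = [float(i) for i in range(100_000)]
print("list  :", list_bytes(numbers), "bytes")
print("packed:", packed_bytes(numbers), "bytes")
```

The exact numbers depend on the interpreter and platform, but the packed form should come out several times smaller than the list.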

Python – Data Science Environment Setup

To create and run the example code in this tutorial, we need an environment that has both general-purpose Python and the special packages required for data science. We will first look at installing general-purpose Python. Both Python 2 and Python 3 exist, but Python 2 reached end of life on 1 January 2020 and current releases of the major data science packages support only Python 3, so Python 3 is the version to install.

Getting Python

The most up-to-date source code, binaries, documentation and news are available on the official Python website: https://www.python.org/
You can download Python documentation from https://www.python.org/doc/. The documentation is available in HTML, PDF, and PostScript formats.

Installing Python

Python distributions are available for a wide variety of platforms. You need to download only the binary applicable to your platform and install it.
If the binary code for your platform is not available, you need a C compiler to compile the source code manually. Compiling the source code offers more flexibility in terms of choice of features that you require in your installation.
Here is a quick overview of installing Python on various platforms −

Unix and Linux Installation

Here are the simple steps to install Python on a Unix/Linux machine.
• Open a Web browser and go to https://www.python.org/downloads
• Follow the link to download the zipped source code available for Unix/Linux.
• Download and extract the files.
• Edit the Modules/Setup file if you want to customize some options.
• Run the ./configure script.
• Run make.
• Run make install.
This installs Python at the standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX, where XX is the version of Python.
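
Once the install finishes, a quick way to confirm which interpreter you ended up with is to ask Python itself. The sketch below is only an illustration and the function name describe_interpreter is made up here. It reports the version, the executable location and the standard library directory, which you can compare against the locations mentioned above.

```python
import sys
import sysconfig


def describe_interpreter():
    """Return basic facts about the interpreter running this code."""
    return {
        "version": ".".join(str(part) for part in sys.version_info[:3]),
        "executable": sys.executable,
        "prefix": sys.prefix,
        "stdlib": sysconfig.get_paths()["stdlib"],
    }


for key, value in describe_interpreter().items():
    print(f"{key:>10}: {value}")
```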

Windows Installation

Here are the steps to install Python on a Windows machine.
• Open a Web browser and go to https://www.python.org/downloads
• Follow the link for the Windows installer. Older releases ship as python-XYZ.msi, where XYZ is the version you need to install; current Python 3 releases ship as an .exe installer instead.
• To use a python-XYZ.msi installer, the Windows system must support Microsoft Installer 2.0. Save the installer file to your local machine and then run it to find out if your machine supports MSI.
• Run the downloaded file. This brings up the Python install wizard, which is easy to use. Accept the default settings, wait until the install is finished, and you are done.

Macintosh Installation

Recent Macs come with Python installed, but it may be several years out of date. See http://www.python.org/download/mac/ for instructions on getting the current version along with extra tools to support development on the Mac. For older Mac OS versions before Mac OS X 10.3 (released in 2003), MacPython is available.
Jack Jansen maintains it, and the full documentation, including complete installation details for Mac OS, is available at his website − http://www.cwi.nl/~jack/macpython.html

Setting up PATH

Programs and other executable files can be in many directories, so operating systems provide a search path that lists the directories that the OS searches for executables.
The path is stored in an environment variable, which is a named string maintained by the operating system. This variable contains information available to the command shell and other programs.
The path variable is named PATH in Unix or Path in Windows (Unix is case-sensitive; Windows is not).
In Mac OS, the installer handles the path details. To invoke the Python interpreter from any particular directory, you must add the Python directory to your path.

Setting path at Unix/Linux

To add the Python directory to the path for a particular session in Unix −
• In the csh shell − type setenv PATH "$PATH:/usr/local/bin/python" and press Enter.
• In the bash shell (Linux) − type export PATH="$PATH:/usr/local/bin/python" and press Enter.
• In the sh or ksh shell − type PATH="$PATH:/usr/local/bin/python" and press Enter.
Note − /usr/local/bin/python is the path of the Python directory.

Setting path at Windows

To add the Python directory to the path for a particular session in Windows −
At the command prompt − type path %path%;C:\Python and press Enter.
Note − C:\Python is the path of the Python directory
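
To see what the search path looks like after these changes, you can inspect it from Python. The sketch below is illustrative only, and the helper names path_entries and find_python are made up here. It splits the PATH variable using the platform's separator (":" on Unix, ";" on Windows) and asks which interpreter, if any, would be found through it.

```python
import os
import shutil


def path_entries():
    """Split the PATH variable into its directories using the platform separator."""
    return [entry for entry in os.environ.get("PATH", "").split(os.pathsep) if entry]


def find_python(names=("python3", "python")):
    """Return the first interpreter found on PATH, or None if there is none."""
    for name in names:
        location = shutil.which(name)
        if location:
            return location
    return None


print("Directories on PATH:", len(path_entries()))
print("Interpreter found  :", find_python())
```

If find_python() prints None, the Python directory is probably not on the path yet.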

Python Environment Variables

Here are some important environment variables that Python recognizes −

• PYTHONPATH − It has a role similar to PATH. This variable tells the Python interpreter where to locate the module files imported into a program. It should include the Python source library directory and the directories containing Python source code. PYTHONPATH is sometimes preset by the Python installer.
• PYTHONSTARTUP − It contains the path of an initialization file containing Python source code, which is executed every time you start the interactive interpreter. The file is conventionally named .pythonrc.py in Unix, and it contains commands that load utilities or modify PYTHONPATH.
• PYTHONCASEOK − It is used in Windows to instruct Python to find the first case-insensitive match in an import statement. Set this variable to any value to activate it.
• PYTHONHOME − It sets an alternative location for the standard Python libraries, and so an alternative module search path. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make switching module libraries easy.
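
The effect of PYTHONPATH is easy to observe. The following sketch (the function name child_sys_path is made up for this example) starts a second interpreter with PYTHONPATH pointing at a temporary directory and reads back that interpreter's module search path, sys.path. Treat it as an illustration, not as the recommended way to manage modules.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(extra_dir):
    """Start a child interpreter with PYTHONPATH set and return its sys.path."""
    env = dict(os.environ, PYTHONPATH=extra_dir)
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


with tempfile.TemporaryDirectory() as module_dir:
    search_path = child_sys_path(module_dir)
    print("Directory picked up from PYTHONPATH:", module_dir in search_path)
```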

Running Python

There are three different ways to start Python −

Interactive Interpreter

You can start Python from Unix, DOS, or any other system that provides a command-line interpreter or shell window.
Enter python at the command line.
Start coding right away in the interactive interpreter.
$ python # Unix/Linux
or
% python # Unix/Linux (csh)
or
C:\> python # Windows/DOS
Here are some of the available command-line options −

• -d − Provides debug output.
• -O − Generates optimized bytecode (resulting in .pyo files in Python versions before 3.5).
• -S − Does not run import site to look for Python paths on startup.
• -v − Gives verbose output (a detailed trace on import statements).
• -X − In very old versions, disabled class-based built-in exceptions (obsolete starting with version 1.6); in Python 3 this flag instead sets implementation-specific options.
• -c cmd − Runs the Python script sent in as the cmd string.
• file − Runs the Python script from the given file.
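
A couple of these options can be tried out from Python itself. The sketch below (run_python is a made-up helper name) launches the current interpreter with -c to run a one-line program, and with -O to show one visible effect of optimization: the built-in __debug__ flag is switched off.

```python
import subprocess
import sys


def run_python(*args):
    """Run the current interpreter with the given arguments and return its output."""
    result = subprocess.run(
        [sys.executable, *args], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


# -c runs the string as a program; this prints 5.
print(run_python("-c", "print(2 + 3)"))

# -O turns __debug__ off, so this prints False.
print(run_python("-O", "-c", "print(__debug__)"))
```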

Script from the Command-line

A Python script can be executed at the command line by invoking the interpreter on your application, as in the following −
$ python script.py # Unix/Linux
or
% python script.py # Unix/Linux (csh)
or
C:\> python script.py # Windows/DOS
Note − Be sure the file permission mode allows execution.
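
To try this without creating files by hand, the sketch below writes a tiny script to a temporary folder and runs it the same way, passing one command-line argument. The script contents and the helper name run_script are invented for this illustration.

```python
import os
import subprocess
import sys
import tempfile

# A two-line script that greets its first command-line argument.
SCRIPT = 'import sys\nprint("Hello from", sys.argv[1])\n'


def run_script(source, *args):
    """Save source to a temporary script.py, run it with the interpreter, return stdout."""
    with tempfile.TemporaryDirectory() as folder:
        script_path = os.path.join(folder, "script.py")
        with open(script_path, "w", encoding="utf-8") as handle:
            handle.write(source)
        result = subprocess.run(
            [sys.executable, script_path, *args],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()


print(run_script(SCRIPT, "script.py"))
```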

Integrated Development Environment

You can run Python from a Graphical User Interface (GUI) environment as well, if you have a GUI application on your system that supports Python.
• Unix − IDLE is the very first Unix IDE for Python.
• Windows − PythonWin is the first Windows interface for Python and is an IDE with a GUI.
• Macintosh − The Macintosh version of Python along with the IDLE IDE is available from the main website, downloadable as either MacBinary or BinHex’d files.

Installing the SciPy Stack

The best way to enable the required packages is to use an installable binary package specific to your operating system. These binaries contain the full SciPy stack (NumPy, SciPy, matplotlib, IPython, SymPy and nose packages, along with core Python).

Windows

Anaconda (from www.continuum.io) is a free Python distribution for the SciPy stack. It is also available for Linux and Mac.
Canopy (www.enthought.com/products/canopy/) is available as both a free and a commercial distribution, with the full SciPy stack for Windows, Linux and Mac.
Python (x,y) is a free Python distribution with the SciPy stack and the Spyder IDE for Windows OS (downloadable from www.python-xy.github.io/).

Linux

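Package managers of the respective Linux distributions can be used to install one or more packages in the SciPy stack. The package names below are those of older releases; on current releases the Python 3 packages usually carry a python3- prefix (for example, python3-numpy).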

For Ubuntu

sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose

For Fedora

sudo yum install numpy scipy python-matplotlib ipython python-pandas sympy python-nose atlas-devel

Building from Source

Core Python (2.6.x, 2.7.x and 3.2.x onwards) must be installed with distutils, and the zlib module should be enabled.
A GNU gcc (4.2 and above) C compiler must be available.
To install NumPy, run the following command.
python setup.py install
To test whether the NumPy module is properly installed, try to import it from the Python prompt.
If it is not installed, the following error message will be displayed.
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    import numpy
ImportError: No module named 'numpy'
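
If you would rather get a friendly message than a traceback, the import can be wrapped in a small check. This is a minimal sketch and the helper name try_import is made up for this example. It returns the module when the import works and None when it does not.

```python
import importlib


def try_import(name):
    """Import a module by name; return the module, or None if it is not installed."""
    try:
        return importlib.import_module(name)
    except ImportError as error:
        print(f"Could not import {name!r}: {error}")
        return None


numpy = try_import("numpy")
if numpy is not None:
    print("NumPy version:", numpy.__version__)
```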

This brings us to the end of the blog. This Tecklearn ‘Data Science Introductions and How to set up python’ blog helps you with commonly asked questions if you are looking for a job in Python programming. If you wish to learn Python and build a career in the Python programming domain, check out our interactive Python with Data Science Training, which comes with 24/7 support to guide you throughout your learning period.

Python with Data Science Training

About the Course

Python with Data Science training lets you master the concepts of the widely used and powerful programming language, Python. This Python Course will also help you master important Python programming concepts such as data operations, file operations, object-oriented programming and various Python libraries such as Pandas, NumPy, Matplotlib which are essential for Data Science. You will work on real-world projects in the domain of Python and apply it for various domains of Big Data, Data Science and Machine Learning.

Why Should you take Python with Data Science Training?

• Python is the preferred language for new technologies such as Data Science and Machine Learning.
• Average salary of Python Certified Developer is $123,656 per annum – Indeed.com
• Python is by far the most popular language for data science. Python held 65.6% of the data science market.

What you will Learn in this Course?

Introduction to Python

• Define Python
• Understand the need for Programming
• Know why to choose Python over other languages
• Setup Python environment
• Understand various Python concepts – Variables, Data Types, Operators, Conditional Statements and Loops
• Illustrate String formatting
• Understand Command Line Parameters and Flow control

Python Environment Setup and Essentials

• Python installation
• Windows, Mac & Linux distribution for Anaconda Python
• Deploying Python IDE
• Basic Python commands, data types, variables, keywords and more

Python language Basic Constructs

• Looping in Python
• Data Structures: List, Tuple, Dictionary, Set
• First Python program
• Write a Python Function (with and without parameters)
• Create a member function and a variable
• Tuple
• Dictionary
• Set and Frozen Set
• Lambda function

OOP (Object Oriented Programming) in Python

• Object-Oriented Concepts

Working with Modules, Handling Exceptions and File Handling

• Standard Libraries
• Modules Used in Python (OS, Sys, Date and Time etc.)
• The Import statements
• Module search path
• Package installation ways
• Errors and Exception Handling
• Handling multiple exceptions

Introduction to NumPy

• Introduction to arrays and matrices
• Indexing of array, datatypes, broadcasting of array math
• Standard deviation, Conditional probability
• Correlation and covariance
• NumPy Exercise Solution

Introduction to Pandas

• Pandas for data analysis and machine learning
• Pandas for data analysis and machine learning Continued
• Time series analysis
• Linear regression
• Logistic Regression
• ROC Curve
• Neural Network Implementation
• K Means Clustering Method

Data Visualisation

• Matplotlib library
• Grids, axes, plots
• Markers, colours, fonts and styling
• Types of plots – bar graphs, pie charts, histograms
• Contour plots

Data Manipulation

• Perform function manipulations on Data objects
• Perform Concatenation, Merging and Joining on DataFrames
• Iterate through DataFrames
• Explore Datasets and extract insights from it

Scikit-Learn for Natural Language Processing

• What is natural language processing, working with NLP on text data
• Scikit-Learn for Natural Language Processing
• The Scikit-Learn machine learning algorithms
• Sentiment Analysis – Twitter

Introduction to Python for Hadoop

• Deploying Python coding for MapReduce jobs on Hadoop framework.
• Python for Apache Spark coding
• Deploying Spark code with Python
• Machine learning library of Spark MLlib
• Deploying Spark MLlib for Classification, Clustering and Regression

Got a question for us? Please mention it in the comments section and we will get back to you.
