In [2]:
import os
cwd = os.getcwd()
In [3]:
cwd
Out[3]:
'D:\\Users\\ceviherdian'

DATA SCIENTIST

In this tutorial, I explain only what you need to become a data scientist, neither more nor less.

A data scientist needs these skills:

1. Basic tools: Python, R or SQL. You do not need to know everything. To start, you only need to learn how to use Python.

2. Basic statistics: such as mean, median or standard deviation. If you know basic statistics, you can apply them easily in Python.

3. Data munging: working with messy and difficult data, such as inconsistent date and string formatting. As you might guess, Python helps us here.

4. Data visualization: the title is self-explanatory. We will visualize the data with Python libraries such as matplotlib and seaborn.

5. Machine learning: you do not need to understand the math behind each machine learning technique. You only need to understand the basics of machine learning and learn how to implement it in Python.

In this part, you will learn:

•User defined function

•Scope

•Nested function

•Default and flexible arguments

•Lambda function

•Anonymous function

•Iterators

•List comprehension

2. Python Data Science Toolbox:

A. User defined function

B. Scope

C. Nested function

D. Default and flexible arguments

E. Lambda function

F. Anonymous function

G. Iterators

H. List Comprehension

A. User defined function

What we need to know about functions:

•docstrings: documentation for functions.

Example: for f(): """This is the docstring documenting function f"""

•tuple: an immutable sequence of Python objects. Its values cannot be modified. A tuple is written with parentheses, like t = (1,2,3), and can be unpacked into several variables, like a,b,c = t.

In [9]:
# example of what we learned above
def tuple_ex():
    """Return the defined tuple t."""
    t = (1,2,3)
    return t
a,b,c = tuple_ex()   # unpack the tuple into three variables
print(a,b,c)
1 2 3
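
As a small extra sketch of the two ideas above (the function name min_max is hypothetical, not part of the original notebook): a docstring can be read back through the function's __doc__ attribute, and trying to change a value inside a tuple raises a TypeError.

```python
def min_max(values):
    """Return the smallest and largest item of values as a tuple."""
    return min(values), max(values)

# The docstring is stored on the function object itself.
doc = min_max.__doc__

# Unpack the returned tuple into two variables.
low, high = min_max([4, 1, 7])

# Tuples are immutable: item assignment raises TypeError.
t = (1, 2, 3)
try:
    t[0] = 10
    changed = True
except TypeError:
    changed = False

print(doc)
print(low, high)   # 1 7
print(changed)     # False
```

help(min_max) prints the same docstring in an interactive session.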

B. Scope

What we need to know about scope:

•global: defined in the main body of the script

•local: defined inside a function

•built-in scope: names in the predefined built-in module, such as print and len

Let's look at a basic example.

In [12]:
# guess what this prints
x = 2
def f():
    x = 3
    return x
print(x)      # x = 2 global scope
print(f())    # x = 3 local scope
2
3
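
One more sketch of the three scopes, under the assumption that the code runs as a plain script (the function names double_x and shadow_x are made up for illustration). A function with no local x falls back to the global one, a local x hides the global one without changing it, and names such as len come from the built-in scope.

```python
import builtins

x = 2  # defined in the main body of the script

def double_x():
    # No local x exists, so the lookup continues outward and finds x = 2.
    return x * 2

def shadow_x():
    x = 100   # a new local x; the outer x is untouched
    return x * 2

outer_result = double_x()
local_result = shadow_x()
print(outer_result, local_result, x)   # 4 200 2

# print and len are never defined in our script; they live in the built-in scope.
has_len = "len" in dir(builtins)
has_print = "print" in dir(builtins)
print(has_len, has_print)              # True True
```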

C. Nested function

•A function defined inside another function.

•Names are looked up by the LEGB rule: the local scope is searched first, then the enclosing function, then the global scope, and finally the built-in scope.

In [15]:
# nested function
def square():
    """Return the square of the value computed by add()."""
    def add():
        """Add two local variables."""
        x = 2
        y = 3
        z = x + y
        return z
    return add()**2
print(square())
25
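
To see the LEGB order at work, here is a minimal sketch (the names outer, outer_with_local and inner are illustrative only). The same name x exists at three levels, and each call returns the closest one it can find.

```python
x = "global"

def outer():
    x = "enclosing"
    def inner():
        # inner has no local x, so the enclosing function's x is used
        return x
    return inner()

def outer_with_local():
    x = "enclosing"
    def inner():
        x = "local"   # the local name wins over the enclosing one
        return x
    return inner()

print(outer())             # enclosing
print(outer_with_local())  # local
print(x)                   # global
```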

D. Default and flexible arguments

•Default argument example:

def f(a, b=1):

""" b = 1 is a default argument"""

•Flexible argument example:

def f(*args):

""" *args collects any number of positional arguments (zero or more) into a tuple"""

def f(**kwargs):

""" **kwargs collects keyword arguments into a dictionary"""

Let's write some code to practice.

In [24]:
# default arguments
def f(a, b = 1, c = 2):
    y = a + b + c
    return y
print(f(5))
# what if we want to override the default arguments
print(f(5,4,3))
8
12
In [25]:
# flexible arguments *args
def f(*args):
    for i in args:
        print(i)
f(1)
print("")
f(1,2,3,4)
# flexible arguments **kwargs, which is a dictionary
def f(**kwargs):
    """Print each key and value of the dictionary."""
    for key, value in kwargs.items():               # if this part is unclear, go back to the for loop part and look at the dictionary example
        print(key, " ", value)
f(country = 'spain', capital = 'madrid', population = 123456)
1

1
2
3
4
country   spain
capital   madrid
population   123456
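
The pieces above can also be combined in one function. The following is only a sketch (describe and its sep parameter are hypothetical names): required arguments come first, then *args, then keyword arguments with defaults, then **kwargs.

```python
def describe(name, *args, sep=", ", **kwargs):
    """Build one string from a required name, extra positionals and keywords."""
    parts = [name]
    parts += [str(a) for a in args]                     # args is a tuple
    parts += [f"{k}={v}" for k, v in kwargs.items()]    # kwargs is a dictionary
    return sep.join(parts)

only_name = describe("spain")
mixed = describe("spain", 1, 2, capital="madrid")
print(only_name)   # spain
print(mixed)       # spain, 1, 2, capital=madrid

# * and ** also work at the call site to unpack a list and a dictionary
numbers = [1, 2]
info = {"capital": "madrid"}
unpacked = describe("spain", *numbers, **info)
print(unpacked)    # spain, 1, 2, capital=madrid
```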

E. Lambda function

A shorter way of writing a function.

In [28]:
# lambda function
square = lambda x: x**2     # where x is name of argument
print(square(4))
tot = lambda x,y,z: x+y+z   # where x,y,z are names of arguments
print(tot(1,2,3))
16
6
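
A short sketch to make the comparison concrete (square_def, square_lambda and the pairs list are illustrative names and data): a lambda does the same job as a one-line def, and it is handy when a function is needed only once, for example as a sort key.

```python
# The same function written twice: with def and with lambda.
def square_def(x):
    return x ** 2

square_lambda = lambda x: x ** 2

same = square_def(4) == square_lambda(4)
print(same)        # True

# A lambda used once, as the key that decides the sort order.
pairs = [("c", 3), ("a", 1), ("b", 2)]
by_number = sorted(pairs, key=lambda p: p[1])
print(by_number)   # [('a', 1), ('b', 2), ('c', 3)]
```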

F. Anonymous function

A lambda function is anonymous: it needs no name, so it can be passed directly to another function and applied to many values.

•map(func,seq) : applies a function to all the items in a sequence

In [32]:
number_list = [1,2,3]
y = map(lambda x:x**2,number_list)
print(list(y))
[1, 4, 9]
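
A few more things about map, shown as a sketch with made-up input lists: it accepts several sequences at once, and it returns a lazy iterator, which is why the example above wraps it in list().

```python
number_list = [1, 2, 3]

# One sequence, one lambda argument.
squares = list(map(lambda x: x ** 2, number_list))
print(squares)   # [1, 4, 9]

# Two sequences: the lambda then takes one argument per sequence.
sums = list(map(lambda x, y: x + y, [1, 2, 3], [10, 20, 30]))
print(sums)      # [11, 22, 33]

# map is lazy: values are produced only when asked for.
lazy = map(str, number_list)
first = next(lazy)
rest = list(lazy)
print(first, rest)   # 1 ['2', '3']
```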

G. Iterators

•iterable: an object that can return an iterator

•iterable: an object with an associated __iter__() method, so iter() can be called on it

example: lists, strings and dictionaries

•iterator: produces the next value each time next() is called on it

In [35]:
# iteration example
name = "ronaldo"
it = iter(name)
print(next(it))    # print the next item
print(*it)         # print the remaining items
r
o n a l d o
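
What happens when an iterator runs out? The sketch below walks the same string by hand, which is roughly what a for loop does for us. When no items are left, next() raises StopIteration, unless a default value is passed as the second argument.

```python
name = "ronaldo"
it = iter(name)

letters = []
while True:
    try:
        letters.append(next(it))   # ask for the next item
    except StopIteration:          # raised when the iterator is exhausted
        break
print(letters)    # ['r', 'o', 'n', 'a', 'l', 'd', 'o']

# With a default, next() returns it instead of raising.
leftover = next(it, None)
print(leftover)   # None

# Iterating over a dictionary yields its keys.
keys = list(iter({"a": 1, "b": 2}))
print(keys)       # ['a', 'b']
```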

zip(): pairs up the elements of lists

In [37]:
# zip example
list1 = [1,2,3,4]
list2 = [5,6,7,8]
z = zip(list1,list2)
print(z)
z_list = list(z)
print(z_list)
<zip object at 0x0000000004EB1348>
[(1, 5), (2, 6), (3, 7), (4, 8)]
In [38]:
un_zip = zip(*z_list)
un_list1,un_list2 = list(un_zip) # unzipping returns tuples
print(un_list1)
print(un_list2)
print(type(un_list2))
(1, 2, 3, 4)
(5, 6, 7, 8)
<class 'tuple'>
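
Two details of zip are worth a quick sketch (the list named short is made up for this example): zip stops at the shortest input, and a zip object is an iterator, so it can be consumed only once. That is why the example above stores list(z) in z_list before reusing it.

```python
list1 = [1, 2, 3, 4]
short = ["a", "b"]

# zip stops as soon as the shortest input runs out.
pairs = list(zip(list1, short))
print(pairs)         # [(1, 'a'), (2, 'b')]

# A zip object is an iterator: the second pass finds nothing left.
z = zip(list1, list1)
first_pass = list(z)
second_pass = list(z)
print(first_pass)    # [(1, 1), (2, 2), (3, 3), (4, 4)]
print(second_pass)   # []
```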

H. List Comprehension

One of the most important topics of this kernel.

We often use list comprehensions for data analysis. A list comprehension collapses a for loop that builds a list into a single line. Ex: num1 = [1,2,3] and we want to make num2 = [2,3,4]. This can be done with a for loop, but that is unnecessarily long. With a list comprehension, we can do it in one line of code.

In [43]:
# Example of list comprehension
num1 = [1,2,3]
num2 = [i + 1 for i in num1 ]
print(num2)
[2, 3, 4]

[i + 1 for i in num1 ]: the list comprehension

i + 1: the expression evaluated for each item

for i in num1: for loop syntax

i: loop variable (takes each item in turn)

num1: iterable object
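
For comparison, here is the for loop version next to the one-line version, as a minimal sketch (num2_loop, num2_comp and evens are illustrative names). A trailing if can also be added to keep only some of the items.

```python
num1 = [1, 2, 3]

# The long way: build the list with a for loop.
num2_loop = []
for i in num1:
    num2_loop.append(i + 1)

# The same result as a list comprehension.
num2_comp = [i + 1 for i in num1]
print(num2_loop, num2_comp)   # [2, 3, 4] [2, 3, 4]

# A trailing "if" filters the items instead of transforming them.
evens = [i for i in num1 if i % 2 == 0]
print(evens)                  # [2]
```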

In [45]:
# Conditionals on iterable
num1 = [5,10,15]
num2 = [i**2 if i == 10 else i-5 if i < 7 else i+5 for i in num1]
print(num2)
[0, 100, 20]
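
The chained conditional above is easier to read when it is unrolled. The sketch below writes the same logic as an ordinary if / elif / else inside a helper function (transform is a hypothetical name) and checks that both versions agree.

```python
def transform(i):
    """Same logic as the chained conditional expression, written step by step."""
    if i == 10:
        return i ** 2
    elif i < 7:
        return i - 5
    else:
        return i + 5

num1 = [5, 10, 15]
unrolled = [transform(i) for i in num1]
one_line = [i**2 if i == 10 else i-5 if i < 7 else i+5 for i in num1]
print(unrolled)               # [0, 100, 20]
print(unrolled == one_line)   # True
```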

Data and Packages:

In [49]:
#Package: matplotlib, seaborn,numpy, and pandas (for dataframe data structure manipulation)
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import seaborn as sns  # visualization tool
In [50]:
data = pd.read_csv('pokemon.csv')
In [53]:
data.head(5)
Out[53]:
   #  Name           Type 1  Type 2  HP  Attack  Defense  Sp. Atk  Sp. Def  Speed  Generation  Legendary  speed_level
0  1  Bulbasaur      Grass   Poison  45  49      49       65       65       45     1           False      low
1  2  Ivysaur        Grass   Poison  60  62      63       80       80       60     1           False      low
2  3  Venusaur       Grass   Poison  80  82      83       100      100      80     1           False      high
3  4  Mega Venusaur  Grass   Poison  80  100     123      122      120      80     1           False      high
4  5  Charmander     Fire    NaN     39  52      43       60       50       65     1           False      low
In [52]:
# let's return to the pokemon csv and make one more list comprehension example
# let's classify pokemon by whether they have high or low speed. Our threshold is the average speed.
threshold = sum(data.Speed)/len(data.Speed)
data["speed_level"] = ["high" if i > threshold else "low" for i in data.Speed]
data.loc[:10,["speed_level","Speed"]] # we will learn loc in more detail later
Out[52]:
speed_level Speed
0 low 45
1 low 60
2 high 80
3 high 80
4 low 65
5 high 80
6 high 100
7 high 100
8 high 100
9 low 43
10 low 58
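
The same idea can be tried without the csv file. This sketch uses only the 11 Speed values printed above as a plain list, so its threshold is the average of these 11 values, not of the whole dataset. For these rows the labels happen to come out the same.

```python
# Speed values copied from the 11 rows shown above (not the full dataset).
speeds = [45, 60, 80, 80, 65, 80, 100, 100, 100, 43, 58]

# Threshold: the average of this small sample only.
threshold = sum(speeds) / len(speeds)

# Same list comprehension as in the notebook cell, on a plain list.
speed_level = ["high" if s > threshold else "low" for s in speeds]

for s, level in zip(speeds, speed_level):
    print(level, s)
```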

Cheers!

/itsmecevi