Python
Python is the primary language used by data scientists.
It has some of the most useful scientific computing and machine learning libraries, such as NumPy, TensorFlow, and PyTorch.
Installation
Use Anaconda.
- Add C:\Users\[username]\Anaconda3\Scripts to your PATH
- Run conda init in your bash shell
- Run conda config --set auto_activate_base false so the base environment is not activated in every new shell
Usage
How to use Python 3.
pip
Pip is the package manager for Python.
Write your package requirements to requirements.txt.
Install all requirements with pip install -r requirements.txt
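The usual way to produce requirements.txt is to run pip freeze > requirements.txt. As a rough illustration of what that file contains, the sketch below lists installed distributions as pinned name==version lines using only the standard library (importlib.metadata, Python 3.8+). It is an approximation, not a replacement for pip freeze, which also handles editable and VCS installs.

```python
# Approximation of `pip freeze` using only the standard library (Python 3.8+).
from importlib.metadata import distributions


def installed_requirements():
    """Return sorted "name==version" strings for the installed distributions."""
    lines = set()  # a set drops duplicates if a package is found on two paths
    for dist in distributions():
        name = dist.metadata["Name"]
        if name:  # skip distributions with broken metadata
            lines.add(f"{name}=={dist.version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    # Print in requirements.txt format; redirect to a file to save it
    print("\n".join(installed_requirements()))
```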
Syntax
Ternary Operator
is_nice = True
state = "nice" if is_nice else "not nice"
Lambda Function
lambda x: x * 2
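A lambda is an anonymous single-expression function, and it is most useful as a short throwaway argument to another function. A small sketch with made-up example data:

```python
# A lambda is an anonymous, single-expression function
double = lambda x: x * 2

# Typical use: pass a lambda as the key for sorting
people = [("Ana", 31), ("Bo", 25), ("Cy", 40)]  # example (name, age) pairs
by_age = sorted(people, key=lambda person: person[1])

# ...or as the function applied by map()
doubled = list(map(double, [1, 2, 3]))

print(double(3))  # 6
print(by_age)     # [('Bo', 25), ('Ana', 31), ('Cy', 40)]
print(doubled)    # [2, 4, 6]
```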
Spread (argument unpacking)
myfun(*args)  # unpack the tuple args into positional arguments
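Python calls this unpacking: * spreads a sequence into positional arguments and ** spreads a dict into keyword arguments. The function describe below is a hypothetical example used only to show the mechanics.

```python
def describe(name, age, city="unknown"):
    """Hypothetical function used to demonstrate argument unpacking."""
    return f"{name}, {age}, {city}"


args = ("Ana", 31)
kwargs = {"city": "Lisbon"}

# *args unpacks a tuple/list into positional arguments
print(describe(*args))            # Ana, 31, unknown
# **kwargs unpacks a dict into keyword arguments
print(describe(*args, **kwargs))  # Ana, 31, Lisbon

# The same operators also spread into new collections
merged_list = [*args, "extra"]             # ['Ana', 31, 'extra']
merged_dict = {**kwargs, "country": "PT"}  # {'city': 'Lisbon', 'country': 'PT'}
```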
Strings
String Interpolation
Reference
Python has three syntaxes for string interpolation.
name = 'World'
program = 'Python'
# 1. f-strings (Python 3.6+)
print(f'Hello {name}! This is {program}.')
# 2. %-formatting
print('Hello %s! This is %s.' % (name, program))
# 3. str.format
print('Hello {name}! This is {program}.'.format(name=name, program=program))
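All three syntaxes accept format specifications for padding, alignment, and number formatting. A few common ones with f-strings, sketched with arbitrary example values:

```python
price = 3.14159
count = 42
label = "hi"
big = 1234567

print(f"{price:.2f}")  # 3.14       two decimal places
print(f"{count:05d}")  # 00042      zero-padded to width 5
print(f"{label:>5}")   # "   hi"    right-aligned in width 5
print(f"{big:,}")      # 1,234,567  thousands separator

# The same specs work with str.format and similar ones with %-formatting
print("{:.2f}".format(price))  # 3.14
print("%.2f" % price)          # 3.14
```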
Arrays
Use NumPy to provide array functionality.
Array Indexing
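A short sketch of the main NumPy indexing forms (integer, negative, slice, and boolean mask), assuming NumPy is installed:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

element = a[1, 2]    # 6: row 1, column 2
last_row = a[-1]     # [ 8  9 10 11]: negative indices count from the end
column = a[:, 1]     # [1 5 9]: every row, column 1
block = a[0:2, 2:]   # [[2 3] [6 7]]: rows 0-1, columns 2 onward
big = a[a > 8]       # [ 9 10 11]: a boolean mask selects matching elements (a copy)

# Slices are views: writing through one changes the original array
block[0, 0] = 100    # a[0, 2] is now 100 as well
```

Call .copy() on a slice when you need an independent array.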
Filesystem
Paths
Use os.path to build and split paths portably.
import os.path as path
my_file = path.join("folder_1", "my_great_dataset.tar.gz")
# Windows: "folder_1\\my_great_dataset.tar.gz"; Linux/macOS: "folder_1/my_great_dataset.tar.gz"
# Get the filename with extension
filename = path.basename(my_file)
# "my_great_dataset.tar.gz"
# Get the filename without its last extension
filename_no_ext = path.splitext(filename)[0]
# "my_great_dataset.tar": splitext only strips the last extension, returning ("my_great_dataset.tar", ".gz")
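The standard library's pathlib module offers an object-oriented alternative to os.path. A sketch of the same operations on the example filename:

```python
from pathlib import Path

# The / operator joins path components with the OS-specific separator
my_file = Path("folder_1") / "my_great_dataset.tar.gz"

print(my_file.name)      # my_great_dataset.tar.gz
print(my_file.stem)      # my_great_dataset.tar  (last extension removed)
print(my_file.suffix)    # .gz
print(my_file.suffixes)  # ['.tar', '.gz']
print(my_file.parent)    # folder_1

# Strip every extension (careful: this also cuts names like "v1.2_data.csv" at "v1")
no_ext = my_file.name.split(".")[0]  # my_great_dataset
```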
List all files in a folder
import os
import os.path as path
gazeDir = "Gaze_txt_files"
# List of folders in root folder
gazeFolders = [path.join(gazeDir, x) for x in os.listdir(gazeDir)]
# List of files 2 folders down
gazeFiles = [path.join(x, y) for x in gazeFolders for y in os.listdir(x)]
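The listing above goes exactly two levels deep. To collect files at any depth, os.walk visits every subfolder. A minimal sketch; list_files is a helper name chosen for illustration:

```python
import os


def list_files(root):
    """Return the paths of all files under root, at any depth, sorted."""
    found = []
    # os.walk yields (folder path, subfolder names, file names) for every folder
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            found.append(os.path.join(dirpath, filename))
    return sorted(found)
```

For example, list_files("Gaze_txt_files") returns every file below that folder, however deeply nested.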
Read/write an entire text file as a list of lines
Reading [1]
with open('C:/path/numbers.txt') as f:
    lines = f.read().splitlines()
Writing [2]
with open('your_file.txt', 'w') as f:
    f.write("\n".join(my_list))
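The reading and writing snippets round-trip: "\n".join writes one item per line, and splitlines() returns the lines without their newline characters. A sketch that checks this in a temporary folder, with example data for my_list:

```python
import os
import tempfile

my_list = ["first line", "second line", "third line"]  # example data

with tempfile.TemporaryDirectory() as folder:
    file_path = os.path.join(folder, "your_file.txt")

    # Write: one list item per line
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("\n".join(my_list))

    # Read: split the whole file back into lines, without "\n" characters
    with open(file_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

print(lines == my_list)  # True
```

For very large files, iterating with for line in f: avoids loading the whole file into memory at once.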
Directories
Create, Move, and Delete directories or folders
import os, shutil, time

dir_path = "new_dir"
# Create a directory (no error if it already exists)
os.makedirs(dir_path, exist_ok=True)
# or create the parent folder of a file path:
# os.makedirs(os.path.dirname("new_dir/my_file.txt"), exist_ok=True)
# Delete an empty directory (raises OSError if it is not empty)
# os.rmdir(dir_path)
# Delete an empty or non-empty directory
shutil.rmtree(dir_path)
# Wait until it is deleted (removal can lag on Windows)
while os.path.isdir(dir_path):
    time.sleep(0.01)
Copying or moving a file or folder
import shutil
# Copy a file, preserving metadata such as the modification time
shutil.copy2('original.txt', 'duplicate.txt')
# Move a file
shutil.move('original.txt', 'my_folder/original.txt')
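shutil.copy2 handles single files; whole folders are copied with shutil.copytree, and shutil.move works for folders as well as files. A sketch that builds a small throwaway folder in a temporary directory (all folder and file names are illustrative):

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a small example folder: my_folder/sub/data.txt
    src = os.path.join(base, "my_folder")
    os.makedirs(os.path.join(src, "sub"))
    with open(os.path.join(src, "sub", "data.txt"), "w") as f:
        f.write("hello")

    # Copy a whole folder tree; the destination must not exist yet
    # (Python 3.8+ accepts dirs_exist_ok=True to merge into an existing folder)
    dst = shutil.copytree(src, os.path.join(base, "my_folder_copy"))

    # Move (or rename) the copied folder; returns the new location
    moved = shutil.move(dst, os.path.join(base, "my_folder_moved"))

    copied_ok = os.path.isfile(os.path.join(moved, "sub", "data.txt"))
    old_gone = not os.path.exists(dst)

print(copied_ok, old_gone)  # True True
```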
Regular Expressions (Regex)
import re
myReg = re.compile(r'height:(\d+)cm')
myMatch = myReg.match("height:33cm")
print(myMatch[1])
# 33
Notes
- re.match returns None if there is no match
- re.match only matches at the beginning of the string
- Use re.search to find the first match anywhere in the string, or re.findall to find all matches
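A sketch contrasting match, search, and findall on a string where the pattern does not appear at the start (example text):

```python
import re

height = re.compile(r"height:(\d+)cm")
text = "Tom height:33cm, Ana height:40cm"

print(height.match(text))      # None: text does not start with "height:"
print(height.search(text)[1])  # 33: group 1 of the first match anywhere
print(height.findall(text))    # ['33', '40']: group 1 of every match
```

With one capture group, findall returns the group's text for each match; with several groups it returns tuples.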
Spawning Processes
Use subprocess to spawn other programs.
import subprocess
subprocess.run(["ls", "-l"], cwd="/")
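To read a program's output instead of letting it print to the console, capture it. The sketch below runs the current Python interpreter as the child process so it works on any OS; replace the argument list with your own command.

```python
import subprocess
import sys

# Run a child process and capture its output
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # collect stdout/stderr instead of printing them (3.7+)
    text=True,            # decode the output bytes to str
    check=True,           # raise CalledProcessError on a non-zero exit code
)

print(result.returncode)      # 0
print(result.stdout.strip())  # hello from child
```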
Timing Code
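import time
# perf_counter is a high-resolution clock intended for measuring intervals
t0 = time.perf_counter()
result = sum(range(1_000_000))  # code to time goes here
t1 = time.perf_counter()
total = t1 - t0  # elapsed seconds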
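For small snippets, a single measurement is noisy. The standard library's timeit module runs a statement many times; a sketch with an arbitrary example statement:

```python
import timeit

stmt = "sum(range(10_000))"  # example statement to benchmark

# Run the statement 100 times and return the total elapsed seconds
total = timeit.timeit(stmt, number=100)

# Repeat that measurement 5 times; the minimum is the least noisy estimate
best = min(timeit.repeat(stmt, number=100, repeat=5))

print(f"{total:.4f}s total, best of 5: {best:.4f}s")
```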
requests
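Use the requests library to download files and scrape webpages.
See Get and post requests in Python
import requests
url = "https://www.google.com"
req = requests.get(url)
req.raise_for_status()  # raise an HTTPError for 4xx/5xx responses
req.text  # the response body decoded as a str
# To save to disk, write the raw bytes in req.content
with open("google.html", "wb") as f:
    f.write(req.content)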
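Query-string parameters and form data can be passed as dicts instead of being built by hand. The sketch below only prepares the requests (nothing is sent) so the resulting URL and body can be inspected; the example.com URLs and field names are placeholders.

```python
import requests

# Build (but do not send) a GET request with query-string parameters
get_req = requests.Request(
    "GET", "https://example.com/search", params={"q": "python"}
).prepare()
print(get_req.url)    # https://example.com/search?q=python

# Build a POST request with form-encoded data
post_req = requests.Request(
    "POST", "https://example.com/login", data={"user": "ana"}
).prepare()
print(post_req.body)  # user=ana

# Sending them directly would be:
#   requests.get("https://example.com/search", params={"q": "python"})
#   requests.post("https://example.com/login", data={"user": "ana"})
```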
if main
If you are writing a script whose functions you also want to import into other scripts, check __name__ to detect whether the script is being run directly or imported.
What does if __name__ == "__main__" do?
if __name__ == "__main__":
    print("Running as a script, not imported")
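A typical layout keeps reusable functions at module level and puts the script's entry point behind the check. A sketch with illustrative function names:

```python
def greet(name):
    """A reusable function other scripts can import."""
    return f"Hello, {name}!"


def main():
    print(greet("World"))


# __name__ is "__main__" only when this file is executed directly
# (python my_script.py); on import it is the module name, so main() does not run
if __name__ == "__main__":
    main()
```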
Data Structures
Lists
The most commonly used data structure in Python is the list.
A lot of functional programming can be done with lists.
groceries = ["apple", "orange"]
groceries.reverse()  # reverses the list in place
# ["orange", "apple"]
groceries_str = ",".join(groceries)
# "orange,apple"
groceries_str.split(",")
# ["orange", "apple"]
# Note that functions such as map, enumerate, and range return lazy iterables
# which you can iterate over in a for loop.
# Convert them to lists with list() if necessary.
list(enumerate(groceries))
# [(0, "orange"), (1, "apple")]
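List comprehensions are the idiomatic way to transform and filter lists, and map/filter are the functional equivalents. A sketch with example numbers:

```python
numbers = [1, 2, 3, 4, 5]

squares = [n * n for n in numbers]          # [1, 4, 9, 16, 25]
evens = [n for n in numbers if n % 2 == 0]  # [2, 4]

# Equivalent functional forms (map/filter return iterators, hence list())
squares_map = list(map(lambda n: n * n, numbers))
evens_filter = list(filter(lambda n: n % 2 == 0, numbers))

# Slicing returns new lists
first_two = numbers[:2]         # [1, 2]
reversed_copy = numbers[::-1]   # [5, 4, 3, 2, 1]; unlike reverse(), numbers is unchanged
```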
Dictionaries
Dictionaries are Python's built-in hash maps.
# Create a dictionary
my_map = {}
# Or
my_map = {1: "a", 2: "b"}
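Common dictionary operations, sketched on the example map above plus a small word count:

```python
my_map = {1: "a", 2: "b"}

my_map[3] = "c"             # insert a key (or overwrite an existing one)
value = my_map.get(4, "?")  # "?": a default instead of KeyError for a missing key
has_two = 2 in my_map       # True: membership tests check keys

for key, val in my_map.items():  # iterate over (key, value) pairs
    print(key, val)

# Count occurrences with a dict
counts = {}
for word in ["apple", "orange", "apple"]:
    counts[word] = counts.get(word, 0) + 1
# {'apple': 2, 'orange': 1}
```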
Anaconda
How to use Anaconda:
# Create an environment
conda create -n tf2 python
# Activate an environment
conda activate tf2
# Change version of Python
conda install python=3.6
Libraries
NumPy
Matplotlib
Matplotlib is the main library used for making graphs.
Alternatively, there are Python plotting libraries modeled on R's ggplot2, such as plotnine.
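A minimal Matplotlib line plot saved to a file, assuming Matplotlib is installed; the data and the output filename squares.png are examples only.

```python
import matplotlib
matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3, 4]
y = [v * v for v in x]

fig, ax = plt.subplots()
ax.plot(x, y, marker="o", label=r"$y = x^2$")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.legend()
fig.savefig("squares.png")
plt.close(fig)  # free the figure's memory
```

In an interactive session, plt.show() displays the figure instead of saving it.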
Examples: Matplotlib Gallery