Python: Difference between revisions

From David's Wiki

Revision as of 15:33, 13 January 2020


Python is the primary language used by data scientists.
It has some of the most useful scientific computing and machine learning libraries, such as NumPy, TensorFlow, and PyTorch.

Installation

Use Anaconda.

  • Add C:\Users\[username]\Anaconda3\Scripts to your path
  • Run conda init in your Bash shell
  • Run conda config --set auto_activate_base false

Usage

How to use Python 3.

pip

pip is the package manager for Python.
Your package requirements should be written to requirements.txt.
Install all requirements using pip install -r requirements.txt.

Syntax

Ternary Operator

Reference

is_nice = True
state = "nice" if is_nice else "not nice"

Lambda Function

lambda x: x * 2
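A lambda is mostly useful as a short inline argument to another function. The sketch below is illustrative only; the names `double`, `words`, `by_length`, and `doubled` are not from the original page.

```python
# A lambda is an anonymous function limited to a single expression.
double = lambda x: x * 2

# Common use: passing a sort key inline.
words = ["banana", "fig", "apple"]
by_length = sorted(words, key=lambda w: len(w))

# map() returns an iterator, so wrap it in list() to materialize the values.
doubled = list(map(double, [1, 2, 3]))

print(by_length)
print(doubled)
```

For anything longer than one expression, a regular def is usually easier to read.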

Spread

Reference

# Unpack a tuple (or any iterable) into positional arguments
myfun(*my_tuple)
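As a slightly fuller sketch of unpacking: `*` spreads an iterable into positional arguments and `**` spreads a dict into keyword arguments. The function `describe` and its parameters are hypothetical, made up for this example.

```python
def describe(name, quantity, unit="kg"):
    """Hypothetical helper used only to show argument unpacking."""
    return f"{quantity} {unit} of {name}"

args = ("apples", 3)
options = {"unit": "lb"}

# *args fills the positional parameters in order.
positional_only = describe(*args)

# **options fills parameters by keyword.
with_keywords = describe(*args, **options)

# The same star syntax also works on the left side of an assignment.
first, *rest = [1, 2, 3, 4]

print(positional_only)
print(with_keywords)
```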

Strings

String Interpolation

Reference
Python has 3 syntax variations for string interpolation.

name = 'World'
program = 'Python'
print(f'Hello {name}! This is {program}')

print("%s %s" %('Hello','World',))

name = 'world'
program ='python'
print('Hello {name}! This is {program}.'.format(name=name, program=program))
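f-strings also accept a format specification after a colon, which is handy for rounding and padding. A small sketch with made-up values (`price` and `count` are illustrative):

```python
price = 3.14159
count = 7

# Two digits after the decimal point.
formatted = f"{price:.2f}"

# Zero-pad an integer to width 3.
padded = f"{count:03d}"

# Right-align a string in a field of width 5.
aligned = f"{'id':>5}|"

print(formatted, padded, aligned)
```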

Arrays

Use NumPy for array functionality.

Array Indexing

Scipy Reference

Lists

The list is Python's default sequence data structure.
A lot of functional programming can be done with lists.

groceries = ["apple", "orange"]

groceries.reverse()  # Reverses the list in place and returns None
# groceries is now ["orange", "apple"]

groceries_str = ",".join(groceries)
# "orange,apple"

groceries_str.split(",")
# ["orange", "apple"]

# Note that functions such as map, enumerate, and range return iterables
# which you can iterate over in a for loop
# You can also convert these to lists by calling list() if necessary

list(enumerate(groceries))
# [(0, "orange"), (1, "apple")]
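The functional style mentioned above is usually written with list comprehensions, with map as an alternative. A minimal sketch, using a slightly longer example list than the one on this page:

```python
groceries = ["apple", "orange", "kiwi"]

# Transform every item (the comprehension equivalent of map).
upper = [item.upper() for item in groceries]

# Keep only the items that pass a test (the equivalent of filter).
long_names = [item for item in groceries if len(item) > 4]

# map() is lazy, so list() is needed to get the values.
lengths = list(map(len, groceries))

# enumerate() pairs each item with its index.
indexed = list(enumerate(groceries))

print(upper, long_names, lengths, indexed)
```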

Filesystem

Paths

Use os.path

import os.path as path

my_file = path.join("folder_1", "my_great_dataset.tar.gz")
# "folder_1\\my_great_dataset.tar.gz" on Windows
# "folder_1/my_great_dataset.tar.gz" on Linux and macOS

# Get the filename with extension
filename = path.basename(my_file)
# "my_great_dataset.tar.gz"

# Get the filename without its last extension
filename_no_ext = path.splitext(filename)[0]
# "my_great_dataset.tar"
# Note that splitext only splits off the last extension:
# it returns ("my_great_dataset.tar", ".gz")
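If you need to strip every extension (for example both ".tar" and ".gz"), one option is pathlib from the standard library. This is a sketch of one approach, not the page's original method; PurePosixPath is used here only so the output looks the same on every operating system.

```python
from pathlib import PurePosixPath

my_file = PurePosixPath("folder_1") / "my_great_dataset.tar.gz"

filename = my_file.name          # last path component
extensions = my_file.suffixes    # every extension, in order
stem = my_file.stem              # name without the last extension only

# Remove all extensions by cutting off their combined length.
all_ext = "".join(extensions)
base = filename[: len(filename) - len(all_ext)]

print(filename, extensions, stem, base)
```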

List all files in a folder

Reference

import os
import os.path as path

gazeDir = "Gaze_txt_files"
# List of folders in root folder
gazeFolders = [path.join(gazeDir, x) for x in os.listdir(gazeDir)]
# List of files 2 folders down
gazeFiles = [path.join(x, y) for x in gazeFolders for y in os.listdir(x)]
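The snippet above assumes the files sit exactly two levels down. When the depth is not known, os.walk visits every subfolder. The sketch below builds a throwaway folder tree so that it can run anywhere; `list_files` and the file names are hypothetical.

```python
import os
import tempfile

def list_files(root):
    """Return the sorted paths of every file under root, at any depth."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            found.append(os.path.join(dirpath, filename))
    return sorted(found)

with tempfile.TemporaryDirectory() as root:
    # Build a small example tree: one file at the top, one in a subfolder.
    os.makedirs(os.path.join(root, "subject_1"))
    open(os.path.join(root, "subject_1", "gaze.txt"), "w").close()
    open(os.path.join(root, "notes.txt"), "w").close()

    # Report the paths relative to the root folder.
    relative = [os.path.relpath(p, root) for p in list_files(root)]

print(relative)
```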

Read/Write entire text file into a list

Reading
[1]

with open('C:/path/numbers.txt') as f:
    lines = f.read().splitlines()

Writing
[2]

with open('your_file.txt', 'w') as f:
    f.write("\n".join(my_list))
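Putting the two snippets together gives a round trip. This sketch writes to a temporary folder so that it does not touch real files; `my_list` and the file name are example values, and encoding="utf-8" is an added assumption to keep the behavior the same across platforms.

```python
import os
import tempfile

my_list = ["1", "2", "3"]

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "numbers.txt")

    # Write one list item per line.
    with open(file_path, "w", encoding="utf-8") as f:
        f.write("\n".join(my_list))

    # Read the lines back; splitlines() drops the newline characters.
    with open(file_path, encoding="utf-8") as f:
        lines = f.read().splitlines()

# Lines are always strings, so convert explicitly if numbers are needed.
numbers = [int(line) for line in lines]
print(lines, numbers)
```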


Create or Delete Directory/Folders

Reference

import errno, os, shutil, time
import os.path as path

# Or simply os.makedirs(dir_path, exist_ok=True)
def ensure_dir_exists(dir_path):
    if not os.path.exists(dir_path):
        try:
            os.makedirs(dir_path)
        except OSError as exc: # Guard against race condition
            if exc.errno != errno.EEXIST:
                raise


# Example usage to create new_dir folder
ensure_dir_exists("new_dir")
# or
ensure_dir_exists(os.path.dirname("new_dir/my_file.txt"))

# Delete an empty directory
os.rmdir(dir_path)
# Delete an empty or non-empty directory
shutil.rmtree(dir_path)
# Wait until it is deleted
while os.path.isdir(dir_path):
  time.sleep(0.01)
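On Python 3, the create and delete steps can be sketched more compactly with exist_ok=True. The example runs inside a temporary folder so that it is safe to try; the folder names are illustrative.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    nested = os.path.join(base, "new_dir", "sub_dir")

    # Creates all missing parent folders; no error if it already exists.
    os.makedirs(nested, exist_ok=True)
    os.makedirs(nested, exist_ok=True)  # second call does nothing
    created = os.path.isdir(nested)

    # Remove new_dir and everything inside it.
    shutil.rmtree(os.path.join(base, "new_dir"))
    removed = not os.path.exists(nested)

print(created, removed)
```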

Copying or moving a file or folder

Copying
Shutil docs

import shutil

# Copy a file
shutil.copy2('original.txt', 'duplicate.txt')

# Move a file
shutil.move('original.txt', 'my_folder/original.txt')
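The calls above handle single files. For whole folders, shutil.copytree copies recursively and shutil.move also accepts a folder. A sketch in a temporary folder, with made-up folder names:

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Set up a folder containing one file.
    src_dir = os.path.join(base, "my_folder")
    os.makedirs(src_dir)
    with open(os.path.join(src_dir, "original.txt"), "w") as f:
        f.write("hello")

    # Copy the whole folder; the destination must not exist yet.
    backup_dir = os.path.join(base, "my_folder_backup")
    shutil.copytree(src_dir, backup_dir)

    # Move (rename) the copied folder.
    moved_dir = os.path.join(base, "renamed_folder")
    shutil.move(backup_dir, moved_dir)

    remaining = sorted(os.listdir(base))
    with open(os.path.join(moved_dir, "original.txt")) as f:
        copied_text = f.read()

print(remaining, copied_text)
```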

Regular Expressions (Regex)

Reference

import re
myReg = re.compile(r'height:(\d+)cm')
myMatch = myReg.match("height:33cm")
print(myMatch[1])
# 33
Notes
  • re.match will return None if there is no match
  • re.match only matches at the beginning of the string
  • Use re.search to find the first match anywhere in the string, or re.findall to get all matches
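The difference between these functions is easiest to see side by side. A small sketch reusing the height pattern; the sample text is made up.

```python
import re

height_re = re.compile(r"height:(\d+)cm")
text = "width:10cm height:33cm height:40cm"

# match() only looks at the start of the string, so this finds nothing.
at_start = height_re.match(text)

# search() returns the first match found anywhere in the string.
first = height_re.search(text)
first_height = int(first.group(1)) if first else None

# findall() returns the captured group of every match as strings.
all_heights = height_re.findall(text)

print(at_start, first_height, all_heights)
```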

Spawning Processes

Use subprocess to spawn other programs.

import subprocess
subprocess.run(["ls", "-l"], cwd="/")
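To read what the child process printed, subprocess.run can capture its output (Python 3.7 or later). The sketch launches the current Python interpreter instead of ls so that it also works on Windows; the child command is just an example.

```python
import subprocess
import sys

result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,  # collect stdout and stderr instead of printing them
    text=True,            # decode the output to str instead of bytes
    check=True,           # raise CalledProcessError on a non-zero exit code
)

output = result.stdout.strip()
print(result.returncode, output)
```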


Timing Code

Reference

import time

t0 = time.time()
# ... code to time goes here ...
t1 = time.time()

total = t1 - t0  # Elapsed wall-clock time in seconds
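For short intervals, time.perf_counter is generally a better clock than time.time because it is monotonic and has higher resolution. A sketch of a reusable helper; `timed` is a hypothetical name, not something from the original page.

```python
import time

def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

# Example: time how long it takes to sum 100,000 integers.
total, seconds = timed(sum, range(100_000))
print(f"sum={total} took {seconds:.6f} s")
```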

requests

Use the requests library to download files and scrape webpages.
See Get and post requests in Python.

import requests
url = "https://www.google.com"
req = requests.get(url)
req.raise_for_status()  # Raise an exception on 4xx/5xx responses
req.text

# To save to disk
with open("google.html", "wb") as f:
  f.write(req.content)

if main

If you are writing a script with functions that you also want to import into other scripts, check __name__ to detect whether the file is being run directly or imported. See: What does if __name__ == "__main__" do?

if __name__ == "__main__":
    main()  # main() stands in for your script's entry function
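A complete minimal file using this pattern might look like the sketch below. The functions `double` and `main` are illustrative. When the file is imported, only the definitions run; when it is executed directly, main() is also called.

```python
def double(x):
    """A reusable function that other scripts can import."""
    return x * 2

def main():
    # Code that should only run when the file is executed as a script.
    print(double(21))

if __name__ == "__main__":
    main()
```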

Anaconda

How to use Anaconda:

# Create an environment
conda create -n tf2 python

# Activate an environment
conda activate tf2

# Change version of Python
conda install python=3.6

Libraries

Numpy

Matplotlib

Matplotlib is the main library used for making graphs.
Alternatively, there are also Python bindings for ggplot2.
Examples
Gallery