Python
Latest revision as of 19:39, 18 April 2023
Python is the primary language used by data scientists.
It features some of the most useful scientific computing and machine learning libraries, such as NumPy, TensorFlow, and PyTorch.
Installation
Use Anaconda.
- Add C:\Users\[username]\Anaconda3\Scripts to your PATH.
- Run conda init in your bash shell.
- Run conda config --set auto_activate_base false so the base environment is not activated automatically in every new shell.
Usage
How to use Python 3.
pip
pip is the package manager for Python.
Write your package requirements to requirements.txt.
Install all requirements using pip install -r requirements.txt
Syntax
Ternary Operator
is_nice = True
state = "nice" if is_nice else "not nice"
Lambda Function
lambda x: x * 2
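Lambdas are most useful as short throwaway functions passed to other functions. Here is a small sketch using sorted() with a key function. The word list is only illustrative data.

```python
words = ["banana", "Apple", "cherry"]
# Sort case-insensitively by passing a lambda as the key function
by_name = sorted(words, key=lambda w: w.lower())
# Sort by length, longest first; ties keep their original order (stable sort)
by_length = sorted(words, key=lambda w: len(w), reverse=True)
print(by_name)
print(by_length)
```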
Spread
args = (1, 2, 3)
myfun(*args)  # same as myfun(1, 2, 3)
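A fuller sketch of argument unpacking: * expands a sequence into positional arguments and ** expands a dict into keyword arguments. The function describe() and its parameters are hypothetical, chosen only for the demo.

```python
def describe(name, age, city="unknown"):
    # Hypothetical function used only to show unpacking
    return f"{name}, {age}, {city}"

args = ("Ada", 36)
kwargs = {"city": "London"}
# Same as describe("Ada", 36, city="London")
result = describe(*args, **kwargs)
# The same operators also unpack into new lists and dicts
merged_list = [*args, "extra"]
merged_dict = {**kwargs, "country": "UK"}
```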
For loops
import numpy as np
# Normal for loop
for i in range(5):
    pass
# 2D for loop over every (i, j) index pair of a 5 x 5 grid
for i, j in np.ndindex((5, 5)):
    pass
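If you do not need NumPy, itertools.product from the standard library gives the same flattened 2D loop. This sketch checks that it yields the same pairs, in the same order, as two nested loops.

```python
from itertools import product

# Nested loops over a 3 x 2 grid
nested = []
for i in range(3):
    for j in range(2):
        nested.append((i, j))

# itertools.product yields the same pairs in the same order,
# much like np.ndindex((3, 2)) does for array indices
flat = list(product(range(3), range(2)))
```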
Strings
String Interpolation
Reference
Python has three syntaxes for string interpolation: f-strings, %-formatting, and str.format.
name = 'World'
program = 'Python'
# 1. f-strings (Python 3.6+)
print(f'Hello {name}! This is {program}')
# 2. %-formatting
print("%s %s" % ('Hello', 'World'))
# 3. str.format, with positional or named fields
print('Hello {}! This is {}.'.format(name, program))
print('Hello {name}! This is {program}.'.format(name=name, program=program))
accuracy = 93.456
size = 7
# Format to two decimal places
print(f"Accuracy: {accuracy:.2f}%")
# Format an int to 2 digits, zero-padded
print(f"Size: {size:02}")
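The part after the colon is Python's format specification mini-language. A few common specs are sketched below. The variable names and values are just examples.

```python
pi = 3.14159
count = 7
big = 1234567
s1 = f"{pi:.2f}"       # .2f: fixed-point with 2 decimals
s2 = f"{count:03}"     # 03: zero-pad to width 3
s3 = f"{big:,}"        # ,: thousands separator
s4 = f"{'hi':>6}"      # >6: right-align in a field of width 6
# The same specs work with str.format; % multiplies by 100 and adds a percent sign
s5 = "{:.1%}".format(0.256)
```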
Arrays
Use NumPy for array functionality.
Array Indexing
NumPy supports powerful indexing, including slicing, boolean masks, and integer-array indexing. See the NumPy indexing documentation.
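A quick sketch of the main indexing styles on a small 3 x 4 array. The array is arbitrary example data.

```python
import numpy as np

a = np.arange(12).reshape(3, 4)   # rows [0..3], [4..7], [8..11]
row = a[1]                        # second row
col = a[:, 2]                     # third column
block = a[:2, 1:3]                # rows 0-1, columns 1-2 (a view, not a copy)
evens = a[a % 2 == 0]             # boolean mask -> 1D array of matching elements
picked = a[[0, 2], [1, 3]]        # integer arrays pick a[0, 1] and a[2, 3]
```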
Filesystem
Paths
Use os.path
import os.path as path
my_file = path.join("folder_1", "my_great_dataset.tar.gz")
# "folder_1\\my_great_dataset.tar.gz" on Windows, "folder_1/my_great_dataset.tar.gz" on Linux/macOS
# Get the filename with extension
filename = path.basename(my_file)
# "my_great_dataset.tar.gz"
# Remove only the last extension
filename_no_ext = path.splitext(filename)[0]
# "my_great_dataset.tar", since splitext returns ("my_great_dataset.tar", ".gz")
If using Python >=3.4, you also have pathlib
from pathlib import Path
p = Path("my_folder")
# Join paths (equivalently: p / "files.tar.gz")
pp = Path(p, "files.tar.gz")
pp.suffix    # returns ".gz"
pp.suffixes  # returns [".tar", ".gz"]
pp.name      # returns "files.tar.gz"
pp.parent    # returns Path("my_folder")
Notes
- One annoyance with pathlib.Path is that some APIs, especially older third-party ones, still expect strings, so you have to convert paths manually.
  - This can be done with str(p) or os.fspath(p). Since Python 3.6, most standard-library functions accept Path objects directly.
- "No really, pathlib is great" by Trey Hunner
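A few more pathlib operations are sketched below. PurePosixPath is used only so the printed results are the same on every OS. The folder and file names are made up.

```python
from pathlib import Path, PurePosixPath

# The / operator joins path components
p = PurePosixPath("data") / "raw" / "images.tar.gz"
name = p.name                    # "images.tar.gz"
stem = p.stem                    # "images.tar" (only the last suffix is removed)
renamed = p.with_suffix(".zip")  # data/raw/images.tar.zip
parts = p.parts                  # ("data", "raw", "images.tar.gz")
# str() converts a path back to a string for APIs that need one
as_text = str(p)
# Path.cwd() returns a concrete path on the current system
here = Path.cwd()
```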
List all files in a folder
import os
import os.path as path
gaze_directory = "Gaze_txt_files"
# List of folders in root folder
gaze_folders = [path.join(gaze_directory, x) for x in os.listdir(gaze_directory)]
# List of files 2 folders down
gaze_files = [path.join(x, y) for x in gaze_folders for y in os.listdir(x)]
See also glob.
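A minimal glob sketch follows. It builds a throwaway folder tree in a temporary directory, and the file names are hypothetical. With recursive=True, "**" matches zero or more subfolders.

```python
import glob
import os
import tempfile

# Build a small throwaway folder tree to search
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "sub"))
for rel in ["a.txt", "b.csv", os.path.join("sub", "c.txt")]:
    open(os.path.join(root, rel), "w").close()

# Files matching a pattern in the top folder only
top_txt = glob.glob(os.path.join(root, "*.txt"))
# "**" with recursive=True also searches all subfolders
all_txt = glob.glob(os.path.join(root, "**", "*.txt"), recursive=True)
names = sorted(os.path.relpath(f, root) for f in all_txt)
```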
Read/Write entire text file into a list
Reading
[1]
with open('C:/path/numbers.txt') as f:
    lines = f.read().splitlines()
Writing
[2]
with open('your_file.txt', 'w') as f:
    f.write("\n".join(my_list))
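This sketch runs a round trip with the two snippets above, using a temporary file, and shows how splitlines() differs from readlines(). The list contents are example data.

```python
import os
import tempfile

my_list = ["first line", "second line", "third line"]
file_path = os.path.join(tempfile.mkdtemp(), "lines.txt")

# Write: join with newlines (no trailing newline is added)
with open(file_path, "w") as f:
    f.write("\n".join(my_list))

# Read: splitlines() drops the line endings
with open(file_path) as f:
    lines = f.read().splitlines()

# readlines() keeps "\n" on every line except the last one here
with open(file_path) as f:
    raw = f.readlines()
```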
Directories
Create and delete directories (folders)
import os
import shutil
import time
dir_path = "new_dir"
# Create a directory, including missing parents; no error if it already exists
os.makedirs(dir_path, exist_ok=True)
# Create the parent folder of a file path
os.makedirs(os.path.dirname("new_dir/sub/my_file.txt"), exist_ok=True)
# Delete an empty directory (raises OSError if it is not empty)
os.rmdir(os.path.join(dir_path, "sub"))
# Delete a directory tree, empty or not
shutil.rmtree(dir_path)
# Wait until it is gone (deletion can lag on Windows)
while os.path.isdir(dir_path):
    time.sleep(0.01)
Copying or moving a file or folder
import shutil
# Copy a file, also preserving metadata such as timestamps
shutil.copy2('original.txt', 'duplicate.txt')
# Move (or rename) a file or folder
shutil.move('original.txt', 'my_folder/original.txt')
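This self-contained sketch runs in a temporary folder and also copies a whole folder with shutil.copytree. The file and folder names are hypothetical.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()
src = os.path.join(root, "original.txt")
with open(src, "w") as f:
    f.write("hello")

# Copy a single file (copy2 also keeps timestamps)
shutil.copy2(src, os.path.join(root, "duplicate.txt"))

# Copy a whole folder tree; by default the destination must not exist yet
os.makedirs(os.path.join(root, "my_folder"))
shutil.copytree(os.path.join(root, "my_folder"), os.path.join(root, "backup"))

# Move a file into a folder
shutil.move(src, os.path.join(root, "my_folder", "original.txt"))
contents = sorted(os.listdir(root))
```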
Regular Expressions (Regex)
import re
my_regex = re.compile(r'height:(\d+)cm')
my_match = my_regex.match("height:33cm")
print(my_match[1])
# 33
Notes
- re.match returns None if there is no match
- re.match only matches at the beginning of the string
- Use re.search to match anywhere in the string
- Use re.findall to find all occurrences anywhere in the string
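The difference between match, search, and findall is easiest to see on one string. The sample text is made up.

```python
import re

text = "height:33cm, width:12cm"
# match() only succeeds if the pattern matches at the start of the string
m1 = re.match(r"width:(\d+)cm", text)
# search() scans the whole string for the first match
m2 = re.search(r"width:(\d+)cm", text)
width = int(m2.group(1))
# findall() returns every match; with one group, a list of the group strings
numbers = re.findall(r"(\d+)cm", text)
```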
Spawning Processes
Use subprocess to spawn other programs.
import subprocess
subprocess.run(["ls", "-l"], cwd="/")
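To capture a program's output, subprocess.run accepts a few extra keyword arguments. capture_output and text need Python 3.7 or newer. This sketch runs the current Python interpreter as the child program so it works on any OS.

```python
import subprocess
import sys

# Run a child process, capture its output as text, and raise
# CalledProcessError if it exits with a non-zero status
result = subprocess.run(
    [sys.executable, "-c", "print('hello from child')"],
    capture_output=True,
    text=True,
    check=True,
)
output = result.stdout.strip()
code = result.returncode
```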
Timing Code
StackOverflow
Python Time Documentation
- time.time() returns the seconds since the epoch as a float
- You can also use timeit to time code over several iterations
import time
t0 = time.time()
sum(range(1_000_000))  # the code block to time
t1 = time.time()
total = t1 - t0  # elapsed time in seconds
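For measuring short intervals, time.perf_counter() is usually a better clock than time.time(). timeit repeats a statement many times. A sketch of both follows. The timed statements are arbitrary examples.

```python
import time
import timeit

# perf_counter() is a high-resolution clock intended for measuring intervals
t0 = time.perf_counter()
sum(range(100_000))
elapsed = time.perf_counter() - t0

# timeit runs the statement `number` times and returns the total seconds
total = timeit.timeit("sum(range(1000))", number=1000)
per_call = total / 1000
# repeat() runs several trials; the minimum is usually the most stable figure
best = min(timeit.repeat("sum(range(1000))", number=100, repeat=3))
```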
requests
Use the requests library to download files and scrape webpages
See Get and post requests in Python
Get Request
import requests
url = "https://www.google.com"
req = requests.get(url)
req.text  # response body decoded as a str
# To save the raw bytes to disk
with open("google.html", "wb") as f:
    f.write(req.content)
Post Request
import requests
# API_ENDPOINT, API_KEY, and source_code are placeholders you must define
data = {'api_dev_key': API_KEY,
        'api_option': 'paste',
        'api_paste_code': source_code,
        'api_paste_format': 'python'}
# Send a POST request and keep the response object
r = requests.post(url=API_ENDPOINT, data=data)
# Extract the response text
pastebin_url = r.text
print(f"The pastebin URL is: {pastebin_url}")
Download a file
import os
import os.path as path
import shutil
import uuid
import requests
def download_file(url, folder=None, filename=None):
    """Download url in chunks to folder/filename and return the full path."""
    if filename is None:
        filename = path.basename(url)
    if folder is None:
        folder = os.getcwd()
    full_path = path.join(folder, filename)
    # Download to a temporary name so a partial file never has the final name
    temp_path = path.join(folder, str(uuid.uuid4()))
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(temp_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    shutil.move(temp_path, full_path)
    return full_path
if main
What does if __name__ == "__main__" do?
If you are writing a script with functions you want to be included in other scripts, use __name__ to detect whether your script is being run directly or being imported.
def main():
    print("do something here")
if __name__ == "__main__":
    main()
iterators and iterables
Iterables include lists, NumPy arrays, and tuples.
To create an iterator, pass an iterable to the iter() function.
my_arr = [1, 2, 3, 4]
my_iter = iter(my_arr)
v1 = next(my_iter)  # 1
itertools contains many helper functions for working with iterables and iterators.
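The sketch below shows how an iterator is consumed by next() and list(), and how a generator function is the simplest way to write your own iterator. countdown() is a hypothetical example.

```python
my_iter = iter([1, 2, 3])
first = next(my_iter)            # 1
rest = list(my_iter)             # [2, 3]; the iterator is now exhausted
leftover = next(my_iter, None)   # a default avoids StopIteration

# A generator function produces an iterator; each yield hands out one value
def countdown(n):
    while n > 0:
        yield n
        n -= 1

values = list(countdown(3))
```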
zip
zip takes two or more iterables and combines them into an iterator of tuples, stopping at the shortest input,
e.g. list(zip([a1, a2, ...], [b1, b2, ...])) == [(a1, b1), (a2, b2), ...]
enumerate
enumerate pairs each item of an iterable with its index,
e.g. list(enumerate([a1, a2, ...], start=0)) == [(0, a1), (1, a2), ...]
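Here are zip and enumerate on concrete lists, including the common pattern of combining them and the zip(*...) idiom for "unzipping". The names and scores are example data.

```python
names = ["a", "b", "c"]
scores = [90, 85]
# zip stops at the shortest input, so "c" is dropped
pairs = list(zip(names, scores))
# start=1 numbers items from 1
numbered = list(enumerate(names, start=1))
# Loop over two lists at once, with an index
for i, (name, score) in enumerate(zip(names, scores)):
    print(i, name, score)
# zip(*pairs) "unzips" a list of tuples
unzipped_names, unzipped_scores = zip(*pairs)
```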
slice
itertools.islice lets you take a slice of any iterable, including iterators that do not support [] indexing.
from itertools import islice
import numpy as np
a = np.arange(5)
b = islice(a, 3)
list(b) # [0,1,2]
Exceptions
See https://docs.python.org/3/library/exceptions.html
Raising
raise ValueError("You have bad inputs")
assert 1 == 1, "Something is very wrong if 1 != 1"
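Try/Except
def something_which_may_raise():
    assert 1 + 1 == 2, "Arithmetic is broken"
try:
    something_which_may_raise()
except AssertionError as error:
    print(f"Falling back because: {error}")
    raise  # Re-raise the exception that was just caught
else:
    print("Runs only if no exception was raised")
finally:
    print("Always runs, e.g. for cleanup")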
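Custom exceptions are ordinary classes that inherit from Exception. raise ... from ... keeps the original error attached as __cause__. This is a minimal sketch. ConfigError and parse_port() are hypothetical names.

```python
class ConfigError(Exception):
    """Hypothetical custom exception for bad configuration values."""

def parse_port(value):
    try:
        port = int(value)
    except ValueError as err:
        # "from err" chains the original error so both show in the traceback
        raise ConfigError(f"Invalid port: {value!r}") from err
    if not 0 < port < 65536:
        raise ConfigError(f"Port out of range: {port}")
    return port

try:
    parse_port("http")
except ConfigError as error:
    message = str(error)
    cause = error.__cause__
```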
Classes
Static and Class methods
See realpython
class MyClass:
    def method(self):
        return 'instance method called', self
    @classmethod
    def classmethod(cls):
        return 'class method called', cls
    @staticmethod
    def staticmethod():
        return 'static method called'
Notes
- The Google Python Style Guide discourages static methods and recommends a module-level function instead.
- Class methods should mainly be used to define alternative (named) constructors, e.g. from_matrix.
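Here is a sketch of the alternative-constructor pattern with a module-level function used in place of a static method. Rectangle, square(), and rectangle_from_string() are hypothetical names.

```python
class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    @classmethod
    def square(cls, side):
        # Alternative constructor: cls is the class it was called on,
        # so subclasses get instances of themselves
        return cls(side, side)

    def area(self):
        return self.width * self.height

def rectangle_from_string(text):
    # A module-level function instead of a static method, e.g. "2x5"
    width, height = (float(v) for v in text.split("x"))
    return Rectangle(width, height)

sq = Rectangle.square(3)
r = rectangle_from_string("2x5")
```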
Multithreading
threading
Use threading.Thread to create a thread.
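A minimal threading.Thread sketch: start() launches each thread and join() waits for it to finish. worker() is a hypothetical function. Each thread writes only to its own list slot, which is why no lock is used.

```python
import threading

results = [0] * 4

def worker(index):
    # Each thread writes only to its own slot, so no lock is needed here
    results[index] = index * index

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
# join() blocks until each thread has finished
for t in threads:
    t.join()
```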
concurrency
In Python 3.2+, concurrent.futures gives you access to thread pools.
import os
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed
executor = ThreadPoolExecutor(max_workers=os.cpu_count())
thread_lock = threading.Lock()
total = 0
def do_something(a, b):
    global total  # needed to modify the module-level variable
    # The lock makes the read-modify-write of total atomic
    with thread_lock:
        total += a + b
        return total
my_futures = []
for i in range(5):
    future = executor.submit(do_something, 1, 2 + i)
    my_futures.append(future)
for future in as_completed(my_futures):
    future.result()  # re-raises any exception from the worker
executor.shutdown()
- len(os.sched_getaffinity(0)) returns the number of CPUs the Python process is allowed to run on (only available on some Unix platforms, such as Linux).
- If max_workers is None, it defaults to 5 * os.cpu_count() in Python 3.5 to 3.7, and to min(32, os.cpu_count() + 4) starting in Python 3.8.
  - os.cpu_count() returns the number of logical CPUs (i.e. hardware threads).
- executor.shutdown() waits for all pending jobs to finish by default; after calling it, no more jobs can be submitted.
- In CPython, single list operations such as append are thread-safe, but compound operations such as total += x require a thread lock or semaphore.
Data Structures
Tuples
Tuples are immutable lists: they have a fixed size and fixed elements, though the elements themselves may be mutable. They can be marginally faster and more memory-efficient than lists, so prefer tuples for sequences that should not change, such as fixed groups of function arguments.
Typically people use tuples as structs, i.e. objects with structure such as coordinates. See StackOverflow: Difference between lists and tuples.
# Tuple with one element
m_tuple = (1,)
# Tuple with multiple elements
vals = (1,2,3, "car")
# Return a tuple
def int_divide(a, b):
    return a // b, a % b
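The sketch below covers tuple unpacking, collections.namedtuple for struct-like records, and tuples as dictionary keys. Point and the grid data are illustrative.

```python
from collections import namedtuple

# Unpack a returned tuple into separate names (divmod works like int_divide)
quotient, remainder = divmod(17, 5)

# A namedtuple is a tuple whose fields also have names
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
dist_sq = p.x ** 2 + p.y ** 2
x, y = p  # still unpacks like a plain tuple

# Tuples of hashable items can be dictionary keys; lists cannot
grid = {(0, 0): "origin", (1, 2): "target"}
label = grid[(1, 2)]
```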
Lists
Lists are the default sequence data structure in Python.
A lot of functional programming can be done with lists.
groceries = ["apple", "orange"]
groceries.reverse()  # reverses in place
# ["orange", "apple"]
groceries_str = ",".join(groceries)
# "orange,apple"
groceries_str.split(",")
# ["orange", "apple"]
# Note that functions such as map, enumerate, and range return iterables
# which you can iterate over in a for loop
# You can also convert these to lists by calling list() if necessary
list(enumerate(groceries))
# [(0, "orange"), (1, "apple")]
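Here are common functional-style list operations: comprehensions, map and filter (which return lazy iterators), and sorted(). The numbers are example data.

```python
nums = [3, 1, 4, 1, 5, 9, 2, 6]
# List comprehensions cover most map/filter needs
squares = [n * n for n in nums]
evens = [n for n in nums if n % 2 == 0]
# map and filter return lazy iterators; wrap them in list() to materialize
doubled = list(map(lambda n: n * 2, nums))
odds = list(filter(lambda n: n % 2 == 1, nums))
# sorted() returns a new list; list.sort() would sort in place
top3 = sorted(nums, reverse=True)[:3]
total = sum(nums)
```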
Dictionaries
Dictionaries are hash maps in Python.
# Create a dictionary
my_map = {}
# Or
my_map = {1: "a", 2: "b"}
# Check if a key is in a dictionary
# O(1) on average
1 in my_map
# Check if a value is in a dictionary
# O(n); if you need this often, keep a second (inverse) dictionary
'a' in my_map.values()
# Loop through keys
for k in my_map:
    print(k)
# With key and value
for k, v in my_map.items():
    print(k, v)
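This sketch shows the "second dictionary" idea for fast value lookups, plus get() and setdefault(). It assumes the values are unique, otherwise later keys overwrite earlier ones in the inverse.

```python
my_map = {1: "a", 2: "b"}
# get() returns a default instead of raising KeyError for a missing key
missing = my_map.get(3, "none")
# Inverse dictionary for O(1) value -> key lookups (assumes unique values)
inverse = {v: k for k, v in my_map.items()}
key_for_b = inverse["b"]
# setdefault() inserts a default only when the key is absent
groups = {}
for word in ["apple", "avocado", "banana"]:
    groups.setdefault(word[0], []).append(word)
```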
Numpy
See also CuPy, a NumPy-compatible array library implemented with CUDA for GPU acceleration. It can give large speedups on big arrays.
random
Legacy code uses functions from np.random.*.
New code should create a random number generator with np.random.default_rng().
See Random Generator for more details.
import numpy as np
rng = np.random.default_rng()
# Random integer in [0, 6), i.e. 0 to 5
rng.integers(0, 6)
# array of 5 random integers
rng.integers(0, 6, size=5)
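Passing a seed to default_rng makes results reproducible. The Generator also has methods for floats, normal samples, sampling without replacement, and permutations. The seed value 42 is arbitrary.

```python
import numpy as np

# Two generators with the same seed produce the same sequence
rng1 = np.random.default_rng(seed=42)
rng2 = np.random.default_rng(seed=42)
a = rng1.integers(0, 6, size=5)
b = rng2.integers(0, 6, size=5)
# Floats in [0, 1) and normally distributed samples
u = rng1.random(3)
z = rng1.normal(loc=0.0, scale=1.0, size=3)
# Sample 3 distinct values from 0..9, and a random ordering of 0..4
picked = rng1.choice(10, size=3, replace=False)
perm = rng1.permutation(5)
```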
Anaconda
How to use Anaconda:
# Create an environment
conda create -n tf2 python=3.6
# Activate an environment
conda activate tf2
# Change version of Python
conda install python=3.7
# Update all packages
conda update --all
Documentation
- conda install
Notes
- Use the --force-reinstall flag to reinstall packages
JSON
import json
my_data = {"name": "example", "values": [1, 2, 3]}
# Encode/Stringify (indent=2 pretty-prints)
json.dumps(my_data, indent=2)
# Decode/Parse
json.loads('{"a": 1}')
# Write to file
with open("my_data.json", "w") as f:
    json.dump(my_data, f, indent=2)
# Read from file
with open("my_data.json", "r") as f:
    my_data = json.load(f)
Notes
- json.dump(data, f) writes everything on one line, without pretty printing.
  - Add the indent parameter for pretty printing.
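JSON round trips do not preserve every Python type. The sketch below shows what changes. The data dict is example data.

```python
import json

data = {"name": "model", "layers": (64, 32), "trained": True, "lr": None}
# sort_keys=True gives a stable key order in the output text
text = json.dumps(data, sort_keys=True)
restored = json.loads(text)
# Tuples come back as lists; True/None are written as true/null
# Non-string dictionary keys are converted to strings
keys_roundtrip = json.loads(json.dumps({1: "a"}))
```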
Type Annotations
Python 3 supports adding type annotations. However, they are not enforced at runtime.
You can check types ahead of time using pytype.
def add_two_values(a: float, b: float) -> float:
    return a + b
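The sketch below shows that annotations are stored on the function but not checked when it runs. It assumes Python 3.9+ for the list[float] syntax. mean() is a hypothetical example. A static checker such as pytype would likely report the tuple argument, but Python runs it without complaint.

```python
from typing import Optional

def mean(values: list[float]) -> Optional[float]:
    """Return the average of values, or None for an empty list."""
    if not values:
        return None
    return sum(values) / len(values)

# Annotations are just metadata stored on the function
hints = mean.__annotations__
# A tuple of ints is not a list[float], yet it still works at runtime
avg = mean((1, 2, 3))
```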
Images
Pillow (PIL)
pip install pillow
import numpy as np
from PIL import Image, ImageOps
img = Image.open("my_image.png")
# Convert to a uint8 array of shape (H, W) or (H, W, C), e.g. (H, W, 4) for an RGBA PNG
arr = np.array(img)
- ImageOps.flip(img) returns the image flipped vertically (top to bottom)
- ImageOps.mirror(img) returns the image flipped horizontally (left to right)
Bilinear Interpolation
import numpy as np
def bilinear_interpolate(im, x, y):
    """
    Basic bilinear interpolation
    :param im: image array of shape (H, W) or (H, W, C)
    :param x: column coordinate(s), float or array
    :param y: row coordinate(s), same shape as x
    :return: interpolated value(s) at (y, x)
    """
    x = np.asarray(x)
    y = np.asarray(y)
    x0 = np.floor(x).astype(int)
    x1 = x0 + 1
    y0 = np.floor(y).astype(int)
    y1 = y0 + 1
    x0 = np.clip(x0, 0, im.shape[1] - 1)
    x1 = np.clip(x1, 0, im.shape[1] - 1)
    y0 = np.clip(y0, 0, im.shape[0] - 1)
    y1 = np.clip(y1, 0, im.shape[0] - 1)
    Ia = im[y0, x0]
    Ib = im[y1, x0]
    Ic = im[y0, x1]
    Id = im[y1, x1]
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)
    # Add a channel axis to the weights for multi-channel images
    if len(Ia.shape) > len(wa.shape):
        wa = wa[..., np.newaxis]
        wb = wb[..., np.newaxis]
        wc = wc[..., np.newaxis]
        wd = wd[..., np.newaxis]
    return wa * Ia + wb * Ib + wc * Ic + wd * Id
Libraries
Other notable libraries.
Matplotlib
Matplotlib is the main library used for making graphs.
Examples
Gallery
Alternatively, there are also Python bindings for ggplot2
configargparse
ConfigArgParse is a drop-in replacement for argparse that also lets you set arguments through config files.
import configargparse
parser = configargparse.ArgParser()
parser.add('-c', '--config', is_config_file=True, help='config file path')
# Parse all args; exits with an error on unknown args
parser.parse_args()
# Parse only known args; returns (namespace, list of unrecognized args)
parser.parse_known_args()
To accept explicit boolean values (e.g. --augment false) instead of using store_true or store_false, define a str2bool function: Stack Overflow Answer
import argparse
def str2bool(val):
    """Converts the string value to a bool.
    Args:
        val: string representing true or false
    Returns:
        bool
    """
    if isinstance(val, bool):
        return val
    if val.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif val.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
#...
parser.add_argument("--augment",
                    type=str2bool,
                    help="Augment",
                    default=False)
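The sketch below uses the standard argparse module, so it runs without ConfigArgParse installed. It shows str2bool next to a store_true flag and passes explicit argument lists to parse_args. The --verbose flag is a hypothetical addition for comparison.

```python
import argparse

def str2bool(val):
    # Condensed version of the str2bool function above
    if isinstance(val, bool):
        return val
    if val.lower() in ("yes", "true", "t", "y", "1"):
        return True
    if val.lower() in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError("Boolean value expected.")

parser = argparse.ArgumentParser()
parser.add_argument("--augment", type=str2bool, default=False, help="Augment")
# store_true, by contrast, takes no value: the flag's presence means True
parser.add_argument("--verbose", action="store_true")

args_on = parser.parse_args(["--augment", "yes", "--verbose"])
args_off = parser.parse_args(["--augment", "0"])
args_default = parser.parse_args([])
```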