Python: Difference between revisions
Line 1: | Line 1: | ||
__FORCETOC__
Python is the primary language used by data scientists.<br>
It features some of the most useful scientific computing and machine learning libraries, such as NumPy, TensorFlow, and PyTorch.
==Installation==
Use [http://anaconda.com Anaconda].
* Add <code>C:\Users\[username]\Anaconda3\Scripts</code> to your path | |||
* Run <code>conda init</code> in your bash | |||
* Run <code>conda config --set auto_activate_base false</code> | |||
==Usage==
How to use Python 3.
===pip===
Pip is the package manager for Python.<br>
Your package requirements should be written to <code>requirements.txt</code>.<br>
Install all requirements using <code>pip install -r requirements.txt</code>.
===Syntax=== | |||
====Ternary Operator==== | |||
[http://book.pythontips.com/en/latest/ternary_operators.html Reference]
<syntaxhighlight lang="python">
is_nice = True
state = "nice" if is_nice else "not nice"
</syntaxhighlight>
====Lambda Function==== | |||
<syntaxhighlight lang="python"> | |||
lambda x: x * 2 | |||
</syntaxhighlight> | |||
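A lambda is most often passed inline as a short callback. The following is a minimal sketch; the <code>points</code> list and the variable names are made up for illustration.

```python
# Hypothetical data for illustration
points = [(3, "c"), (1, "a"), (2, "b")]

# Use a lambda as the sort key: sort by the first element of each tuple
sorted_points = sorted(points, key=lambda p: p[0])

# Use a lambda with map to double every value
doubled = list(map(lambda x: x * 2, [1, 2, 3]))

print(sorted_points)  # [(1, 'a'), (2, 'b'), (3, 'c')]
print(doubled)        # [2, 4, 6]
```

For anything longer than a single expression, a regular <code>def</code> is usually easier to read.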
====Spread==== | |||
[https://stackoverflow.com/questions/1993727/expanding-tuples-into-arguments Reference] | |||
<syntaxhighlight lang="python"> | |||
myfun(*my_tuple)  # expands my_tuple into positional arguments
</syntaxhighlight> | |||
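The star syntax can be sketched a little further. In the example below, <code>describe</code> and its arguments are hypothetical names chosen for illustration; it shows <code>*</code> for positional arguments and <code>**</code> for keyword arguments.

```python
def describe(name, age, city="unknown"):
    # Hypothetical function used only for this illustration
    return f"{name} ({age}) from {city}"

args = ("Ada", 36)
kwargs = {"city": "London"}

# *args expands a tuple into positional arguments,
# **kwargs expands a dict into keyword arguments
result = describe(*args, **kwargs)
print(result)  # Ada (36) from London

# The same star syntax also unpacks iterables into a new list
merged = [*args, *kwargs.values()]
print(merged)  # ['Ada', 36, 'London']
```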
====For loops==== | |||
<syntaxhighlight lang="python"> | |||
import numpy as np

# Normal for loop
for i in range(5):
    pass
# 2D for loop
for i, j in np.ndindex((5, 5)):
    pass
</syntaxhighlight>
Line 23: | Line 58: | ||
print(f'Hello {name}! This is {program}')
print("%s %s" % ('Hello', 'World'))
name = 'world'
program = 'python'
print('Hello {}! This is {}.'.format(name, program))
print('Hello {name}! This is {program}.'.format(name=name, program=program))
# Format to two decimal places
print(f"Accuracy: {accuracy:.2f}%")
# Format an int to at least 2 digits, zero-padded
print(f"Size: {size:02}%")
</syntaxhighlight> | |||
===Arrays=== | |||
Use NumPy to provide array functionality.
====Array Indexing====
[https://numpy.org/doc/stable/user/basics.indexing.html NumPy Indexing]
NumPy has very powerful indexing; see the reference above.
===Filesystem=== | |||
====Paths==== | |||
Use [https://docs.python.org/3/library/os.path.html <code>os.path</code>] | |||
<syntaxhighlight lang="python"> | |||
import os.path as path
my_file = path.join("folder_1", "my_great_dataset.tar.gz")
# "folder_1\\my_great_dataset.tar.gz" on Windows
# "folder_1/my_great_dataset.tar.gz" on Linux and macOS
# Get the filename with extension
filename = path.basename(my_file)
# "my_great_dataset.tar.gz"
# Get the filename without its last extension
filename_no_ext = path.splitext(filename)[0]
# "my_great_dataset.tar", since splitext returns ("my_great_dataset.tar", ".gz")
</syntaxhighlight>
If using Python >=3.4, you also have [https://docs.python.org/3/library/pathlib.html <code>pathlib</code>]
<syntaxhighlight lang="python">
from pathlib import Path
p = Path("my_folder")
# Join paths
pp = Path(p, "files.tar.gz")  # equivalent to p / "files.tar.gz"
pp.suffix  # returns ".gz"
pp.suffixes  # returns [".tar", ".gz"]
pp.name  # returns "files.tar.gz"
pp.parent  # returns Path("my_folder")
</syntaxhighlight>
;Notes | |||
* One annoyance with <code>pathlib.Path</code> is that some APIs still require you to convert paths to strings manually
** This can be done with <code>str()</code> or <code>os.fspath()</code>; note that <code>.resolve()</code> returns another <code>Path</code>, not a string
* [https://treyhunner.com/2019/01/no-really-pathlib-is-great/ "No really, pathlib is great" by Trey Hunner]
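The conversions mentioned in the notes can be sketched as follows. The folder and file names are placeholders, and <code>as_posix()</code> is an extra option not listed above.

```python
import os
from pathlib import Path

p = Path("my_folder") / "files.tar.gz"

# Both calls return a plain string using the platform's separator
as_str = str(p)
as_fspath = os.fspath(p)

# as_posix() always uses forward slashes, regardless of platform
as_posix = p.as_posix()

# resolve() returns another Path object, not a string
resolved = p.resolve()

print(as_posix)                    # my_folder/files.tar.gz
print(isinstance(resolved, Path))  # True
```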
====List all files in a folder====
[https://stackoverflow.com/questions/3207219/how-do-i-list-all-files-of-a-directory Reference]
<syntaxhighlight lang="python">
import os
import os.path as path

gaze_directory = "Gaze_txt_files"
# List of folders in root folder
gaze_folders = [path.join(gaze_directory, x) for x in os.listdir(gaze_directory)]
# List of files 2 folders down
gaze_files = [path.join(x, y) for x in gaze_folders for y in os.listdir(x)]
</syntaxhighlight>
See also <code>glob</code>.
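As a rough sketch of the <code>glob</code> alternative, the example below builds a throwaway directory tree (all folder and file names are invented for illustration) and lists the <code>.txt</code> files one folder down, first with <code>glob</code> and then with <code>pathlib</code>.

```python
import glob
import os
import tempfile
from pathlib import Path

# Build a small throwaway directory tree so the example is self-contained
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "subject_1"))
for name in ("a.txt", "b.txt", "notes.md"):
    Path(root, "subject_1", name).touch()

# glob: "*" matches any folder name one level down, "*.txt" filters by extension
txt_files = sorted(glob.glob(os.path.join(root, "*", "*.txt")))

# pathlib equivalent; rglob searches recursively at any depth
txt_paths = sorted(Path(root).rglob("*.txt"))

print([os.path.basename(f) for f in txt_files])  # ['a.txt', 'b.txt']
print([p.name for p in txt_paths])               # ['a.txt', 'b.txt']
```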
====Read/Write entire text file into a list====
Line 60: | Line 142: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
====Directories==== | |||
Create, Move, and Delete directories or folders | |||
<syntaxhighlight lang="python"> | |||
import os, shutil, time
import os.path as path
# Create a directory
os.makedirs("new_dir", exist_ok=True)
# or os.makedirs(os.path.dirname("new_dir/my_file.txt"), exist_ok=True)
dir_path = "new_dir"
# Delete an empty directory
os.rmdir(dir_path)
# Delete an empty or non-empty directory (alternative to os.rmdir)
os.makedirs(dir_path, exist_ok=True)  # recreate it so this example runs end to end
shutil.rmtree(dir_path)
# Wait until it is deleted
while os.path.isdir(dir_path):
    time.sleep(0.01)
</syntaxhighlight>
Line 97: | Line 180: | ||
<syntaxhighlight lang="python">
import re
my_regex = re.compile(r'height:(\d+)cm')
my_match = my_regex.match("height:33cm")
print(my_match[1])
# 33
</syntaxhighlight>
;Notes | |||
* <code>re.match</code> will return None if there is no match | |||
* <code>re.match</code> matches from the beginning of the string | |||
* Use <code>re.search</code> to match from anywhere in the string | |||
* Use <code>re.findall</code> to find all occurrences from anywhere in the string | |||
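The differences described in the notes can be seen in a short sketch. The input string is made up for illustration.

```python
import re

text = "width:12cm height:33cm depth:7cm"
pattern = re.compile(r"height:(\d+)cm")

# match() only succeeds at the start of the string, so this is None
print(pattern.match(text))  # None

# search() scans the whole string and returns the first match
found = pattern.search(text)
print(found[1])  # 33

# findall() returns every captured group as a list of strings
all_sizes = re.findall(r"(\d+)cm", text)
print(all_sizes)  # ['12', '33', '7']

# Guard against None before indexing a match object
height = int(found[1]) if found else None
```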
===Spawning Processes===
Line 110: | Line 199: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
===Timing Code=== | |||
[https://stackoverflow.com/questions/2866380/how-can-i-time-a-code-segment-for-testing-performance-with-pythons-timeit StackOverflow]<br> | |||
[https://docs.python.org/3/library/time.html Python Time Documentation] | |||
* <code>time.time()</code> returns the seconds since the epoch as a float
* You can also use <code>timeit</code> to time over several iterations
<syntaxhighlight lang="python"> | |||
import time
t0 = time.time()
# ... code to time goes here ...
t1 = time.time()
total = t1 - t0  # elapsed time in seconds
</syntaxhighlight> | |||
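A minimal sketch of timing over several iterations with <code>timeit</code> follows. The <code>work</code> function is a hypothetical workload, and <code>time.perf_counter()</code> is shown as an alternative clock for short intervals; neither appears in the notes above.

```python
import time
import timeit

def work():
    # Hypothetical workload used only for this illustration
    return sum(range(10_000))

# perf_counter() is a monotonic, high-resolution clock intended for measuring intervals
t0 = time.perf_counter()
work()
elapsed = time.perf_counter() - t0

# timeit calls the function `number` times and returns the total seconds
total = timeit.timeit(work, number=100)
per_call = total / 100

print(f"single run: {elapsed:.6f}s, average over 100 runs: {per_call:.6f}s")
```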
===requests=== | |||
Use the requests library to download files and scrape webpages<br> | |||
See [https://www.geeksforgeeks.org/get-post-requests-using-python/ Get and post requests in Python] | |||
====Get Request==== | |||
<syntaxhighlight lang="python"> | |||
import requests
url = "https://www.google.com"
req = requests.get(url)
req.text  # response body decoded as a string
# To save to disk
with open("google.html", "wb") as f:
    f.write(req.content)
</syntaxhighlight> | |||
====Post Request==== | |||
<syntaxhighlight lang="python"> | |||
import requests

# API_ENDPOINT, API_KEY, and source_code are placeholders that you must define
data = {'api_dev_key': API_KEY,
        'api_option': 'paste',
        'api_paste_code': source_code,
        'api_paste_format': 'python'}
# Send the POST request and save the response object
r = requests.post(url=API_ENDPOINT, data=data)
# Extract the response text
pastebin_url = r.text
print("The pastebin URL is: %s" % pastebin_url)
</syntaxhighlight> | |||
====Download a file==== | |||
[https://stackoverflow.com/questions/16694907/download-large-file-in-python-with-requests SO Answer] | |||
<syntaxhighlight lang="python"> | |||
import os
import os.path as path
import shutil
import uuid

import requests

def download_file(url, folder=None, filename=None):
    if filename is None:
        filename = path.basename(url)
    if folder is None:
        folder = os.getcwd()
    full_path = path.join(folder, filename)
    # Download to a temporary name so a partial file never appears at full_path
    temp_path = path.join(folder, str(uuid.uuid4()))
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(temp_path, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    shutil.move(temp_path, full_path)
    return full_path
</syntaxhighlight> | |||
===if main=== | |||
[https://stackoverflow.com/questions/419163/what-does-if-name-main-do What does if __name__ == "__main__" do?] | |||
If you are writing a script with functions you want to be included in other scripts, use <code>__name__</code> to detect if your script is being run or being imported.
<syntaxhighlight lang="python">
if __name__ == "__main__":
    # do..something..here
    pass
</syntaxhighlight>
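A common way to organize this, sketched below with hypothetical function names, is to keep reusable functions at module level and put the script behavior in a <code>main()</code> function.

```python
def add(a, b):
    # Reusable function that other scripts can import
    return a + b

def main():
    # Script behavior; called only when the file is executed directly
    print(add(1, 2))

if __name__ == "__main__":
    main()
```

Importing this file from another script makes <code>add</code> available without running <code>main()</code>.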
===iterators and iterables=== | |||
Iterables include lists, NumPy arrays, and tuples.
To create an iterator, pass an iterable to the <code>iter()</code> function.
<syntaxhighlight lang="python">
my_arr = [1, 2, 3, 4]
my_iter = iter(my_arr)
v1 = next(my_iter)  # 1; Python 3 uses next(my_iter) instead of my_iter.next()
</syntaxhighlight>
<code>itertools</code> contains many helper functions for interacting with iterables and iterators. | |||
====zip==== | |||
[https://docs.python.org/3/library/functions.html#zip documentation] | |||
zip takes two or more iterables and combines them into an iterator of tuples,
i.e. <code>list(zip([a1, ...], [b1, ...]))</code> is <code>[(a1, b1), ...]</code>
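A short sketch of <code>zip</code> in practice, using made-up lists. It also shows that iteration stops at the shortest input.

```python
names = ["a", "b", "c"]
values = [1, 2, 3, 4]  # one element longer on purpose

# zip stops at the shortest input, so the extra 4 is dropped
pairs = list(zip(names, values))
print(pairs)  # [('a', 1), ('b', 2), ('c', 3)]

# zip accepts any number of iterables
triples = list(zip(names, values, "xyz"))
print(triples)  # [('a', 1, 'x'), ('b', 2, 'y'), ('c', 3, 'z')]

# zip(*pairs) transposes the pairs back into separate tuples
unzipped_names, unzipped_values = zip(*pairs)
print(unzipped_names)  # ('a', 'b', 'c')
```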
====enumerate==== | |||
[https://docs.python.org/3/library/functions.html#enumerate documentation] | |||
enumerate adds indices to an iterable and returns an iterator of (index, item) tuples,
i.e. <code>list(enumerate([a1, a2, ...], start=0))</code> is <code>[(0, a1), (1, a2), ...]</code>
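A small illustration of <code>enumerate</code> with an invented list, including the <code>start</code> parameter:

```python
fruits = ["apple", "orange", "pear"]

# Default start is 0
for i, fruit in enumerate(fruits):
    print(i, fruit)

# start=1 is handy for human-readable numbering
numbered = list(enumerate(fruits, start=1))
print(numbered)  # [(1, 'apple'), (2, 'orange'), (3, 'pear')]
```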
====slice==== | |||
<code>itertools.islice</code> will allow you to create a slice from an iterable | |||
<syntaxhighlight lang="python"> | |||
from itertools import islice | |||
import numpy as np | |||
a = np.arange(5) | |||
b = islice(a, 3) | |||
list(b) # [0,1,2] | |||
</syntaxhighlight> | |||
===Exceptions=== | |||
See [https://docs.python.org/3/library/exceptions.html https://docs.python.org/3/library/exceptions.html] | |||
;Raising | |||
<syntaxhighlight lang="python"> | |||
raise ValueError("You have bad inputs")
assert 1 == 1, "Something is very wrong if 1 != 1"
</syntaxhighlight> | |||
;Try Catch/Except
<syntaxhighlight lang="python">
try:
    something_which_may_raise()
except AssertionError as error:
    do_fallback()
    raise  # Re-raise the exception currently being handled.
else:
    do_something_if_no_exception()
finally:
    finish_program_and_cleanup()
</syntaxhighlight> | |||
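To make the order of the clauses concrete, here is a runnable sketch. The <code>safe_divide</code> function and its <code>steps</code> list are hypothetical and exist only to record which clauses ran.

```python
def safe_divide(a, b):
    steps = []  # records which clauses ran, for illustration only
    try:
        result = a / b
    except ZeroDivisionError as error:
        steps.append(f"except: {error}")
        result = None
    else:
        # Runs only when the try block raised nothing
        steps.append("else")
    finally:
        # Always runs, whether or not an exception occurred
        steps.append("finally")
    return result, steps

print(safe_divide(4, 2))  # (2.0, ['else', 'finally'])
print(safe_divide(1, 0))  # (None, ['except: division by zero', 'finally'])
```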
==Classes== | |||
===Static and Class methods=== | |||
See [https://realpython.com/instance-class-and-static-methods-demystified/ realpython] | |||
<syntaxhighlight lang="python"> | |||
class MyClass:
    def method(self):
        return 'instance method called', self

    @classmethod
    def classmethod(cls):
        return 'class method called', cls

    @staticmethod
    def staticmethod():
        return 'static method called'
</syntaxhighlight> | |||
;Notes | |||
* Note that the Google Python style guide discourages the use of static methods.
** Class methods should only be used to define alternative constructors (e.g. <code>from_matrix</code>).
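An alternative constructor might look like the sketch below. The <code>Vector</code> class and its <code>from_tuple</code> method are hypothetical names used only to illustrate the pattern.

```python
class Vector:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    @classmethod
    def from_tuple(cls, pair):
        # Alternative constructor: cls is the class it was called on,
        # so subclasses get instances of their own type
        return cls(pair[0], pair[1])

class LabeledVector(Vector):
    label = "labeled"

v = Vector.from_tuple((1, 2))
lv = LabeledVector.from_tuple((3, 4))
print(v.x, v.y)           # 1 2
print(type(lv).__name__)  # LabeledVector
```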
==Multithreading== | |||
===threading=== | |||
[https://docs.python.org/3/library/threading.html?highlight=threading#module-threading <code>import threading</code>] | |||
Use <code>threading.Thread</code> to create a thread. | |||
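A minimal sketch of creating and joining threads follows. The <code>worker</code> function and <code>results</code> list are placeholders for illustration.

```python
import threading

results = []

def worker(n):
    # list.append is atomic in CPython, so no lock is needed here
    results.append(n * n)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
# join() blocks until each thread has finished
for t in threads:
    t.join()

print(sorted(results))  # [0, 1, 4, 9]
```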
===concurrency===
In Python 3.2+, [https://docs.python.org/3/library/concurrent.futures.html#module-concurrent.futures <code>concurrent.futures</code>] gives you access to thread pools. | |||
<syntaxhighlight lang="python"> | |||
import os
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

executor = ThreadPoolExecutor(max_workers=os.cpu_count())
thread_lock = threading.Lock()
total = 0

def do_something(a, b):
    global total  # required to rebind the module-level variable
    with thread_lock:
        total += a + b
        return total

my_futures = []
for i in range(5):
    future = executor.submit(do_something, 1, 2 + i)
    my_futures.append(future)
for future in as_completed(my_futures):
    future.result()
executor.shutdown()
</syntaxhighlight> | |||
* <code>len(os.sched_getaffinity(0))</code> returns the number of CPUs the Python process is allowed to run on (only available on some Unix platforms, such as Linux).
* In Python 3.5 to 3.7, if <code>max_workers</code> is <code>None</code>, it defaults to <code>5 * os.cpu_count()</code>. Since Python 3.8, the default is <code>min(32, os.cpu_count() + 4)</code>.
** <code>os.cpu_count()</code> returns the number of logical CPUs (i.e. hardware threads).
* <code>executor.shutdown()</code> waits for all submitted jobs to finish, but no additional jobs can be submitted after shutdown has been called.
* In CPython, individual list operations such as <code>append</code> are thread-safe, but compound operations (e.g. read-modify-write) still require a thread lock or semaphore.
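As a sketch of a shorter alternative to calling <code>submit</code> and <code>shutdown</code> by hand, the executor can be used as a context manager together with <code>map</code>. The <code>square</code> function is a stand-in workload.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    # Hypothetical workload used only for this illustration
    return n * n

# The with-block calls shutdown() automatically on exit
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() returns results in input order, regardless of completion order
    squares = list(executor.map(square, range(5)))

print(squares)  # [0, 1, 4, 9, 16]
```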
==Data Structures== | |||
===Tuples=== | |||
Tuples are immutable lists. This means that they have a fixed size and fixed elements, though the elements themselves may be mutable.
In general, they perform marginally faster than lists, so you should prefer tuples over lists when the contents do not need to change, especially as parameters to functions.
Typically, people use tuples as structs, i.e. objects with structure such as coordinates. See [https://stackoverflow.com/questions/626759/whats-the-difference-between-lists-and-tuples StackOverflow: Difference between lists and tuples].
<syntaxhighlight lang="python"> | |||
# Tuple with one element
m_tuple = (1,)
# Tuple with multiple elements
vals = (1, 2, 3, "car")
# Return a tuple
def int_divide(a, b):
    return a // b, a % b
</syntaxhighlight> | |||
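A short sketch of consuming a returned tuple and of using tuples as structs. <code>collections.namedtuple</code> is an addition not mentioned above, and the <code>Point</code> type is hypothetical.

```python
from collections import namedtuple

def int_divide(a, b):
    return a // b, a % b

# Unpack the returned tuple into separate names
quotient, remainder = int_divide(7, 2)
print(quotient, remainder)  # 3 1

# namedtuple gives tuple fields readable names while staying immutable
Point = namedtuple("Point", ["x", "y"])
p = Point(x=1.5, y=-2.0)
print(p.x, p[1])  # 1.5 -2.0
```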
===Lists=== | |||
Lists are the default data structure in Python.<br>
A lot of functional programming can be done with lists.<br>
<syntaxhighlight lang="python">
groceries = ["apple", "orange"]
groceries.reverse()
# ["orange", "apple"]
groceries_str = ",".join(groceries)
# "orange,apple"
groceries_str.split(",")
# ["orange", "apple"]
# Note that functions such as map, enumerate, and range return iterables
# which you can iterate over in a for loop
# You can also convert these to lists by calling list() if necessary
list(enumerate(groceries))
# [(0, "orange"), (1, "apple")]
</syntaxhighlight> | |||
===Dictionaries=== | |||
Dictionaries are hashmaps in Python<br> | |||
<syntaxhighlight lang="python"> | |||
# Create a dictionary | |||
my_map = {} | |||
# Or | |||
my_map = {1: "a", 2: "b"} | |||
# Check if a key is in a dictionary | |||
# O(1) | |||
1 in my_map | |||
# Check if a value is in a dictionary | |||
# Usually you should have a second dictionary if you need this functionality | |||
# O(n) | |||
'a' in my_map.values()
# Loop through dictionary
for k in my_map:
    print(k)
# With key and value
for k, v in my_map.items():
    print(k, v)
</syntaxhighlight> | |||
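The "second dictionary" suggested in the comments above could be built as in this sketch, which assumes the values are unique and hashable. The <code>get()</code> call at the end is an extra shown for convenience.

```python
my_map = {1: "a", 2: "b"}

# Build a second dictionary mapping values back to keys
# (assumes values are unique and hashable)
reverse_map = {v: k for k, v in my_map.items()}

print("a" in reverse_map)  # True, O(1) on average
print(reverse_map["b"])    # 2

# get() returns a default instead of raising KeyError for missing keys
print(my_map.get(3, "missing"))  # missing
```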
==Numpy== | |||
{{main | NumPy}} | |||
See also CuPy, which provides a NumPy-compatible interface implemented with CUDA for GPU acceleration. Large speedups can be had for big arrays.
===random===
Legacy code uses functions from <code>np.random.*</code>.
New code should initialize a random number generator (RNG) using <code>np.random.default_rng()</code>.
See [https://numpy.org/doc/stable/reference/random/generator.html Random Generator] for more details.
<syntaxhighlight lang="python"> | |||
import numpy as np | |||
rng = np.random.default_rng() | |||
# Random integer in [0, 6)
rng.integers(0, 6)
# Array of 5 random integers in [0, 6)
rng.integers(0, 6, size=5)
</syntaxhighlight>
==Anaconda==
{{main | Anaconda (Python distribution) }}
How to use Anaconda:
<syntaxhighlight lang="bash">
# Create an environment
conda create -n tf2 python=3.6
# Activate an environment
Line 128: | Line 495: | ||
# Change version of Python
conda install python=3.7
# Update all packages
conda update --all
</syntaxhighlight>
;Documentation | |||
* [https://docs.conda.io/projects/conda/en/latest/commands/install.html <code>conda install</code>] | |||
;Notes | |||
* Use flag <code>--force-reinstall</code> to reinstall packages | |||
==JSON== | |||
[https://docs.python.org/3/library/json.html Documentation] | |||
<syntaxhighlight lang="python"> | |||
import json
my_data = {"key": "value"}  # example data
# Encode/Stringify (pretty)
json.dumps(my_data, indent=2)
# Decode/Parse
json.loads("{}")
# Write to file
with open("my_data.json", "w") as f:
    json.dump(my_data, f, indent=2)
# Read from file
with open("my_data.json", "r") as f:
    my_data = json.load(f)
</syntaxhighlight> | |||
; Notes | |||
* Using <code>json.dump(data, f)</code> will dump without pretty printing | |||
** Add indent parameter for pretty printing. | |||
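The effect of the <code>indent</code> parameter can be seen in a small round-trip sketch. The <code>record</code> dictionary is invented for illustration.

```python
import json

record = {"name": "example", "scores": [1, 2.5], "ok": True, "extra": None}

# Without indent, the output is a single compact line
compact = json.dumps(record)
print(compact)  # {"name": "example", "scores": [1, 2.5], "ok": true, "extra": null}

# With indent, the output is spread over multiple lines
pretty = json.dumps(record, indent=2)
print(pretty)

# Both forms parse back to an equal Python object
restored = json.loads(pretty)
print(restored == record)  # True
```

Note that Python's <code>True</code> and <code>None</code> become <code>true</code> and <code>null</code> in the JSON text.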
==Type Annotations== | |||
Python 3 supports adding type annotations. However, they are not enforced at runtime.
You can check types ahead of time using [https://google.github.io/pytype/ pytype].
<syntaxhighlight lang="python"> | |||
def add_two_values(a: float, b: float) -> float:
    return a + b
</syntaxhighlight> | |||
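The fact that annotations are not enforced at runtime can be demonstrated with a short sketch:

```python
def add_two_values(a: float, b: float) -> float:
    return a + b

# Annotations are stored as metadata on the function
print(add_two_values.__annotations__)

# Passing strings does not raise; the strings are simply concatenated
print(add_two_values("1", "2"))  # 12
```

A static checker would flag the second call, but the interpreter runs it without complaint.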
==Images== | |||
===Pillow (PIL)=== | |||
<code>pip install pillow</code> | |||
<syntaxhighlight lang="python"> | |||
import numpy as np
from PIL import Image, ImageOps
img = Image.open("my_image.png")
# Converts to a uint8 array of shape (H, W, C), e.g. (H, W, 4) for an RGBA image
img = np.array(img)
</syntaxhighlight> | |||
* <code>ImageOps.flip(img)</code> - Returns an image flipped vertically (top to bottom)
* <code>ImageOps.mirror(img)</code> - Returns an image flipped horizontally (left to right)
===Bilinear Interpolation=== | |||
Copied from [https://stackoverflow.com/questions/12729228/simple-efficient-bilinear-interpolation-of-images-in-numpy-and-python https://stackoverflow.com/questions/12729228/simple-efficient-bilinear-interpolation-of-images-in-numpy-and-python]
{{ hidden | Bilinear Interpolation function | | |||
<syntaxhighlight lang="python"> | |||
import numpy as np

def bilinear_interpolate(im, x, y):
    """
    Basic bilinear interpolation
    :param im: image array of shape (H, W) or (H, W, C)
    :param x: x (column) coordinates to sample at
    :param y: y (row) coordinates to sample at
    :return: interpolated values at the given coordinates
    """
    x = np.asarray(x)
    y = np.asarray(y)
    x0 = np.floor(x).astype(int)
    x1 = x0 + 1
    y0 = np.floor(y).astype(int)
    y1 = y0 + 1
    x0 = np.clip(x0, 0, im.shape[1] - 1)
    x1 = np.clip(x1, 0, im.shape[1] - 1)
    y0 = np.clip(y0, 0, im.shape[0] - 1)
    y1 = np.clip(y1, 0, im.shape[0] - 1)
    Ia = im[y0, x0]
    Ib = im[y1, x0]
    Ic = im[y0, x1]
    Id = im[y1, x1]
    wa = (x1 - x) * (y1 - y)
    wb = (x1 - x) * (y - y0)
    wc = (x - x0) * (y1 - y)
    wd = (x - x0) * (y - y0)
    # Add a trailing axis so the weights broadcast over image channels
    if len(Ia.shape) > len(wa.shape):
        wa = wa[..., np.newaxis]
        wb = wb[..., np.newaxis]
        wc = wc[..., np.newaxis]
        wd = wd[..., np.newaxis]
    return wa * Ia + wb * Ib + wc * Ic + wd * Id
</syntaxhighlight> | |||
}} | |||
==Libraries==
Other notable libraries.
===Matplotlib===
{{main | Matplotlib}}
Matplotlib is the main library used for making graphs.<br>
[https://matplotlib.org/examples/ Examples]<br>
[https://matplotlib.org/3.1.1/gallery/index.html Gallery]
Alternatively, there are also Python bindings for ggplot2.<br>
===configargparse=== | |||
[https://pypi.org/project/ConfigArgParse/ ConfigArgParse] is the same as argparse except it allows you to use config files as args. | |||
<syntaxhighlight lang="python"> | |||
import configargparse

parser = configargparse.ArgParser()
parser.add('-c', '--config', is_config_file=True, help='config file path')
# Parse all args; exits with an error on unknown args.
parser.parse_args()
# Parse only known args; returns (namespace, list_of_unknown_args).
parser.parse_known_args()
</syntaxhighlight> | |||
If you want to use bools without <code>store_true</code> or <code>store_false</code>, you need to define a <code>str2bool</code> function:
[https://stackoverflow.com/questions/15008758/parsing-boolean-values-with-argparse Stack Overflow Answer] | |||
{{ hidden | str2bool | | |||
<syntaxhighlight lang="python"> | |||
import argparse

def str2bool(val):
    """Converts the string value to a bool.

    Args:
        val: string representing true or false
    Returns:
        bool
    """
    if isinstance(val, bool):
        return val
    if val.lower() in ('yes', 'true', 't', 'y', '1'):
        return True
    elif val.lower() in ('no', 'false', 'f', 'n', '0'):
        return False
    else:
        raise argparse.ArgumentTypeError('Boolean value expected.')
#...
parser.add_argument("--augment",
                    type=str2bool,
                    help="Augment",
                    default=False)
</syntaxhighlight> | |||
}} | |||
[[Category:Programming languages]] |