{ "cells": [ { "cell_type": "markdown", "id": "44056652", "metadata": {}, "source": [ "# Week V - Vectorization, profiling and non-uniform random distributions" ] }, { "cell_type": "markdown", "id": "49ca7c09", "metadata": {}, "source": [ "## Vectorization\n", "\n", "#### Definition \n", "\n", "* In the context of Python programming, the term **vectorization** is used in a broad sense and typically referes to the practice of avoiding for loops in favour of vector products and linear algebra operations, often through calls to dedicated numerical libraries.\n", "* More strictly speaking, **vectorization** is the redesign of an algorithm to take advantage of the vector processing capability of the hardware.\n", "\n", "#### Faster code through vectorization\n", "\n", "In practice, avoiding for loops will make your Python code faster because:\n", "* you remove/lower the overhead of the loop due to the interpreter\n", "* you use dedicated libraries which are very fast, owing to:\n", " * efficient algorithms\n", " * efficient implementations of those algorithms \n", " * written in a low-level language and compiled (implementations are written in one or more of Fortran, C, assembly, etc.). \n", " * exploit vector processing capabilities such as SIMD (Single Instructions Multiple Data)\n", " * might be optimized for specific hardware\n", "(different CPUs architectures, GPUs, ...)\n", "\n", "For instance, NumPy wraps BLAS and LAPACK.\n", "\n", "Note that:\n", "* Vectorization also applies very much to compiled languages, but it becomes often unavoidable for interpreted languages.\n", "* Some degree of auto-vectorization can be introduced by the compiler with appropriate flags (check compiler documentations, e.g. -O3 flag), but often effective vectorization requires substantial expertise in algorithms and low-level languages -> *use existing and established libraries as much as possible*.\n", "\n", "#### Vectorization and SIMD\n", "The concept of Single Instruction Multiple Data (SIMD) became popular in the 70s, with the first vector supercomputers. \\\n", "*Writing a computer program so that the compiler can generate effective SIMD vector instructions is called vectorization* (Hager & Wellein, Introduction to HPC for Scientist and Engineers, CRC Press, 2011).\n", "\n", "SIMD allows the concurrent execution of arithmetic operations on a wide register that can hold, e.g., two DP or four SP floating-point words (Hager & Wellein, 2011). For instance, a single instruction can initiate four additions of single-precision floating-point values, each at once.\n", "\n", "For example, Intel's AVX instructions are based on up to 512-bit wide SIMD, that is, eight floating point numbers can be processed simultaneously. \n", "\n", "> Modern supercomputers are *not* SIMD-based (instead, thery are MIMD), but most modern CPUs includes strong SIMD components. General purpose GPUs are essentially SIMD-based.\n", "\n", "> **Processing instructions is actually expensive compared to a floating point operation**. *The use of SIMD is mostly motivated by power considerations. 
{ "cell_type": "markdown", "id": "695aef0a", "metadata": {}, "source": [ "----------------------------------------" ] },
{ "cell_type": "markdown", "id": "d250a8ee", "metadata": {}, "source": [ "## Profiling and timing code\n", "\n",
"In the process of developing code, there are typically several possible implementations to get the job done.\n",
"Typically the right choice is a trade-off between development time/effort, computational efficiency and code readability. As [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth) famously put it:\n", "\n",
"> *Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered: **premature optimization is the root of all evil***.\n", "\n",
"An effective approach is to\n",
" 1. first design the code\n",
" 2. then implement the code according to the design\n",
" 3. profile/benchmark the code to understand\n",
"    * where the program spends most of its time\n",
"    * which parts can be optimised\n", " \n",
">*In software engineering, it is very typical that a program spends most of its time in only a small part of the source code, as exemplified by the common 90/10 rule: **90% of time is spent in 10% of the source code***. \n", "\n",
"The two main ways to analyze performance are the application's own timers, which measure the time spent in specific regions of the program, and dedicated performance analysis software. \n", "\n",
"> Profiling is concerned with where the time is spent in the code. Often memory, not just compute time, is key.\n", "\n",
"IPython provides very useful *profiling* and *timing* tools:\n", "\n",
"* `%time`, `%timeit` : timing single statements (see last week's lecture)\n",
"* `%prun` : run the code with the profiler\n",
"* `%lprun` : line-by-line profiler\n",
"* `%memit` : memory usage of a single statement\n",
"* `%mprun` : line-by-line memory profiler\n", "\n",
"> For timing, you can also insert multiple time checks inside the code with the module ``time``, for instance:" ] },
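{ "cell_type": "code", "execution_count": null, "id": "c3d5e7f9", "metadata": {}, "outputs": [], "source": [
"# A minimal sketch of manual timing: bracket the regions of interest\n",
"# with time.perf_counter() calls and print the elapsed intervals.\n",
"import time\n",
"import numpy as np\n", "\n",
"t0 = time.perf_counter()\n",
"a = np.random.rand(1000000)   # region 1: generate the data\n",
"t1 = time.perf_counter()\n",
"s = np.sum(a**2)              # region 2: compute\n",
"t2 = time.perf_counter()\n", "\n",
"print('generation :', t1 - t0, 's')\n",
"print('computation:', t2 - t1, 's')" ] },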
{ "cell_type": "markdown", "id": "16da2ad2", "metadata": {}, "source": [ "Let's practice with last week's exercise on correlations." ] },
{ "cell_type": "code", "execution_count": null, "id": "fb88156a", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n",
"def corr_for_loop(N,k_list=np.arange(100)):\n",
"    '''\n",
"    Calculates the correlation of random numbers\n",
"    with two nested loops\n",
"    '''\n",
"    np.random.seed(42)\n",
"    a = np.random.rand(N)\n",
"    c_list = []\n",
"    for k in k_list:\n",
"        s = 0.0  # reset the accumulator for each lag k\n",
"        for i in range(len(a)-k):\n",
"            s = s + a[i]*a[i+k]\n",
"        s = s/float(len(a)-k)\n",
"        c_list.append(s)\n",
"    return k_list,np.array(c_list)\n", "\n",
"def corr(N,k_list=np.arange(100)):\n",
"    '''\n",
"    Calculates the correlation of random numbers\n",
"    with some vectorization\n",
"    '''\n",
"    np.random.seed(42)\n",
"    a = np.random.rand(N)\n",
"    def corr_k(k):\n",
"        if k==0:\n",
"            res = np.average(a**2)\n",
"        else:\n",
"            res = np.average(a[:-k]*a[k:])\n",
"        return res\n", "\n",
"    return k_list,np.array([corr_k(k) for k in k_list])" ] },
{ "cell_type": "code", "execution_count": null, "id": "25f623a9", "metadata": {}, "outputs": [], "source": [ "%timeit corr_for_loop(1000)\n", "%timeit corr(1000)" ] },
{ "cell_type": "code", "execution_count": null, "id": "ef6f9b75", "metadata": {}, "outputs": [], "source": [ "# Load the profiling extensions (install them first with: pip3 install line_profiler memory_profiler)\n", "%load_ext line_profiler\n", "%load_ext memory_profiler" ] },
{ "cell_type": "code", "execution_count": null, "id": "169f99f3", "metadata": {}, "outputs": [], "source": [ "# Memory usage\n", "print('loop-based')\n", "%memit corr_for_loop(1000000)" ] },
{ "cell_type": "code", "execution_count": null, "id": "227a9c33", "metadata": {}, "outputs": [], "source": [ "print('vectorized')\n", "%memit corr(1000000)" ] },
{ "cell_type": "code", "execution_count": null, "id": "ad39156e", "metadata": {}, "outputs": [], "source": [ "# Inspect the source of the %memit magic\n", "%memit??" ] },
{ "cell_type": "code", "execution_count": null, "id": "b9eb2b84", "metadata": {}, "outputs": [], "source": [ "# Line-by-line profiler\n", "%lprun -f corr_for_loop corr_for_loop(1000)" ] },
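{ "cell_type": "markdown", "id": "d4e6f8a0", "metadata": {}, "source": [ "The function-level profiler `%prun` listed above can be used in the same way; a minimal sketch on the same function:" ] },
{ "cell_type": "code", "execution_count": null, "id": "e5f7a9b1", "metadata": {}, "outputs": [], "source": [ "# Function-level profiler: reports how much time is spent in each function call\n", "%prun corr_for_loop(1000)" ] },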
{ "cell_type": "markdown", "id": "ea80d5ef", "metadata": {}, "source": [ "----------------------------------------" ] },
{ "cell_type": "markdown", "id": "9b494a8e", "metadata": {}, "source": [ "## Exercise - Random numbers with non-uniform distributions\n", "\n",
"While for simple and common distributions we can use the random generators built into NumPy, there are non-uniform distributions that need to be constructed by hand. In this exercise, we will sample the following distribution between -1 and +1:\n", "\n",
"$$p(x) = \\frac{1}{\\pi\\sqrt{1-x^2}}.$$\n", "\n",
"We will use the inverse transformation method: we generate a random number $U$ with uniform distribution between 0 and 1, then we transform it according to\n", "\n",
"$$x = \\cos{(\\pi U)}.$$\n", "\n",
"This works because the cumulative distribution function of $p(x)$ is $F(x) = 1 - \\arccos(x)/\\pi$, so $x = F^{-1}(U) = \\cos{(\\pi(1-U))}$; and since $1-U$ is itself uniform between 0 and 1, we can use $\\cos{(\\pi U)}$ directly.\n" ] },
{ "cell_type": "code", "execution_count": null, "id": "a6e91457", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n",
"def nunf_dist(N):\n",
"    '''\n",
"    Generate a list of N random variables\n",
"    sampling the distribution\n",
"    1.0/(np.pi*np.sqrt(1.0-x**2))\n",
"    '''\n",
"    # a fully vectorized alternative would be: np.cos(np.pi*np.random.rand(N))\n",
"    return list(map(lambda x: np.cos(np.pi*x),np.random.rand(N)))" ] },
{ "cell_type": "code", "execution_count": null, "id": "df575bd0", "metadata": {}, "outputs": [], "source": [ "# Let's sample!\n", "import time\n", "t0 = time.process_time()\n", "res = nunf_dist(int(1e5))\n", "t1 = time.process_time()\n", "print(\"Time spent\", t1 - t0)" ] },
{ "cell_type": "code", "execution_count": null, "id": "1aa3e26c", "metadata": {}, "outputs": [], "source": [ "# Now we plot the results and compare them with the target distribution\n",
"import matplotlib.pyplot as plt\n",
"n_bins = 100\n",
"# the weights normalize the histogram to a density (bin width = 2/n_bins)\n",
"plt.hist(res,weights = [n_bins/(2*len(res))]*len(res),bins=n_bins,edgecolor='black',label='random numbers')\n",
"x = np.linspace(-0.99,0.99,1000)\n",
"y = 1.0/(np.sqrt(1.0-x**2)*np.pi)  # vectorized evaluation of the target density\n",
"plt.xlabel('x')\n",
"plt.ylabel('distribution')\n",
"plt.plot(x,y,'-',lw=3,label=r'$\\frac{1}{\\pi\\sqrt{1-x^2}}$')\n",
"plt.legend()\n",
"plt.show()" ] },
{ "cell_type": "code", "execution_count": null, "id": "7f34f1bd", "metadata": {}, "outputs": [], "source": [ "# Save data to file\n", "np.savetxt('./rand_numbers',res)" ] },
{ "cell_type": "code", "execution_count": null, "id": "c99cebaf", "metadata": {}, "outputs": [], "source": [ "# Load data from file\n", "data = np.loadtxt('./rand_numbers')" ] },
{ "cell_type": "code", "execution_count": null, "id": "5ff6cfad", "metadata": {}, "outputs": [], "source": [ "data" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" } }, "nbformat": 4, "nbformat_minor": 5 }