{ "cells": [ { "cell_type": "markdown", "id": "a398cc49", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Week IV - Random numbers" ] }, { "cell_type": "markdown", "id": "07c84767", "metadata": {}, "source": [ "## NumPy: Numerical Python\n", "\n", "NumPy is one of the most important Python packages for numerical computing and it sets most of the standards for array object and data exchange.\n", "Among the several features, it is worth to mention:\n", "* Effiecient multi-dimensional arrays with fast arithmetic and broadcasting\n", "* Linear algebra, random numbers and other mathematical capabilities\n", "* Mathematical functions which allow to operate on entire arrays (fast operations, avoiding for loops)\n", "* Dedicated I/O tools for arrays\n", "* Easy-to-use C API for connecting with libraries written in C, C++ or Fortran.\n", "\n", "For an introduction to NumPy, check out the lecture notes and related material of the courses *Strumenti Informatici per la Fisica* or *Abilità Informatiche e Telematiche*. \\\n", "For most doubts, first look at the official documentation https://numpy.org/doc/stable/\n", "\n" ] }, { "cell_type": "markdown", "id": "21682a95", "metadata": {}, "source": [ "## Pseudorandom number generation in Python and NumPy\n", "\n", "Python has a built-in module for random number generations, supporting several statistical distributions" ] }, { "cell_type": "code", "execution_count": null, "id": "96def874", "metadata": {}, "outputs": [], "source": [ "import random as rnd\n", "rnd?" ] }, { "cell_type": "code", "execution_count": null, "id": "dd51a595", "metadata": {}, "outputs": [], "source": [ "rnd.seed(10)\n", "rnd.random()\n" ] }, { "cell_type": "markdown", "id": "c9c164e1", "metadata": {}, "source": [ "NumPy has a dedicated module for random numbers, which is *much* faster and has extended functionalities " ] }, { "cell_type": "code", "execution_count": null, "id": "92a80e09", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "np.random.seed(10)\n", "np.random.rand()" ] }, { "cell_type": "markdown", "id": "f1f195fb", "metadata": {}, "source": [ "### Example: one million random numbers" ] }, { "cell_type": "code", "execution_count": null, "id": "b6e03733", "metadata": {}, "outputs": [], "source": [ "N = int(1e6)\n", "%timeit [rnd.random() for _ in range(N)]\n", "%timeit [np.random.rand() for _ in range(N)]\n", "%timeit np.random.rand(N)" ] }, { "cell_type": "markdown", "id": "20e1643d", "metadata": {}, "source": [ "Let's try now a more complex distribution" ] }, { "cell_type": "code", "execution_count": null, "id": "c026c44a", "metadata": {}, "outputs": [], "source": [ "%timeit [rnd.normalvariate() for _ in range(N)]\n", "%timeit [np.random.normal() for _ in range(N)]\n", "%timeit np.random.normal(size=N)" ] }, { "cell_type": "markdown", "id": "d38f1d2c", "metadata": {}, "source": [ "> NumPy is about 20 times faster than the standard Python module for random numbers!" ] }, { "cell_type": "code", "execution_count": null, "id": "397faa10", "metadata": {}, "outputs": [], "source": [ "a = np.random.normal(size=10)\n", "type(a)" ] }, { "cell_type": "markdown", "id": "fcaec20c", "metadata": {}, "source": [ "## Algorithms and generators for random numbers \n", "\n", "* Nowadays, it is generally recommended **not** to use the global Numpy random number generator (RNG), i.e. ``np.random.rand`` and ``np.random.seed``. \n", "\n", "* Instead, one should create an instance of a specific RNG and specify its seed.\n", "\n", "* Numpy has several algorithms already implemented, the default ``np.random.default_rng`` being *PCG64*: https://www.pcg-random.org/paper.html ; https://www.pcg-random.org .\n", "\n", "> The generation of random bits and the production of probability distributions are two separated objects, check out Numpy documentation at https://numpy.org/doc/stable/reference/random/index.html#module-numpy.random :\n", "> * **BitGenerators**: Objects that generate random numbers. \n", "> * **Generators**: Objects that transform sequences of random bits from a BitGenerator into sequences of numbers that follow a specific probability distribution (such as uniform, Normal or Binomial) within a specified interval." ] }, { "cell_type": "code", "execution_count": null, "id": "d4d2d22a", "metadata": {}, "outputs": [], "source": [ "#MT19937, Philox, PCG64, PCG64DXSM, SFC64\n", "print(np.random.MT19937)\n", "np.random.MT19937" ] }, { "cell_type": "code", "execution_count": null, "id": "b2441533", "metadata": {}, "outputs": [], "source": [ "np.random.default_rng??" ] }, { "cell_type": "code", "execution_count": null, "id": "1ecf21fc", "metadata": {}, "outputs": [], "source": [ "np.random.PCG64?" ] }, { "cell_type": "code", "execution_count": null, "id": "a1dc4cf7", "metadata": {}, "outputs": [], "source": [ "np.random.MT19937?" ] }, { "cell_type": "markdown", "id": "67dc65b6", "metadata": {}, "source": [ "## Exercise - Intrinsic random-number generators: uniformity and correlation\n", "\n", "### 1) Moments \n", "\n", "Let's calculate the moments and the amount of correlation of a computer-generated random distribution!" ] }, { "cell_type": "code", "execution_count": null, "id": "19a88e2d", "metadata": {}, "outputs": [], "source": [ "def moment_random_dist(x,k):\n", " '''\n", " Calculates the k-th moment of an array x\n", " '''\n", " return np.average(np.power(x,k))\n", " " ] }, { "cell_type": "code", "execution_count": null, "id": "4b5d7d85", "metadata": {}, "outputs": [], "source": [ "def moments(N,verbose=True,k_list=np.arange(1,8)):\n", " '''\n", " Calculates the moments of a pseudorandom distribution\n", " and compares it with the expected theoretical value\n", " for the corresponding statistical distribution.\n", " ''' \n", " rng = np.random.default_rng(seed=42424)\n", " m_list = []\n", " dev_list = []\n", " for k in k_list:\n", " #m_list.append(moment_random_dist(np.random.rand(N),k))\n", " m_list.append(moment_random_dist(rng.random(N),k))\n", " dev_list.append(np.abs(m_list[-1] - 1.0/(1.0+k)))\n", " if verbose:\n", " if k==1:\n", " prefix = 'st'\n", " elif k==3:\n", " prefix = 'rd'\n", " elif k==2:\n", " prefix = 'nd'\n", " else:\n", " prefix = 'th'\n", " print('{}{} moment is equal to :{}, to be compared with {}'.format(k,prefix,m_list[-1],1.0/(1.0+k)))\n", " return k_list,m_list,dev_list" ] }, { "cell_type": "code", "execution_count": null, "id": "ad04b28a", "metadata": {}, "outputs": [], "source": [ "k_list,m_list,dev_list = moments(1000000)" ] }, { "cell_type": "markdown", "id": "9292107a", "metadata": {}, "source": [ "Now we do this for different sizes `N` of the sample." ] }, { "cell_type": "code", "execution_count": null, "id": "ea9773a3", "metadata": {}, "outputs": [], "source": [ "N_list = range(10,100000,100)\n", "k_list = np.arange(1,8)\n", "dev_array = np.zeros((len(N_list),len(k_list)+1))\n", "print(dev_array.shape)\n", "dev_array[:,0] = N_list\n", "\n", "for index,N in enumerate(N_list):\n", " k_list,m_list,dev_list = moments(N,verbose=False)\n", " dev_array[index,0] = N \n", " dev_array[index,1:] = dev_list" ] }, { "cell_type": "code", "execution_count": null, "id": "c17e3cd8", "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "k = 3\n", "plt.loglog(dev_array[:,0],dev_array[:,k],'o-',label='{}-th moment'.format(k))\n", "plt.loglog(dev_array[:,0],list(map(lambda x: 1.0/np.sqrt(x),dev_array[:,0])))\n", "plt.xlabel('N')\n", "plt.ylabel('$\\Delta_N(k)$')\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "050c58d8", "metadata": {}, "source": [ "Finally, we calculate the deviation for different moments." ] }, { "cell_type": "code", "execution_count": null, "id": "17e34aa9", "metadata": {}, "outputs": [], "source": [ "for index,k in enumerate(k_list):\n", " plt.loglog(dev_array[:,0],dev_array[:,k],'o-',label='{}-th moment'.format(k))\n", "plt.legend()\n", "plt.show()" ] }, { "cell_type": "markdown", "id": "7c33e05c", "metadata": {}, "source": [ "### 2) Correlation\n", "\n", "Now that we have verified that our random numbers have the correct moments, we want to calculate correlation defined as\n", "\n", "$$\n", "C(k)=\\frac{1}{N}\\sum_{i=1}^{N} x_ix_{i+k}\n", "$$" ] }, { "cell_type": "code", "execution_count": null, "id": "1db4983c", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def corr_for_loop(N,k_list=np.arange(100)):\n", " '''\n", " Calculates the correlation of random numbers\n", " with two nested loops\n", " ''' \n", " #np.random.seed(42)\n", " #a = np.random.rand(N)\n", " rng = np.random.default_rng(seed=42424)\n", " a = rng.random(N) \n", " s = 0.0\n", " c_list = []\n", " for k in k_list:\n", " for i in range(len(a)-k):\n", " s = s + a[i]*a[i+k]\n", " s = s/float(len(a)-k)\n", " c_list.append(s)\n", " return k_list,np.array(c_list)" ] }, { "cell_type": "code", "execution_count": null, "id": "2e53a82e", "metadata": {}, "outputs": [], "source": [ "def corr(N,k_list=np.arange(100)):\n", " '''\n", " Calculates the correlation of random numbers\n", " with some vectorization\n", " ''' \n", " #np.random.seed(42)\n", " #a = np.random.rand(N)\n", " rng = np.random.default_rng(seed=42424)\n", " a = rng.random(N) \n", " def corr_k(k):\n", " if k==0:\n", " res = np.average(a**2)\n", " else:\n", " res = np.average(a[:-k]*a[k:]) \n", " return res\n", " \n", " return k_list,np.array([corr_k(k) for k in k_list])" ] }, { "cell_type": "code", "execution_count": null, "id": "f63dbb5d", "metadata": {}, "outputs": [], "source": [ "corr(100000)[1][1:5]" ] }, { "cell_type": "code", "execution_count": null, "id": "9eee1506", "metadata": {}, "outputs": [], "source": [ "%timeit corr_for_loop(1000)\n", "%timeit corr(1000)" ] }, { "cell_type": "code", "execution_count": null, "id": "93caf9f9", "metadata": {}, "outputs": [], "source": [ "%timeit corr_for_loop(10000)\n", "%timeit corr(10000)" ] }, { "cell_type": "markdown", "id": "b740763f", "metadata": {}, "source": [ "The function `corr` is 50-220x faster than `corr_for_loops`, and the computing time scales differently with `N`. \\\n", "For loops in Python are very slow, but they are seldom unavoidable.\n", "\n", "> Next week, we will discuss the concept of **vectorization**." ] }, { "cell_type": "markdown", "id": "e82ee27a", "metadata": {}, "source": [ "------------------------------------------------" ] }, { "cell_type": "markdown", "id": "131c1fce", "metadata": {}, "source": [ "## Some useful IPython magic commands" ] }, { "cell_type": "markdown", "id": "aabef2b5", "metadata": {}, "source": [ "* `!cmd`: execute the command `cmd` in the shell \n", "* `%paste`: take text from the clipboard and executes it as a single block (preserving indentation)\n", "* `%cpaste`: similar to `%paste` but you get a special prompt, so you can paste code multiple times and look at it\n", "* `%hist`: print command history\n", "* `%run` : run a Python script (e.g. `%run myscript.py`) \n", "* `%time`: report the execution of a single statement\n", "* `%timeit`: calculates an ensemble average execution time (by running the statement multiple times)\n", "\n", "Many other magic commands exist, check out the full list of built-in commands at https://ipython.readthedocs.io/en/stable/interactive/magics.html" ] }, { "cell_type": "markdown", "id": "3585def8", "metadata": {}, "source": [ "## EXTRA: " ] }, { "cell_type": "code", "execution_count": null, "id": "f86d1a97", "metadata": {}, "outputs": [], "source": [ "k=1\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "N_list = np.arange(100,100000,100)\n", "dev_list = []\n", "for N in N_list:\n", " a = np.random.rand(N)\n", " res = np.average(a[:-k]*a[k:])\n", " dev_list.append(abs(res-0.25))\n", "plt.loglog(N_list,dev_list,'o-')\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "id": "a0f6d58e", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "fis_comp_2023", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.1" } }, "nbformat": 4, "nbformat_minor": 5 }