NumPy is the library that gives Python its ability to work with data at speed. Originally launched in 1995 as ‘Numeric,’ NumPy is the foundation on which many important Python data science libraries are built, including Pandas, SciPy and scikit-learn.
Python is one of the most widely used programming languages in the data science field. It has many packages and libraries tailored to specific tasks, including pandas, NumPy, scikit-learn, Matplotlib, and SciPy. Part of Python's appeal is that anyone who wants to learn it, even beginners, can do so quickly.
In this cheat sheet, we use the following shorthand:
arr | A NumPy Array object |
You’ll also need to import numpy to get started. The entries in this cheat sheet assume the conventional alias np, that is, import numpy as np.
np.loadtxt('file.txt') | From a text file |
np.genfromtxt('file.csv',delimiter=',') | From a CSV file |
np.savetxt('file.txt',arr,delimiter=' ') | Writes to a text file |
np.savetxt('file.csv',arr,delimiter=',') | Writes to a CSV file |
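A minimal round-trip sketch of the file functions above. The temporary directory and the file name data.csv are placeholders chosen for illustration; they are not part of the original cheat sheet.

```python
import os
import tempfile

import numpy as np

arr = np.array([[1.5, 2.0, 3.25], [4.0, 5.5, 6.0]])

# Write to a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.csv")  # hypothetical file name
    np.savetxt(path, arr, delimiter=",")           # write as CSV
    loaded = np.genfromtxt(path, delimiter=",")    # read it back
    also_loaded = np.loadtxt(path, delimiter=",")  # loadtxt accepts a delimiter too

print(loaded.shape)
```

Both readers return a float array with the same shape as the one that was saved.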
np.empty((1,2)) | Creates an empty 1x2 array. The value at each position is uninitialized (an arbitrary value depending on the memory location) |
np.array([1,2,3]) | One dimensional array. Keyword argument dtype converts elements into the specified type |
np.array([(1,2,3),(4,5,6)]) | Two dimensional array |
np.zeros(3) | 1D array of length 3, all values 0 |
np.ones((3,4)) | 3x4 array with all values 1 |
np.eye(5) | 5x5 array of 0 with 1 on the diagonal (identity matrix) |
np.linspace(0,100,6) | Array of 6 evenly divided values from 0 to 100 |
np.arange(0,10,3) | Array of values from 0 to less than 10 with step 3 (eg [0,3,6,9]) |
np.full((2,3),8) | 2x3 array with all values 8 |
np.random.rand(4,5) | 4x5 array of random floats between 0-1 |
np.random.rand(6,7)*100 | 6x7 array of random floats between 0-100 |
np.random.randint(5,size=(2,3)) | 2x3 array with random ints between 0-4 |
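As a quick illustration of the creation functions above, here is a short sketch. The variable names are ours and purely illustrative.

```python
import numpy as np

zeros = np.zeros(3)                  # 1D array of three 0.0 values
ones = np.ones((3, 4))               # 3x4 array of 1.0
identity = np.eye(5)                 # 5x5 identity matrix
steps = np.linspace(0, 100, 6)       # 0, 20, 40, 60, 80, 100
ranged = np.arange(0, 10, 3)         # 0, 3, 6, 9 (stop value is excluded)
eights = np.full((2, 3), 8)          # 2x3 array filled with 8
as_float = np.array([1, 2, 3], dtype=float)  # dtype converts the elements

rand_floats = np.random.rand(4, 5)              # floats in [0, 1)
rand_ints = np.random.randint(5, size=(2, 3))   # ints from 0 to 4 inclusive
```

Note that np.random.randint excludes its upper bound, which is why a bound of 5 yields values from 0 to 4.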
arr.size | Returns number of elements in arr |
arr.shape | Returns dimensions of arr (rows,columns) |
arr.dtype | Returns type of elements in arr |
arr.astype(dtype) | Convert arr elements to type dtype |
arr.tolist() | Convert arr to a Python list |
np.info(np.eye) | View documentation for np.eye |
np.copy(arr) | Copies arr to new memory |
arr.view(dtype) | Creates view of arr elements with type dtype |
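A small sketch of the inspection and conversion entries above, using a made-up 2x3 array:

```python
import numpy as np

arr = np.array([[1.7, 2.2, 3.9], [4.1, 5.5, 6.8]])

print(arr.size)    # total number of elements: 6
print(arr.shape)   # (2, 3), i.e. 2 rows and 3 columns
print(arr.dtype)   # float64

as_ints = arr.astype(int)   # new array; fractional parts are truncated
as_list = arr.tolist()      # nested Python lists

copied = np.copy(arr)       # independent memory
copied[0, 0] = -1.0         # changing the copy leaves arr untouched
```

astype returns a new array rather than converting arr in place, so the original floats are still available afterwards.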
arr.sort() | Sorts arr |
arr.sort(axis=0) | Sorts specific axis of arr |
two_d_arr.flatten() | Flattens 2D array two_d_arr to 1D |
arr.T | Transposes arr (rows become columns and vice versa) |
arr.reshape(3,4) | Reshapes arr to 3 rows, 4 columns without changing data |
arr.resize((5,6)) | Changes arr shape to 5x6 in place and fills new values with 0 |
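The sorting and reshaping entries above can be sketched as follows. The arrays are made up, and refcheck=False is passed to resize only so the example also runs in interactive sessions that hold extra references to the array.

```python
import numpy as np

arr = np.array([[3, 1, 2], [9, 7, 8]])

flat = arr.flatten()           # 1D copy: 3, 1, 2, 9, 7, 8
transposed = arr.T             # shape (3, 2)
reshaped = arr.reshape(3, 2)   # same data laid out as 3 rows, 2 columns

sorted_rows = arr.copy()
sorted_rows.sort()             # in place, along the last axis by default

sorted_cols = np.array([[3, 1], [2, 4]])
sorted_cols.sort(axis=0)       # sorts each column independently

grown = np.arange(4)
grown.resize((2, 3), refcheck=False)  # in place; new slots are filled with 0
```

sort and resize modify the array they are called on and return None, whereas flatten and reshape hand back a new array object.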
np.append(arr,values) | Appends values to end of arr |
np.insert(arr,2,values) | Inserts values into arr before index 2 |
np.delete(arr,3,axis=0) | Deletes row on index 3 of arr |
np.delete(arr,4,axis=1) | Deletes column on index 4 of arr |
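A short sketch of adding and removing elements, with made-up arrays. Each of these functions returns a new array and leaves its input unchanged.

```python
import numpy as np

arr = np.array([10, 20, 30, 40, 50])

appended = np.append(arr, [60, 70])    # values go on the end
inserted = np.insert(arr, 2, [1, 2])   # values go in before index 2

grid = np.arange(12).reshape(4, 3)     # 4 rows, 3 columns
no_row = np.delete(grid, 3, axis=0)    # drops row 3
no_col = np.delete(grid, 1, axis=1)    # drops column 1
```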
np.vstack((arr1, arr2)) | Vertically stacks multiple arrays. Think of it like the second array’s items being added as new rows to the first array. |
np.hstack((arr1, arr2)) | Horizontally stacks multiple arrays. Think of it like the second array’s items being added as new columns to the first array. |
np.concatenate((arr1,arr2),axis=0) | Adds arr2 as rows to the end of arr1. It’s a general-purpose vstack. |
np.concatenate((arr1,arr2),axis=1) | Adds arr2 as columns to the end of arr1. It’s a general-purpose hstack. |
np.split(arr,3) | Splits arr into 3 sub-arrays |
np.hsplit(arr,5) | Splits arr horizontally (column-wise) into 5 equal sub-arrays |
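To make the stacking and splitting entries concrete, here is a small sketch with two made-up 2x2 arrays:

```python
import numpy as np

arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[5, 6], [7, 8]])

v = np.vstack((arr1, arr2))   # arr2 added as new rows, shape (4, 2)
h = np.hstack((arr1, arr2))   # arr2 added as new columns, shape (2, 4)

same_v = np.concatenate((arr1, arr2), axis=0)  # equivalent to vstack here
same_h = np.concatenate((arr1, arr2), axis=1)  # equivalent to hstack here

parts = np.split(np.arange(9), 3)   # three sub-arrays of length 3
left, right = np.hsplit(h, 2)       # two equal column blocks
```

When the second argument to np.split or np.hsplit is an integer, it is the number of equal sections, so the axis length must divide evenly by it.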
arr[5] | Returns the element at index 5 |
arr[2,5] | Returns the 2D array element on index [2][5] |
arr[1]=4 | Assigns array element on index 1 the value 4 |
arr[1,3]=10 | Assigns array element on index [1][3] the value 10 |
arr[0:3] | Returns the elements at indices 0,1,2 (On a 2D array: returns rows 0,1,2 ) |
arr[0:3,4] | Returns the elements on rows 0,1,2 at column 4 |
arr[:2] | Returns the elements at indices 0,1 (On a 2D array: returns rows 0,1 ) |
arr[:,1] | Returns the elements at index 1 on all rows |
arr<5 | Returns an array with boolean values |
(arr1<3) & (arr2>5) | Returns an array with boolean values |
~arr | Inverts a boolean array |
arr[arr<5] | Returns array elements smaller than 5 |
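The indexing, slicing and masking entries above can be tried on a small made-up 4x6 array:

```python
import numpy as np

arr = np.arange(24).reshape(4, 6)   # rows 0..3, columns 0..5

element = arr[2, 5]       # row 2, column 5 -> 17
first_rows = arr[0:3]     # rows 0, 1, 2
col_slice = arr[0:3, 4]   # column 4 of rows 0, 1, 2
column = arr[:, 1]        # column 1 of every row

mask = arr < 5            # boolean array with the same shape as arr
small = arr[mask]         # 1D array of the elements smaller than 5
not_small = arr[~mask]    # ~ inverts the mask

arr[1, 3] = 100           # assignment modifies arr in place
```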
NumPy makes it possible to test whether rows match certain values using mathematical comparison operations like <, >, >=, <=, and ==. For example, if we want to see which wines have a quality rating higher than 5, we can do this:
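A minimal sketch of that comparison. The wines array here is a small made-up stand-in with columns [alcohol, quality]; the data set the text refers to and its column layout are not shown here, so both are assumptions.

```python
import numpy as np

# Hypothetical stand-in for wines: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.8, 8.0],
    [10.5, 7.0],
    [9.9, 8.0],
])

above_five = wines[:, 1] > 5   # one True/False value per row
print(above_five)
```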
We get a Boolean array that tells us which of the wines have a quality rating greater than 5. We can do something similar with the other operators. For instance, we can see if any wines have a quality rating equal to 10:
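Sketched with the same kind of made-up stand-in for wines (columns [alcohol, quality], both assumed for illustration), the equality test might look like this:

```python
import numpy as np

# Hypothetical stand-in for wines: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.8, 8.0],
    [10.5, 7.0],
    [9.9, 8.0],
])

is_ten = wines[:, 1] == 10   # Boolean array, one entry per wine
any_ten = is_ten.any()       # True if at least one wine is rated 10
print(any_ten)
```

In this made-up sample no wine is rated 10, so every entry of is_ten is False.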
One of the powerful things we can do with a Boolean array and a NumPy array is select only certain rows or columns in the NumPy array. For example, the below code will only select rows in wines where the quality is over 7:
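A possible version of that selection, again using a made-up wines array whose columns are assumed to be [alcohol, quality]:

```python
import numpy as np

# Hypothetical stand-in for wines: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.8, 8.0],
    [10.5, 7.0],
    [9.9, 8.0],
])

high_quality = wines[:, 1] > 7       # Boolean mask over the rows
selected = wines[high_quality, :]    # rows where the mask is True, all columns
print(selected)
```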
We select only the rows where high_quality contains a True value, and all of the columns. This subsetting makes it simple to filter arrays for certain criteria. For example, we can look for wines with a lot of alcohol and high quality. In order to specify multiple conditions, we have to place each condition in parentheses, and separate conditions with an ampersand (&):
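Here is one way the combined condition could look. The wines array, its [alcohol, quality] column order and the thresholds 10 and 7 are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for wines: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.8, 8.0],
    [10.5, 7.0],
    [9.9, 8.0],
])

# Each condition sits in its own parentheses; & combines them elementwise.
high_quality_and_alcohol = (wines[:, 0] > 10) & (wines[:, 1] > 7)
selected = wines[high_quality_and_alcohol, :]
print(selected)
```

The parentheses matter because & binds more tightly than the comparison operators in Python.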
We can combine subsetting and assignment to overwrite certain values in an array:
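As a sketch, a Boolean mask on the left-hand side of an assignment overwrites only the matching entries. The wines array, its [alcohol, quality] columns and the replacement value 10 are made up for illustration.

```python
import numpy as np

# Hypothetical stand-in for wines: columns are [alcohol, quality].
wines = np.array([
    [9.4, 5.0],
    [9.8, 6.0],
    [12.8, 8.0],
    [10.5, 7.0],
    [9.9, 8.0],
])

strong_and_good = (wines[:, 0] > 10) & (wines[:, 1] > 7)
wines[strong_and_good, 1] = 10   # overwrite quality for the matching rows only
print(wines[:, 1])
```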
numpy.transpose(arr) | Transpose the array. |
numpy.ravel(arr) | Turn an array into a one-dimensional representation. |
numpy.reshape(arr, shape) | Reshape an array to the shape we specify. |
If you do any of the basic mathematical operations (/, *, -, +, **) with an array and a value, it will apply the operation to each of the elements in the array.
np.add(arr,1) or arr + 1 | Add 1 to each array element |
np.subtract(arr,2) or arr - 2 | Subtract 2 from each array element |
np.multiply(arr,3) or arr * 3 | Multiply each array element by 3 |
np.divide(arr,4) or arr / 4 | Divide each array element by 4 (dividing by zero yields inf, or nan for 0/0, with a warning) |
np.power(arr,5) or arr ** 5 | Raise each array element to the 5th power |
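A brief sketch of scalar arithmetic on a made-up array. It also shows why exponentiation is written with ** in NumPy: the ^ operator is bitwise XOR.

```python
import numpy as np

arr = np.array([1.0, 2.0, 4.0])

plus_one = arr + 1       # same as np.add(arr, 1)
minus_two = arr - 2      # same as np.subtract(arr, 2)
times_three = arr * 3    # same as np.multiply(arr, 3)
quarter = arr / 4        # same as np.divide(arr, 4)
fifth_power = arr ** 5   # same as np.power(arr, 5)

# On integer arrays ^ is bitwise XOR, not exponentiation.
xor = np.array([1, 2, 4]) ^ 5

# Division by zero gives inf (or nan for 0/0); errstate silences the warning.
with np.errstate(divide="ignore", invalid="ignore"):
    by_zero = np.array([1.0, 0.0]) / 0
```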
Note that an operation such as adding 10 to the quality column of wines won’t change the wines array; it will return a new 1-dimensional array where 10 has been added to each element in the quality column of wines. If we instead did +=, we’d modify the array in place.
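The difference between + and += can be sketched like this. The two-row wines array with columns [alcohol, quality] is a made-up stand-in, not the data set the text refers to.

```python
import numpy as np

# Hypothetical stand-in for wines: columns are [alcohol, quality].
wines = np.array([[9.4, 5.0], [12.8, 8.0]])

bumped = wines[:, 1] + 10             # new 1D array; wines is untouched
quality_before = wines[:, 1].copy()   # snapshot taken before the in-place update

wines[:, 1] += 10                     # modifies the quality column in place
quality_after = wines[:, 1].copy()
```

Here quality_before still holds the original ratings, while quality_after matches bumped.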
All of the common operations (/, *, -, +, **) will work between arrays.
np.add(arr1,arr2) | Elementwise add arr2 to arr1 |
np.subtract(arr1,arr2) | Elementwise subtract arr2 from arr1 |
np.multiply(arr1,arr2) | Elementwise multiply arr1 by arr2 |
np.divide(arr1,arr2) | Elementwise divide arr1 by arr2 |
np.power(arr1,arr2) | Elementwise raise arr1 to the power of arr2 |
np.array_equal(arr1,arr2) | Returns True if the arrays have the same elements and shape |
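A compact sketch of the array-to-array operations above, using two made-up arrays of the same shape:

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

total = np.add(arr1, arr2)          # same as arr1 + arr2
diff = np.subtract(arr1, arr2)      # same as arr1 - arr2
product = np.multiply(arr1, arr2)   # same as arr1 * arr2
ratio = np.divide(arr1, arr2)       # true division, so the result is float
powers = np.power(arr1, arr2)       # same as arr1 ** arr2

different = np.array_equal(arr1, arr2)      # False: elements differ
same = np.array_equal(arr1, arr1.copy())    # True: same shape and elements
```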
np.sqrt(arr) | Square root of each element in the array |
np.sin(arr) | Sine of each element in the array |
np.log(arr) | Natural log of each element in the array |
np.abs(arr) | Absolute value of each element in the array |
np.ceil(arr) | Rounds up to the nearest int |
np.floor(arr) | Rounds down to the nearest int |
np.round(arr) | Rounds to the nearest int |
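A few of the elementwise math functions above, applied to made-up values. One detail worth knowing: the rounding functions return floats rather than Python ints, and np.round sends exact halves to the nearest even value.

```python
import numpy as np

roots = np.sqrt(np.array([1.0, 4.0, 9.0]))    # 1, 2, 3
logs = np.log(np.array([1.0, np.e]))          # natural log: 0 and 1
magnitudes = np.abs(np.array([-1.5, 2.5]))    # 1.5, 2.5

vals = np.array([1.2, 2.5, 3.5, -0.7])
up = np.ceil(vals)        # rounds toward +infinity
down = np.floor(vals)     # rounds toward -infinity
nearest = np.round(vals)  # 2.5 -> 2.0 and 3.5 -> 4.0 (halves go to even)
```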
np.mean(arr,axis=0) | Returns mean along specific axis |
arr.sum() | Returns sum of arr |
arr.min() | Returns minimum value of arr |
arr.max(axis=0) | Returns maximum value of specific axis |
np.var(arr) | Returns the variance of array |
np.std(arr,axis=1) | Returns the standard deviation of specific axis |
np.corrcoef(arr) | Returns the correlation coefficient matrix of arr (each row is treated as a variable) |
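The statistics entries above can be sketched on a small made-up 2x3 array. Note that the correlation function lives in the np namespace; arrays have no corrcoef method.

```python
import numpy as np

arr = np.array([[1.0, 2.0, 3.0], [4.0, 6.0, 8.0]])

col_means = np.mean(arr, axis=0)   # one mean per column
total = arr.sum()                  # sum of every element
smallest = arr.min()               # minimum over the whole array
col_max = arr.max(axis=0)          # one maximum per column
variance = np.var(arr)             # population variance of all elements
row_std = np.std(arr, axis=1)      # one standard deviation per row
corr = np.corrcoef(arr)            # 2x2 matrix; each row is a variable
```

With axis=0 the reduction runs down the rows and yields one value per column; with axis=1 it runs across the columns and yields one value per row.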
The original post can be found at dataquest.io.