George Mihaila 4a54f16a00 Update README.md
2020-09-11 12:21:27 -05:00

Machine Learning Things


Machine Learning Things is a lightweight Python library that contains functions and code snippets that I use in my everyday research with Machine Learning, Deep Learning, and NLP.

I created this repo because I was tired of always looking up the same code from older projects and I wanted to gain some experience in building a Python library. Making it available to everyone gives me easy access to code I use frequently, and it can help others in their machine learning work. If you find any bugs or something doesn't make sense, please feel free to open an issue.

That is not all! This library also contains Python code snippets and notebooks that speed up my Machine Learning workflow.

Table of contents

  • ML_things: Details on the ml_things library and how to install and use it.

  • Snippets: Curated list of Python snippets I frequently use.

  • Notebooks: Google Colab Notebooks from old projects that I converted to tutorials.

  • Final Note


ML_things

Installation

This repo is tested with Python 3.6+.

It's always good practice to install ml_things in a virtual environment. If you need guidance on using Python's virtual environments, you can check out the user guide here.

You can install ml_things with pip from GitHub:

pip install git+https://github.com/gmihaila/ml_things

Functions

pad_array [source]

def pad_array(variable_length_array, fixed_length=None, axis=1)
Description: Pad a variable length array to a fixed numpy array.
It can handle single arrays [1,2,3] or nested arrays [[1,2],[3]].
Parameters:
   variable_length_array: single arrays [1,2,3] or nested arrays [[1,2],[3]].
   fixed_length: max length of rows for the numpy array.
   axis: direction to pad along: rows (1) or columns (0).
Returns:
   numpy_array:
     axis=1: fixed numpy array of shape [len of array, fixed_length].
     axis=0: fixed numpy array of shape [fixed_length, len of array].

Example:

>>> from ml_things import pad_array
>>> pad_array(variable_length_array=[[1,2],[3],[4,5,6]], fixed_length=5)
array([[1., 2., 0., 0., 0.],
       [3., 0., 0., 0., 0.],
       [4., 5., 6., 0., 0.]])

batch_array [source]

def batch_array(list_values, batch_size)
Description: Split a list into batches/chunks.
The last batch holds whatever values remain, so it can be shorter than batch_size.
Parameters:
   list_values: can be any kind of list/array.
   batch_size: int value of the batch length.
Returns:
   List of batches from list_values.
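
The behavior described above can be illustrated with a minimal sketch. Note that `batch_array_sketch` is a hypothetical stand-in written for this example, not the library's actual implementation; it only assumes what the description states (consecutive chunks, with a shorter final chunk when the values do not divide evenly).

```python
def batch_array_sketch(list_values, batch_size):
    """Split list_values into consecutive chunks of at most batch_size items.

    Hypothetical illustration of the documented behavior of batch_array.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Step through the list in strides of batch_size; slicing past the end
    # is safe in Python, so the last chunk simply holds the remainder.
    return [list_values[start:start + batch_size]
            for start in range(0, len(list_values), batch_size)]


batches = batch_array_sketch([1, 2, 3, 4, 5, 6, 7], batch_size=3)
print(batches)  # [[1, 2, 3], [4, 5, 6], [7]]
```

Here seven values with a batch size of 3 give two full batches and one final batch containing the single remaining value.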

plot_confusion_matrix [source]

plot_confusion_matrix(y_true, y_pred, classes='', normalize=False, title=None, cmap=plt.cm.Blues, image=None,
                          verbose=0, magnify=1.2, dpi=50)

download_from [source]

download_from(url, path)

Snippets

This is a large variety of Python snippets without a particular theme. I list the most frequently used ones first while keeping a logical order. I like to keep them as simple and as efficient as possible.

| Name | Description |
|------|-------------|
| Read File | One-liner to read any file. |
| Write File | One-liner to write a string to a file. |
| Debug | Start debugging after this line. |
| Pip Install GitHub | Install a library directly from GitHub using pip. |
| Parse Argument | Parse arguments given when running a .py file. |
| Using Doctest | How to run a simple unit test using function documentation. Useful when you need to run unit tests inside a notebook. |
| Unittesting | Simple example of creating unit tests. |
| Sort Keys | Sort a dictionary by its keys. |
| Sort Values | Sort a dictionary by its values. |
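
To give a feel for what snippets of this kind look like, here is a small sketch covering a few of the entries above (Read File, Write File, Sort Keys, Sort Values, Using Doctest). These are generic standard-library idioms and not necessarily the exact snippets stored in this repo; the file name, the `scores` data, and the `add` function are made up for the example. The dictionary ordering relies on Python 3.7+ insertion-ordered dicts.

```python
import doctest
import tempfile
from pathlib import Path

# Write File / Read File: one call each, inside a throwaway temp directory.
file_path = Path(tempfile.mkdtemp()) / "example.txt"  # hypothetical file name
file_path.write_text("hello ml_things")
content = file_path.read_text()

# Sort Keys / Sort Values: sort the (key, value) pairs, then rebuild a dict.
scores = {"b": 2, "c": 1, "a": 3}  # made-up data
by_key = dict(sorted(scores.items()))
by_value = dict(sorted(scores.items(), key=lambda item: item[1]))


# Using Doctest: the example inside the docstring doubles as a test.
def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


# Prints nothing when every docstring example passes.
doctest.run_docstring_examples(add, {"add": add})

print(content)   # hello ml_things
print(by_key)    # {'a': 3, 'b': 2, 'c': 1}
print(by_value)  # {'c': 1, 'b': 2, 'a': 3}
```

The doctest call is handy in a notebook because it checks a single function in place, without needing a separate test file.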

Notebooks

This is where I keep notebooks from previous projects that I turned into small tutorials. I often use them as a basis for starting a new project.

All of the notebooks are in Google Colab. Never heard of Google Colab? 🙀 You have to check out the Overview of Colaboratory, Introduction to Colab and Python, and what I think is a great Medium article on how to configure Google Colab Like a Pro.

If you check /ml_things/notebooks/, you will find many notebooks that are not listed here because they are not in a 'polished' form yet. These are the notebooks that are good enough to share with everyone:

| Name | Description | Colab Link |
|------|-------------|------------|
| Pretrain Transformers | Simple notebook to pretrain a transformers model on a specific dataset using transformers from Huggingface. | |

Final Note

Thank you for checking out my repo. I am a perfectionist, so I will make a lot of changes when it comes to small details.

Want to learn more about me? Check out my website gmihaila.github.io!
