Machine Learning Things

Machine Learning Things is a lightweight Python library that contains functions and code snippets that I use in my everyday research with Machine Learning, Deep Learning, and NLP.

I created this repo because I was tired of always looking up the same code from older projects and I wanted to gain some experience in building a Python library. Making it available to everyone gives me easy access to code I use frequently, and it can help others in their machine learning work. If you find any bugs or something doesn't make sense, please feel free to open an issue.

That is not all! This library also contains Python code snippets and notebooks that speed up my Machine Learning workflow.

Table of contents

  • Ml_things: Details on the ml_things library: how to install and use it.

  • Snippets: Curated list of Python snippets I frequently use.

  • Notebooks: Google Colab notebooks from old projects that I converted to tutorials.

  • Final Note


Ml_things

Installation

This repo is tested with Python 3.6+.

It's always good practice to install ml_things in a virtual environment. If you need guidance on using Python's virtual environments, you can check out the user guide here.

You can install ml_things with pip from GitHub:

pip install git+https://github.com/gmihaila/ml_things

Functions

pad_array [source]

def pad_array(variable_length_array, fixed_length=None, axis=1)
Parameters: variable_length_array : array
    Single array [1,2,3] or nested arrays [[1,2],[3]].
fixed_length : int
    Maximum row length of the resulting numpy array.
axis : int
    Direction to pad along: rows (1) or columns (0).
Returns: numpy_array :
    axis=1: fixed numpy array of shape [len of array, fixed_length].
    axis=0: fixed numpy array of shape [fixed_length, len of array].

Example:

>>> from ml_things import pad_array
>>> pad_array(variable_length_array=[[1,2],[3],[4,5,6]], fixed_length=5)
array([[1., 2., 0., 0., 0.],
       [3., 0., 0., 0., 0.],
       [4., 5., 6., 0., 0.]])
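
To make the padding behavior in the example above more concrete, here is a minimal pure-Python sketch of row padding. It is not the ml_things implementation: the name pad_rows_sketch, the pad_value parameter, the truncation of rows longer than fixed_length, and the default of using the longest row are all assumptions made for illustration. It returns nested lists rather than a numpy array.

```python
def pad_rows_sketch(rows, fixed_length=None, pad_value=0.0):
    """Pad (or truncate) each row to fixed_length. Hypothetical helper, not part of ml_things."""
    # Accept a single flat list such as [1, 2, 3] by wrapping it in an outer list.
    if rows and not isinstance(rows[0], (list, tuple)):
        rows = [rows]
    # Assumption: when no length is given, use the length of the longest row.
    if fixed_length is None:
        fixed_length = max((len(row) for row in rows), default=0)
    padded = []
    for row in rows:
        # Assumption: rows longer than fixed_length are cut off.
        values = [float(v) for v in row[:fixed_length]]
        # Fill the remaining positions with the padding value.
        values += [pad_value] * (fixed_length - len(values))
        padded.append(values)
    return padded


print(pad_rows_sketch([[1, 2], [3], [4, 5, 6]], fixed_length=5))
# [[1.0, 2.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0, 0.0], [4.0, 5.0, 6.0, 0.0, 0.0]]
```

The values match the pad_array example for axis=1; how the library handles over-long rows or axis=0 is best checked against its source.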

batch_array [source]

def batch_array(list_values, batch_size)
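
Only the signature of batch_array is documented here. Judging from the name and parameters alone, it presumably splits a list into batches of batch_size items; that reading is an assumption, not something this README confirms. Under that assumption, batching can be sketched with a hypothetical stand-in called batch_list_sketch:

```python
def batch_list_sketch(list_values, batch_size):
    """Split list_values into consecutive chunks of at most batch_size items.

    Hypothetical stand-in; the actual behavior of ml_things.batch_array may differ.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Step through the list in strides of batch_size; the last chunk may be shorter.
    return [list_values[i:i + batch_size] for i in range(0, len(list_values), batch_size)]


print(batch_list_sketch([1, 2, 3, 4, 5], batch_size=2))
# [[1, 2], [3, 4], [5]]
```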

Snippets

This is a large collection of Python snippets with no particular theme. I list the most frequently used ones first while keeping a logical order. I like to keep them as simple and as efficient as possible.

Name Description
Read File One liner to read any file.
Write File One liner to write a string to a file.
Debug Start debugging after this line.
Pip Install GitHub Install library directly from GitHub using pip.
Parse Argument Parse arguments given when running a .py file.
Using Doctest How to run a simple unit test using function documentation. Useful when you need to run unit tests inside a notebook.
Unittesting Simple example of creating unit tests.
Sort Keys Sorting a dictionary by its keys.
Sort Values Sorting a dictionary by its values.
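
For a feel of what several of the snippets listed above look like, here is a short sketch that uses only the Python standard library. It is not copied from the repo's snippets: the file name, the example dictionary, the --epochs argument, and the add function are all made up for illustration.

```python
import argparse
import doctest
import tempfile
from pathlib import Path

# Write File / Read File: one-liners using pathlib (the temporary path is illustrative).
file_path = Path(tempfile.mkdtemp()) / "example.txt"
file_path.write_text("hello world", encoding="utf-8")
content = file_path.read_text(encoding="utf-8")

# Sort Keys / Sort Values: dicts keep insertion order, so rebuild them from sorted items.
scores = {"b": 2, "c": 1, "a": 3}
sorted_by_key = dict(sorted(scores.items()))
sorted_by_value = dict(sorted(scores.items(), key=lambda item: item[1]))

# Parse Argument: an explicit list is passed here so the sketch also runs in a notebook;
# in a .py file you would call parser.parse_args() with no arguments.
parser = argparse.ArgumentParser(description="Illustrative argument parser.")
parser.add_argument("--epochs", type=int, default=1, help="Number of training epochs.")
args = parser.parse_args(["--epochs", "3"])


# Using Doctest: the example in the docstring doubles as a small test.
def add(a, b):
    """Return the sum of a and b.

    >>> add(2, 3)
    5
    """
    return a + b


# Prints nothing when every docstring example passes, and a report otherwise.
doctest.run_docstring_examples(add, {"add": add})

print(content)          # hello world
print(sorted_by_key)    # {'a': 3, 'b': 2, 'c': 1}
print(sorted_by_value)  # {'c': 1, 'b': 2, 'a': 3}
print(args.epochs)      # 3
```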

Notebooks

This is where I keep notebooks from previous projects that I turned into small tutorials. I often use them as a basis for starting a new project.

All of the notebooks are in Google Colab. Never heard of Google Colab? 🙀 You have to check out the Overview of Colaboratory, Introduction to Colab and Python, and what I think is a great Medium article about how to configure Google Colab Like a Pro.

If you check /ml_things/notebooks/, you will find many notebooks that are not listed here because they are not in a 'polished' form yet. These are the notebooks that are good enough to share with everyone:

Name Description Colab Link
Pretrain Transformers Simple notebook to pretrain a transformers model on a specific dataset using transformers from Hugging Face.

Final Note

Thank you for checking out my repo. I am a perfectionist, so I will make a lot of changes when it comes to small details.

Want to learn more about me? Check out my website gmihaila.github.io!
