
Top 30 Python Repos on GitHub for ML and AI Startups



Many of you are probably curious which Python projects are the most popular at the beginning of 2023. GitHub is by far the best place to get these stats: not every project is hosted there, but there is no worthy alternative. The top 30 Python repositories fall into six groups:

  • Seven repositories that improve performance;
  • Three repositories with frameworks;
  • Five repositories that facilitate machine learning;
  • Four repositories that make real life easier;
  • Six repositories that collect and organize useful information;
  • Five repositories that teach something.

Now let’s see how we can get a ranked list from the GitHub API with a few lines of code.

API, search, and GitHub

The search endpoint is described in the official GitHub REST API documentation, so I won’t retell it; let’s get straight to the point. The best part is that this endpoint does not require registration or an API key. Unauthenticated requests are rate-limited to 10 per minute, but that is enough to test the code and build a ranked list.
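If you send several requests in a row (for example, to fetch more than one page), a simple way to stay under that limit is to space the calls evenly, 60 / 10 = 6 seconds apart. The sketch below is a hypothetical helper class, not part of requests or the GitHub API; the clock and sleep functions are injectable so it can be tested without actually waiting.

```python
import time


class Throttle:
    """Hypothetical helper: keeps successive calls at least 60 / calls_per_minute seconds apart."""

    def __init__(self, calls_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / calls_per_minute  # 6 s for the 10-per-minute search limit
        self.clock = clock
        self.sleep = sleep
        self._last_call = None

    def wait(self):
        """Block until the next call is allowed, then record the call time."""
        now = self.clock()
        if self._last_call is not None:
            remaining = self.interval - (now - self._last_call)
            if remaining > 0:
                self.sleep(remaining)
                now += remaining
        self._last_call = now


# Usage: call throttle.wait() right before every API request.
throttle = Throttle(calls_per_minute=10)
```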

First, we use the Python requests module, which you are probably already familiar with. We will also use pandas to transform the data.

import requests
import pandas as pd

The URL is built according to the API documentation. Since we are only looking for Python projects, we put the qualifier language:python in the query. Then we ask for the results to be sorted by the number of stars, in descending order.

url = ''
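One way to assemble such a URL with the standard library is sketched below. It assumes the repository search endpoint https://api.github.com/search/repositories from the GitHub REST API; the function name build_search_url is hypothetical, while q, sort, order, per_page, and page are the endpoint’s query parameters.

```python
from urllib.parse import urlencode

# Repository search endpoint of the GitHub REST API
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def build_search_url(language="python", per_page=30, page=1):
    """Hypothetical helper: URL for repositories in `language`, most-starred first."""
    params = {
        "q": f"language:{language}",  # search qualifier restricting the language
        "sort": "stars",              # rank results by star count...
        "order": "desc",              # ...highest first
        "per_page": per_page,         # 30 is the default, 100 the maximum
        "page": page,                 # 1-based page number
    }
    # urlencode percent-escapes the colon in the qualifier (language%3Apython)
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


url = build_search_url()
print(url)
```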

We then call the API with a GET request and convert the JSON response into a Python dictionary.

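res = requests.get(url, timeout=10)  # fail fast instead of hanging forever
res.raise_for_status()               # raise on 4xx/5xx, e.g. when the rate limit is exceeded
res_dict = res.json()                # decode the JSON body into a dict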
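The same call can be made with only the standard library, and the response headers tell you how much of the rate limit is left. This sketch assumes GitHub’s X-RateLimit-Remaining and X-RateLimit-Reset headers (the latter is a Unix timestamp); fetch_json and seconds_until_reset are hypothetical helper names.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=10):
    """Hypothetical helper: GET `url`, return (decoded JSON, headers as a dict).

    urlopen raises urllib.error.HTTPError for 4xx/5xx statuses, for example
    403 or 429 when the rate limit has been exhausted.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.loads(response.read().decode("utf-8"))
        return body, dict(response.headers.items())


def seconds_until_reset(headers, now=None):
    """Seconds to wait before the next call: 0 while requests remain in the window."""
    h = {key.lower(): value for key, value in headers.items()}  # header names are case-insensitive
    if int(h.get("x-ratelimit-remaining", "1")) > 0:
        return 0
    now = time.time() if now is None else now
    return max(0, int(h.get("x-ratelimit-reset", "0")) - int(now))
```

For example, after res_dict, headers = fetch_json(url), calling time.sleep(seconds_until_reset(headers)) before the next page request waits only when the limit is actually used up.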

The search results are in a list under the key “items”. The default page size is 30, so this single request already returns the top 30 repositories.

repos = res_dict['items']

By the way, at the time of submitting the request, there were 8,046,758 Python repositories.
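If you want more than the top 30, keep two limits of the search endpoint in mind: a page holds at most 100 items, and a single query returns at most the first 1,000 results, however large total_count is. A hypothetical helper that works out how many pages to request from a decoded response such as res_dict might look like this:

```python
import math

MAX_PER_PAGE = 100         # largest page size the search endpoint accepts
MAX_SEARCH_RESULTS = 1000  # search results beyond the first 1,000 are not returned


def pages_to_fetch(res_dict, wanted, per_page=MAX_PER_PAGE):
    """Hypothetical helper: number of pages needed to collect `wanted` repositories."""
    if res_dict.get("incomplete_results"):
        # GitHub sets this flag when the search timed out before finding all matches
        print("warning: search results may be incomplete")
    available = min(res_dict["total_count"], MAX_SEARCH_RESULTS)
    return math.ceil(min(wanted, available) / per_page)


# With 8,046,758 matches, 250 repositories need 3 pages of 100
print(pages_to_fetch({"total_count": 8046758, "incomplete_results": False, "items": []}, 250))
```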

Now let’s convert the list of items to a pandas DataFrame.

repo_df = pd.DataFrame(repos)

Next, let’s keep only the columns we need, extract the year each project was created (created_year), and compute how many years it has been on GitHub (years_on_github).

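# Keep only the useful columns; .copy() avoids pandas' SettingWithCopyWarning on the assignments below.
# Note: in search results, "watchers" mirrors the star count rather than the number of actual watchers.
repo_df = repo_df[['name', 'full_name', 'html_url', 'created_at', 'stargazers_count', 'watchers', 'forks', 'open_issues']].copy()
repo_df['created_at'] = pd.to_datetime(repo_df['created_at'])
repo_df['created_year'] = repo_df['created_at'].dt.year
# Fixed reference year for this snapshot; use pd.Timestamp.now().year when re-running.
repo_df['years_on_github'] = 2022 - repo_df['created_year']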
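The API already returns the items in descending order of stars, so a repository’s position in the DataFrame is its rank. Turning that position into an explicit column keeps it available after you filter or regroup the data, as in the categories below. rank_repos is a hypothetical helper that relies only on the stargazers_count column; in the example, ranks are relative to the three-row sample.

```python
import pandas as pd


def rank_repos(repo_df):
    """Hypothetical helper: sort by stars and add a 1-based 'rank' column in front."""
    ranked = repo_df.sort_values("stargazers_count", ascending=False).reset_index(drop=True)
    ranked.insert(0, "rank", range(1, len(ranked) + 1))
    return ranked


# Example with three repositories from the list below
sample = pd.DataFrame({
    "name": ["flask", "public-apis", "rich"],
    "stargazers_count": [57584, 173658, 32075],
})
print(rank_repos(sample)[["rank", "name", "stargazers_count"]].to_string(index=False))
```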

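Here are the top 30 repositories, grouped by category. The numbers in parentheses give each repository’s overall place and its star count at the time of the request.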

Repositories that improve performance

These projects make everyday work easier and improve productivity, for example by adding useful features.

1. thefxxk (6th place, 65,988 stars)

This tool corrects errors in the previous console command, for example in a Linux shell or Git Bash.

2. httpie (9th place, 53,255 stars)

HTTPie is a command-line HTTP client. Its goal is to make CLI interaction with web services as human-friendly as possible. HTTPie is designed for testing, debugging, and generally interacting with APIs and HTTP servers. Its http and https commands let you create and send arbitrary requests using simple, natural syntax, and they display formatted, highlighted output.

3. you-get (12th place, 42,791 stars)

This is a tiny command line utility for downloading multimedia content (video, audio, images) from the Internet if there is no other convenient way.

4. LocalStack (16th place, 38,008 stars)

This repository is an AWS cloud service emulator that runs in a single container on your laptop or in your CI environment. With LocalStack, you can run your cloud applications entirely on your local machine, without connecting to a remote cloud provider. Whether you’re testing complex CDK applications or Terraform configurations, or just getting started with AWS services, LocalStack helps speed up and simplify your testing and development workflow.

5. Shadowsocks (20th place, 33,099 stars)

Shadowsocks is a free and open-source encrypted proxy project, widely used in China to bypass internet censorship. By the time this article was written, the repository’s contents had been removed.

6. rich (23rd, 32,075 stars)

This repository makes it easy to add colors and styles to terminal output. It can also display beautiful tables, progress bars, code syntax highlighting, tracebacks, and more right out of the box.
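As a small illustration, rich can print part of this ranking as a table. The sketch uses rich’s Console and Table classes; render_table is a hypothetical helper, and the console writes to an in-memory buffer instead of a terminal so the result is plain text that is easy to inspect.

```python
import io

from rich.console import Console
from rich.table import Table


def render_table(rows, width=60):
    """Hypothetical helper: render (rank, name, stars) rows as a rich table, return the text."""
    table = Table(title="Top Python repositories")
    table.add_column("Rank", justify="right")
    table.add_column("Repository")
    table.add_column("Stars", justify="right")
    for rank, name, stars in rows:
        table.add_row(str(rank), name, f"{stars:,}")
    # Writing to a StringIO instead of a terminal, so rich adds no color codes
    console = Console(file=io.StringIO(), width=width)
    console.print(table)
    return console.file.getvalue()


print(render_table([(1, "public-apis", 173658), (7, "flask", 57584), (23, "rich", 32075)]))
```

Passing no file argument to Console prints the same table, with colors, directly to your terminal.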

7. certbot (30th place, 28,587 stars)

Certbot is part of EFF’s effort to encrypt the entire internet. Secure communication on the web relies on HTTPS, which requires a digital certificate that lets browsers verify the identity of web servers. Web servers obtain certificates from trusted third parties called certificate authorities (CAs). Certbot is an easy-to-use client that fetches a certificate from Let’s Encrypt (a public certificate authority launched by EFF, Mozilla, and others) and deploys it to a web server.

Framework repositories

These repositories are well-known frameworks for web development and web scraping.

1. flask (7th place, 57,584 stars)

There is hardly any need to explain what it is. If you use Python for web development, you have probably used Flask at some point, or at least heard of it.
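For readers who haven’t used it, a minimal Flask app that serves a few entries of this ranking as JSON could look like this. The routes and the TOP_REPOS data are hypothetical; Flask, route, and jsonify are standard Flask APIs.

```python
from flask import Flask, jsonify

app = Flask(__name__)

# Hypothetical sample data taken from this ranking
TOP_REPOS = [
    {"rank": 1, "name": "public-apis", "stars": 173658},
    {"rank": 7, "name": "flask", "stars": 57584},
]


@app.route("/repos")
def list_repos():
    """Return all sample repositories as a JSON array."""
    return jsonify(TOP_REPOS)


@app.route("/repos/<name>")
def get_repo(name):
    """Return one repository by name, or a JSON error with status 404."""
    for repo in TOP_REPOS:
        if repo["name"] == name:
            return jsonify(repo)
    return jsonify(error="repository not found"), 404
```

Saved as app.py, it starts with flask --app app run (Flask 2.2 or later); a GET request to /repos/flask then returns the flask entry.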

2. scrapy (14th place, 42,471 stars)

This is the framework to learn if you want to use Python for web scraping. It simplifies and automates extracting information from web pages: it crawls websites and extracts structured data from their pages. It can be used for many purposes, from data mining to monitoring and automated testing.

3. fastapi (15th place, 40,363 stars)

Another popular framework for backend development. Its goal is to let you write common web APIs with as little code as possible. If your backend isn’t too complicated, give it a try.

Repositories that facilitate machine learning

1. models (5th place, 72,417 stars)

If you’ve heard of machine learning in Python, you’ve probably heard of TensorFlow. This repository, also called the “TensorFlow Model Garden”, collects machine learning models implemented in TensorFlow, with examples. The models may be official TensorFlow implementations, come from notable research projects, or be contributed by the community. They can save you time whenever you need a machine learning model in TensorFlow.

2. keras (8th place, 53,638 stars)

A popular deep learning framework for Python. Its high-level APIs let data scientists build and train deep learning models with very little code.

3. face_recognition (13th place, 42,762 stars)

As the name suggests, this project can recognize and manipulate faces from Python or the command line. It bills itself as the world’s simplest face recognition library. Give it an image containing human faces, and it will find and identify them; you can then even manipulate facial features with some prebuilt functions.

4. Real-Time-Voice-Cloning (21st place, 32,607 stars)

This fantastic project belongs in the machine learning category, but it must not be abused. It can “learn” someone’s voice from a 5-second speech recording and then use the “learned” voice to say anything.

5. DeepFaceLab (27th place, 30,651 stars)

If you have seen “deepfake” videos, they were most likely created with this project. It learns a person’s face and “implants” it into another video, replacing the original face with the one you want to add. There may be many ethical issues, but this project is truly unique.

Repositories that make real life easier

These repositories are written in Python and solve everyday, real-world problems. They either save us time or let us do some exciting things.

1. core (11th place, 48,763 stars)

This is the core repository of Home Assistant, open-source home automation that puts local control and privacy first, powered by a massive community of experts and DIY enthusiasts. It is ideal for running on a Raspberry Pi or a local server, and it integrates with many popular home automation products, such as Amazon Alexa and Google Cast.

2. openpilot (24th place, 31,998 stars)

This repository is an open-source driver assistance system. It provides Adaptive Cruise Control (ACC), Auto Lane Centering (ALC), Forward Collision Warning (FCW), and Lane Departure Warning (LDW) for a growing number of supported car makes, models, and model years. However, you have to buy their product and install it in your car. It’s not exactly DIY, but it does make life easier.

3. XX-Net (26th place, 31,002 stars)

It is a proxy tool to bypass China’s “Great Firewall”, allowing Chinese internet users to visit sites like YouTube and Facebook.

4. 12306 (28th place, 30,401 stars)

12306 is the hotline number for booking train tickets in China; it also became the domain name when online booking launched about 15 years ago. This repository is an automatic ticket booking tool: seats can be scarce during peak times, and booking them is quite difficult. This is how developers solve problems :).

Repositories that collect and organize useful information

1. public-apis (1st place, 173,658 stars)

This repository lists hundreds of free APIs that can be used to develop software and web applications. Many of them are very interesting, such as Fun Facts, which returns a random fun fact every time you call it. There are also some very useful ones, such as the Colormind API, which generates color schemes that could be used in data visualizations. In addition, many open government APIs can be used to get country statistics.

2. awesome-python (4th place, 112,609 stars)

This repository is a curated list of hundreds of awesome Python projects from GitHub, grouped by category so you can easily find the right one.

3. awesome-machine-learning (10th place, 52,487 stars)

Like the previous one, this repository is a curated list, this time of machine learning projects.

4. funNLP (17th place, 35,922 stars)

This is a Chinese-language repository; its documentation has no English version, although English-speaking readers can still find their way around it. It provides NLP dictionaries (such as sensitive-word lists), NLP toolkits, and some study materials.

5. interview_internal_reference (19th place, 33,135 stars)

This is another Chinese repository. It collects frequently asked interview questions from well-known companies such as Huawei and also covers the basics you need to know as a data developer, such as algorithms and database design principles.

6. Deep-Learning-Papers-Reading-Roadmap (25th place, 31,521 stars)

If you want to learn deep learning from scratch, this repository can be a good place to start. It lays out a “roadmap” of the field’s history through its landmark research papers.

Repositories that teach something

These repositories are not really about code. Think of them as open-source “books” whose contents you can use to study a subject.

1. system-design-primer (2nd place, 157,775 stars)

This repository is a real treasure for anyone who wants to start a career as a systems architect. It introduces many useful concepts, and you can find almost everything you want to know, such as when to use SQL or NoSQL, how to design a distributed relational database, and what a reverse proxy is.

2. Python-100-Days (3rd place, 113,812 stars)

This repository is in Chinese. It divides Python programming into 100 parts, so students who study one part a day can complete the entire course in 100 days. Unfortunately, it has not been translated into English.

3. PayloadsAllTheThings (18th place, 33,407 stars)

This repository provides a list of payloads and bypasses for web application security. If you are a web developer who wants to avoid common security holes and pitfalls, this material is essential. Almost all concepts are illustrated with Python examples.

4. AiLearning (22nd place, 32,517 stars)

A Chinese-language repository that teaches everything from Python basics to the PyTorch ML framework.

5. d2l-zh (29th place, 29,618 stars)

The book is called Dive into Deep Learning. Although this repository is in Chinese, the English edition is easy to find.


The ranking may not accurately reflect the real value of a particular repository. These are dry statistics based on star counts, and as you know, there are lies, damned lies, and statistics. Still, there is something interesting to be found in this list.

I hope you find something for yourself too.

