A list of awesome projects I have completed on my journey exploring Data Science, AI, and ML.
I will be showcasing and open-sourcing more of my projects here soon. Stay tuned!
The major task was to detect vehicles (in this case, cars) and then calculate the distance between them. Vehicle detection was done by building a car vs. not-car classifier and generating a heatmap of detections. Vehicle distance detection was done as part of my exploration of YAD2K (a 90% Keras / 10% TensorFlow implementation of YOLO_v2).
Skills Used - Keras, Tensorflow, Numpy, h5py, Pillow, Python
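A minimal sketch of the heatmap step, assuming `windows` holds the bounding boxes where the car vs. not-car classifier fired; `scipy.ndimage` is used here for blob labelling and is an assumption, not part of the original stack:

```python
import numpy as np
from scipy.ndimage import label  # assumption: scipy used for blob labelling

def heatmap_boxes(windows, image_shape, threshold=2):
    """Merge overlapping classifier detections into per-car bounding boxes."""
    heat = np.zeros(image_shape[:2], dtype=np.float32)
    for x1, y1, x2, y2 in windows:
        heat[y1:y2, x1:x2] += 1    # accumulate overlapping detections
    heat[heat < threshold] = 0      # reject sparse false positives
    labelled, n_cars = label(heat)  # connected blobs = individual cars
    boxes = []
    for car in range(1, n_cars + 1):
        ys, xs = np.nonzero(labelled == car)
        boxes.append((xs.min(), ys.min(), xs.max(), ys.max()))
    return boxes

# Two overlapping hits survive the threshold and merge into one car box.
print(heatmap_boxes([(100, 80, 160, 140), (110, 90, 170, 150)], (480, 640)))
```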
The major task was to recommend ingredients and a recipe just by looking at a food image. It was solved in two parts: one neural network identified the ingredients it saw in the dish, while the other devised a recipe from that list, trained on the Food 101 dataset. It was done as part of my exploration of research by the Facebook AI team.
Skills Used - PyTorch, Numpy, Scipy, Matplotlib, Torchvision, nltk, Pillow, Tensorflow, Python
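A minimal sketch of the first stage under stated assumptions: a torchvision ResNet-18 backbone (the actual encoder may differ) feeding a multi-label ingredient head, with `NUM_INGREDIENTS` a hypothetical vocabulary size:

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_INGREDIENTS = 1000  # hypothetical ingredient vocabulary size

class IngredientClassifier(nn.Module):
    def __init__(self, num_ingredients=NUM_INGREDIENTS):
        super().__init__()
        backbone = models.resnet18(weights=None)  # image encoder
        backbone.fc = nn.Identity()               # strip the ImageNet head
        self.encoder = backbone
        self.head = nn.Linear(512, num_ingredients)

    def forward(self, images):
        features = self.encoder(images)  # (batch, 512) image features
        return self.head(features)       # one logit per ingredient

# Multi-label: sigmoid per ingredient, trained with BCEWithLogitsLoss;
# the predicted ingredient list then conditions the recipe decoder.
logits = IngredientClassifier()(torch.randn(2, 3, 224, 224))
predicted = torch.sigmoid(logits) > 0.5
```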
The main task was to identify duplicate questions asked on Quora. I focused on finding the number of unique questions and the occurrences of each question, along with feature extraction, EDA, and text preprocessing. I also explored advanced feature extraction (NLP and fuzzy features), and Logistic Regression and Linear SVM with hyperparameter tuning. I also generated a word cloud.
Skills Used - Nltk, distance, BeautifulSoup, fuzzywuzzy, Numpy, Pandas, Seaborn, Matplotlib, Plotly, re, Python
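A minimal sketch of the fuzzy feature block, using the standard fuzzywuzzy ratios; the project's exact feature set may differ:

```python
from fuzzywuzzy import fuzz

def fuzzy_features(q1: str, q2: str) -> dict:
    """Similarity scores between two questions, each in [0, 100]."""
    return {
        "fuzz_ratio":       fuzz.QRatio(q1, q2),
        "fuzz_partial":     fuzz.partial_ratio(q1, q2),
        "token_sort_ratio": fuzz.token_sort_ratio(q1, q2),
        "token_set_ratio":  fuzz.token_set_ratio(q1, q2),
    }

print(fuzzy_features("How can I learn Python?", "How do I learn Python?"))
```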
Payon is a banking web app to assist people with seamless digital transactions. Clients can create a new bank account and get a unique account number on signing up. They can store and edit their account details and can also transfer (fictitious) money from one bank account to another. It was developed as a curriculum project for Database Management Systems.
Skills Used - Python, Django, Flask, SQL, HTML, CSS
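A minimal sketch of the transfer logic as it might look in a Django app's models.py; the Account fields here are illustrative, not Payon's actual schema:

```python
from django.db import models, transaction

class Account(models.Model):
    number = models.CharField(max_length=20, unique=True)  # unique account number
    balance = models.DecimalField(max_digits=12, decimal_places=2)

def transfer(src_number, dst_number, amount):
    # One atomic transaction with row locks, so money is never lost mid-transfer.
    with transaction.atomic():
        src = Account.objects.select_for_update().get(number=src_number)
        dst = Account.objects.select_for_update().get(number=dst_number)
        if src.balance < amount:
            raise ValueError("Insufficient funds")
        src.balance -= amount
        dst.balance += amount
        src.save()
        dst.save()
```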
The main task was to extract unique users for each month and calculate the total number of bookings made by each user, the total amount spent in each month, and the total room nights stayed by each user in each month, and then merge these summarized datasets for collective data exploration.
Skills Used - Numpy, Pandas, Python
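A minimal sketch of the monthly roll-up; the toy frame and its column names (user_id, booking_date, amount, room_nights) are illustrative:

```python
import pandas as pd

bookings = pd.DataFrame({
    "user_id":      [1, 1, 2, 2],
    "booking_date": pd.to_datetime(["2019-01-05", "2019-01-20",
                                    "2019-01-08", "2019-02-11"]),
    "amount":       [120.0, 80.0, 200.0, 150.0],
    "room_nights":  [2, 1, 3, 2],
})
bookings["month"] = bookings["booking_date"].dt.to_period("M")

per_user_month = (
    bookings.groupby(["user_id", "month"])
            .agg(total_bookings=("booking_date", "count"),
                 total_spent=("amount", "sum"),
                 total_room_nights=("room_nights", "sum"))
            .reset_index()
)
# Other monthly summaries can then be combined via
# pd.merge(..., on=["user_id", "month"]) for collective exploration.
```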
The main task was to devise the best algorithm to predict user ratings for films. I focused on minimizing RMSE and providing data interpretability. I created a sparse matrix from the data frame and found the global average of all movie ratings, etc. Then I calculated a user similarity matrix with dimensionality reduction. The most similar movies were found using the similarity matrix, and matrix factorization techniques were used.
Skills Used - Pandas, Matplotlib, Pyplot, Sklearn, Datetime, xgboost, Seaborn, Os, Scipy, Random, Python
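A minimal sketch of the sparse-matrix and similarity steps; the toy frame stands in for the real ratings data:

```python
import numpy as np
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.metrics.pairwise import cosine_similarity

df = pd.DataFrame({"user":   [0, 0, 1, 2, 2],
                   "movie":  [0, 1, 1, 0, 2],
                   "rating": [4, 5, 3, 2, 5]})
ratings = csr_matrix((df["rating"], (df["user"], df["movie"])))

global_avg = ratings.sum() / ratings.count_nonzero()  # global mean rating

# Movie-movie cosine similarity on the sparse matrix (columns = movies).
movie_sim = cosine_similarity(ratings.T, dense_output=False)
most_similar_to_0 = np.argsort(-movie_sim[0].toarray().ravel())[1:3]
```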
The main task was to solve a mathematical equation from an image. I started with manual collection of mathematical operator images and then stored them with the MNIST dataset in HDF5 format. OpenCV techniques were used to extract the equations, digits, and operators. Then Caffe and TensorFlow were used to train, validate, and recognise the digits and operators. Finally, an abstract syntax tree model was formed to generate the result of the equation.
Skills Used - TensorFlow, Caffe, HDF5, OpenCV, Numpy, Pandas, Pillow, Sklearn, Python
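A minimal sketch of the OpenCV extraction step (OpenCV 4 API); a synthetic image stands in for a scanned equation, and the thresholds are illustrative:

```python
import cv2
import numpy as np

# Synthetic dark-on-light "equation" image.
img = np.full((60, 200), 255, np.uint8)
cv2.putText(img, "3+5", (10, 45), cv2.FONT_HERSHEY_SIMPLEX, 1.5, 0, 3)

# Binarize, then find each symbol as an external contour.
_, binary = cv2.threshold(img, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

symbols = []
for c in sorted(contours, key=lambda c: cv2.boundingRect(c)[0]):  # left to right
    x, y, w, h = cv2.boundingRect(c)
    symbols.append(cv2.resize(binary[y:y + h, x:x + w], (28, 28)))  # MNIST-sized
```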
The main task was to predict the probability of each data point belonging to each of the 9 malware classes given in the dataset. I used multi-class log loss and the confusion matrix as metrics and performed EDA. I tried feature extraction and performed multivariate and univariate analysis. I also tried K-Nearest Neighbours classification, Logistic Regression, Random Forest, and XGBoost classification with the best hyperparameters found using random search.
Skills Used - Tqdm, Warnings, shutil, os, Pandas, Matplotlib, Seaborn, Numpy, pickle, sklearn, Random, xgboost
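A minimal sketch of the random-search step; synthetic data stands in for the malware features, and the parameter ranges are illustrative:

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.model_selection import RandomizedSearchCV
from xgboost import XGBClassifier

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=9, random_state=0)
param_dist = {
    "n_estimators":  randint(50, 300),
    "max_depth":     randint(3, 10),
    "learning_rate": uniform(0.01, 0.3),
}
search = RandomizedSearchCV(
    XGBClassifier(objective="multi:softprob"),
    param_distributions=param_dist,
    n_iter=5,
    scoring="neg_log_loss",  # multi-class log loss, the evaluation metric
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```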
The main task was to implement a neural network for semantic segmentation. I started with dataset processing and model definition, and then moved on to model training. The model followed a convolutional encoder-decoder architecture: the encoder in my network was similar to VGG-16, and the decoder layers were the inverse of the layers used in the encoder. I also tried a bit of data augmentation.
Skills Used - Python, Tensorflow, OpenCV
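A minimal sketch of the encoder-decoder shape, far shallower than a real VGG-16-style network; the depths and class count are illustrative:

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 12  # hypothetical number of segmentation classes

inputs = layers.Input(shape=(256, 256, 3))
# Encoder: stacked conv + pooling blocks, as in VGG-16.
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D()(x)
# Decoder: transpose convolutions invert the encoder's downsampling.
x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
outputs = layers.Conv2D(NUM_CLASSES, 1, activation="softmax")(x)  # per-pixel classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```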
The main task was to build a model that predicts human activities such as walking, sitting, standing, or laying on the basis of data collected from the sensors (accelerometer and gyroscope) in a smartphone. The '3-axial linear acceleration' from the accelerometer and the '3-axial angular velocity' from the gyroscope were used to capture sequences. Logistic Regression, Decision Tree, Random Forest, etc. were used and their accuracies compared.
Skills Used - Numpy, Pandas, Datetime, Seaborn, Sklearn, Matplotlib, Python
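A minimal sketch of the model comparison; synthetic features stand in for the 561 engineered accelerometer/gyroscope features of the UCI HAR dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=561, n_informative=30,
                           n_classes=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for name, clf in [("Logistic Regression", LogisticRegression(max_iter=1000)),
                  ("Decision Tree", DecisionTreeClassifier()),
                  ("Random Forest", RandomForestClassifier(n_estimators=100))]:
    clf.fit(X_train, y_train)
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```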
The main task was to predict heart disease using the 14-attribute Cleveland database. I undertook EDA, data visualization, and a disease vs. age frequency correlation. I also generated a decision tree and a learning curve of the training and cross-validation scores, along with the confusion matrix, precision, recall, F-score, and false-negative score. I compared the performance of Random Forest, Naive Bayes, and KNNs.
Skills Used - Numpy, Pandas, Matplotlib, Seaborn, sklearn, Python
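A minimal sketch of the learning-curve step; synthetic data stands in for the Cleveland records:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import learning_curve

X, y = make_classification(n_samples=300, n_features=13, random_state=0)
sizes, train_scores, val_scores = learning_curve(
    RandomForestClassifier(n_estimators=100), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 5),
)
print("train:", train_scores.mean(axis=1))  # training score per training size
print("cv:   ", val_scores.mean(axis=1))    # cross-validation score per size
```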
The main task was to focus on the factors that lead to employee attrition and explore how factors like 'Distance from home' or 'Average monthly income' affect attrition. I focused on EDA, feature selection, and SMOTE techniques, and evaluated performance using precision, etc. Predictions were made using ANNs, and one-hot encoding was also performed.
Skills Used - Keras, Numpy, Pandas, Matplotlib, Seaborn, Sklearn, Python
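A minimal sketch of the SMOTE-plus-ANN step; imbalanced-learn is an assumption for the SMOTE implementation (it is not in the skills list), and the layer sizes are illustrative:

```python
from imblearn.over_sampling import SMOTE  # assumption: imbalanced-learn for SMOTE
from sklearn.datasets import make_classification
from tensorflow import keras

# Imbalanced toy data stands in for the attrition records.
X, y = make_classification(n_samples=1000, n_features=30, weights=[0.85, 0.15])
X_res, y_res = SMOTE().fit_resample(X, y)  # oversample the minority class

model = keras.Sequential([
    keras.layers.Input(shape=(30,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(8, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),  # P(attrition)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=[keras.metrics.Precision()])
model.fit(X_res, y_res, epochs=5, verbose=0)
```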
The main task was to classify songs by genre (Hip-Hop and Rock). I trained a classifier to distinguish between the two genres based only on track information. I used pandas and seaborn for aggregating information and creating plots. I used scikit-learn to predict the correct song classification based on features such as energy, acousticness, tempo, etc. I also implemented PCA, logistic regression, and decision trees, along with standard resampling.
Skills Used - Pandas, Matplotlib, Pyplot, Sklearn, Seaborn, Python
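A minimal sketch of the PCA-plus-logistic-regression pipeline; synthetic data stands in for the track features (energy, acousticness, tempo, ...):

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=400, n_features=8, random_state=0)
pipe = make_pipeline(StandardScaler(), PCA(n_components=5), LogisticRegression())
print(cross_val_score(pipe, X, y, cv=5).mean())  # resampled accuracy estimate
```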
The main task was to detect brand logos. I used the FlickrLogos-32 dataset containing logos of 32 brands like Adidas, Apple, etc. I used the torch and torchvision libraries and created a LeNet-5-like network for brand logo classification. Some of the major steps undertaken were edge detection, morphological processing, and sampling. YOLOv2 also serves well in this task.
Skills Used - Python, PyTorch, Torchvision
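A minimal LeNet-5-style sketch for 32-class logo classification; the 32x32 RGB input size and layer widths are illustrative:

```python
import torch
import torch.nn as nn

class LogoNet(nn.Module):
    def __init__(self, num_classes=32):  # one class per FlickrLogos-32 brand
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

logits = LogoNet()(torch.randn(1, 3, 32, 32))  # (1, 32) class scores
```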
The main task was to predict Alzheimer's disease so that patients can begin treatment early. I tried Support Vector Machines, Logistic Regression, and a Naïve Bayes approach. The hyperparameters were chosen using 5-fold cross-validation, with feature selection performed by principal component analysis, etc. Model ensembling was also performed.
Skills Used - Pandas, Matplotlib, Numpy, Seaborn, Scipy, sklearn, Python
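A minimal sketch of the pipeline: PCA for feature reduction, 5-fold cross-validation for the SVM hyperparameters, and a soft-voting ensemble of the three models; the synthetic data and parameter grid are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)

svm = GridSearchCV(SVC(probability=True), {"C": [0.1, 1, 10]}, cv=5)
ensemble = make_pipeline(
    PCA(n_components=10),  # reduce features before classification
    VotingClassifier(
        [("svm", svm), ("lr", LogisticRegression()), ("nb", GaussianNB())],
        voting="soft",  # average predicted probabilities across models
    ),
)
ensemble.fit(X, y)
```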
The main task was to predict the number of hours in a month for which an employee would be absent, with a learning focus on missing value analysis, ANOVA testing, KNN imputation, feature selection, Support Vector Machine classification, Gradient Boosting, Decision Trees, Random Forests, and hyperparameter tuning.
Skills Used - Pandas, Numpy, Matplotlib, Seaborn, Scipy, Sklearn, Python
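A minimal sketch of the KNN imputation step; the toy array stands in for the absenteeism records with missing values:

```python
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[26.0, 90, np.nan],
              [50.0, 98, 4.0],
              [np.nan, 89, 2.0],
              [31.0, 94, 3.0]])
imputer = KNNImputer(n_neighbors=2)  # fill each gap from the 2 nearest rows
X_filled = imputer.fit_transform(X)
print(X_filled)
```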
Detecting the apparel worn by an individual in an image and then classifying items such as eyewear, bags, shoes, etc.
Skills Used - Python, OpenCV
In my previous internships, I worked on real-world projects including real-time prediction, model deployment on the cloud, etc.
Skills Used - Data Science, Machine Learning, Deep Learning with Python, Cloud Computing