Shantanu Jain

Graduate Student in Computer Science

Columbia University in the City of New York

Hey, I’m Shantanu!

I am an inquisitive developer with 2.5+ years of experience in distributed systems, cloud infrastructure and backend applications. I graduated from Columbia University with a Master's degree in Computer Science. This is my portfolio page, where you can find all my projects and professional affiliations.

I am motivated to build efficient software solutions with a customer-focused approach. My experiences at Cohesity Inc. and Siemens PLM have given me the tools to understand a project's functional and non-functional requirements and translate them into a scalable design. I have received recommendations from senior management at both companies for my collaborative attitude and my ability to take ownership of the projects assigned to me.

Additionally, I excelled consistently in my graduate coursework, finishing with a GPA of 4.15. This is a testament to my hard work and dedication to learning new things.

I am currently looking for software engineering roles in distributed systems and backend infrastructure, which would allow me to leverage my experience in system design and contribute to the growth of your organization.

Always open to talking about machine learning, cricket, fiction and anything that requires out-of-the-box thinking!

Interests

  • Distributed Systems
  • Cloud infrastructure
  • Big Data
  • Deep Learning

Education

  • MS in Computer Science, May 2022

    GPA: 4.15 / 4.00

    Columbia University, New York, NY

  • B.E. in Computer Engineering, July 2019

    GPA: 9.3 / 10

    University of Pune, Pune, India

Experience

Software Engineer (MTS 2)

Cohesity - MultiCloud Data Platform

Jul 2022 – Jul 2023 San Jose, CA
  • Worked on the data archival team, which handles archival to and restore from cold-tier targets using efficient compression, encryption and deduplication strategies.
  • Owned a feature to validate API permissions for external archival targets registered on the Cohesity platform, such as AWS, GCP, Azure, NAS and QStar targets, with a customer-focused approach, and saw it through to release.
  • Contributed to the development of a transactional system in C++ that synchronized local and cloud data, implementing fallback mechanisms and using intents for consistency. Requests were distributed across nodes, with Paxos and two-phase commit (2PC) used for agreement.

Software Engineering Intern

Cohesity - MultiCloud Data Platform

Jun 2021 – Dec 2021 New York, NY
  • Enhanced Cohesity’s SaaS platform (Helios) by enabling expansion and contraction of Cohesity Clusters deployed on multiple cloud platforms including AWS, GCP and Azure.
  • Ensured multi-tenancy and scalability by using a Kafka queue and service worker threads.
  • Owned the end-to-end design of the feature, from the SRS through unit and integration testing.

Graduate Teaching Assistant

Columbia University

Jun 2021 – Aug 2021 New York, NY

  • Served as a teaching assistant for the course "Deep Learning". Gave tutorials on TensorFlow and probabilistic programming libraries.
  • Contributed to the Learning to learn Math competition by combining GPT-2 with graph neural networks to build a machine learning model aimed at scoring an A grade on the subject.

Software Engineer

Siemens PLM

Jul 2019 – Dec 2020 Pune, India

  • Deployed and maintained the entire backend infrastructure on AWS as part of DevOps in the product research team.
  • Developed a serverless architecture for automated maintenance of the entire MongoDB infrastructure.
  • Led and contributed to the migration of the logging platform from CloudWatch to Fluentd.
  • Migrated the infrastructure to various AWS regions using Terraform with Terragrunt.
  • Created a deployment pipeline using GitLab CI/CD to automatically detect security vulnerabilities in the codebase (SAST, DAST, container-level and dependency-level scanning).

Machine Learning Intern

7Targets

Apr 2019 – Jun 2019 Pune, India
  • Worked on an AI-based assistant that automates conversing with leads and grouping hot leads through continuous follow-ups.
  • Extracted and segmented information from the business cards of potential clients (leads) through image processing using OpenCV.
  • Saved the sales team approximately 60% of their time.

Software Intern

Anomaly Solutions

Feb 2018 – Apr 2018 Pune, India
  • Developed an application for identifying diseased plant species based on images of their leaves, flowers and stems.
  • Employed multi-organ classification rather than single-organ classification for better accuracy.
  • Used a convolutional neural network to reach an accuracy of 81%.

Projects

Flagged Post Analysis: Stack Overflow

Analyzed key metrics associated with Stack Overflow posts to classify low-quality posts. Identified 18 different textual, user-based and code-based features on a 96 GB dataset. Used an LSTM encoder-decoder for semi-supervised labelling along with logistic regression for classification, reaching an accuracy of 73%. Deployed the entire architecture on Spark (GCP Dataproc) and loaded the dataset into BigQuery.

CodeNote: Snippet storage for developers

Developed a web application for easy access, storage and modification of code snippets. Features include searching (by tag, comments and description), sharing of code snippets, and linting for Python, C++ and Java. Deployed the app on AWS EC2, with continuous integration and deployment via GitHub Actions. Built the backend with Flask (Python), using S3, DynamoDB and Elasticsearch for storage along with Cognito for 2FA.

Cancerous Cell Prediction

Identified cancerous regions within gigapixel pathology images to assist pathologists in identifying tumor cells. This was part of the Camelyon-16 Challenge. Used patching to generate a dataset from 21 gigapixel images, along with data augmentation at multiple zoom levels. Applied fine-tuned transfer learning with a 2-tower InceptionV3 to achieve an accuracy of 96%.
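The patching step can be illustrated with a minimal sketch. This is not the project's code: the function names, the patch size and stride parameters, and the assumption that each zoom level halves both image dimensions are all hypothetical choices made for illustration.

```python
from typing import Iterator, Tuple


def patch_origins(width: int, height: int, patch: int, stride: int) -> Iterator[Tuple[int, int]]:
    """Yield (x, y) top-left corners of square patches that fit fully inside the image."""
    for y in range(0, height - patch + 1, stride):
        for x in range(0, width - patch + 1, stride):
            yield (x, y)


def level_size(width: int, height: int, level: int) -> Tuple[int, int]:
    """Image size at a zoom level, ASSUMING each level halves both dimensions."""
    return width >> level, height >> level


if __name__ == "__main__":
    # Tile a small 4x4 "image" with non-overlapping 2x2 patches.
    print(list(patch_origins(4, 4, patch=2, stride=2)))
    # Size of a hypothetical 1024x512 slide two zoom levels down.
    print(level_size(1024, 512, level=2))
```

Generating coordinates lazily, rather than cutting every patch up front, keeps memory use flat even when the source image is far too large to load at once.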

TeleEasy Patient Portal

Created a distributed, subscription-based teleconsultation platform deployed entirely on Amazon Web Services. Applied the theory behind buffet economics to subsidize consultation costs for patients. Employed best practices for security (2FA) and scalability (SQS, Elasticsearch and DynamoDB) to ensure a seamless digital experience.

E-Voting using Blockchain

Designed a scalable and secure web application for online voting based on blockchain and smart contracts. Created a custom blockchain with proof-of-work consensus, treating each state as a miner in a P2P network. Blockchain synchronization across the miners was ensured by using Merkle trees. Users could look up their vote using their private key, which ensured security and transparency.
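As a rough illustration of how Merkle trees help miners check that they hold the same data, here is a minimal sketch. It is not the project's implementation: the function names, the use of SHA-256, and the rule of duplicating the last hash on odd-sized levels are assumptions chosen for the example.

```python
import hashlib
from typing import List


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def merkle_root(items: List[bytes]) -> str:
    """Compute a Merkle root by hashing pairs of nodes level by level."""
    if not items:
        return sha256_hex(b"")
    level = [sha256_hex(item) for item in items]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # assumption: duplicate the last hash on odd levels
        level = [
            sha256_hex((level[i] + level[i + 1]).encode())
            for i in range(0, len(level), 2)
        ]
    return level[0]


def in_sync(local_items: List[bytes], peer_root: str) -> bool:
    """Two miners hold the same ordered data if their Merkle roots match."""
    return merkle_root(local_items) == peer_root


if __name__ == "__main__":
    votes = [b"vote-1", b"vote-2", b"vote-3"]
    root = merkle_root(votes)
    print(in_sync(votes, root))                             # same data
    print(in_sync([b"vote-1", b"vote-2", b"forged"], root)) # tampered data
```

Comparing a single root hash lets two peers detect divergence without exchanging every record.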

Voice-controlled photo album

A scalable and secure web application for easy management and quick search of photos using text and voice inputs. The entire application was built on top of AWS, with the frontend hosted on S3. Quick search capabilities were provided by Elasticsearch, with authentication and authorization configured through AWS Cognito. The entire workflow was streamlined by using AWS CodePipeline for continuous integration and deployment.

Information Extraction from Unstructured Web Database

Created a system to fetch structured tuples from an unstructured corpus of documents on the internet. Used the Iterative Set Expansion algorithm, which starts from a seed tuple and grows the set of extracted tuples from there. We used spaCy for named-entity tagging and SpanBERT for relation identification from tokenized sentences.
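The control flow of Iterative Set Expansion can be sketched as follows. This is a toy under stated assumptions, not the project's code: the in-memory `CORPUS` with fictional names stands in for web search, and a regular expression stands in for the spaCy and SpanBERT extraction stages.

```python
import re
from typing import List, Set, Tuple

Pair = Tuple[str, str]

# Hypothetical in-memory corpus standing in for web search results.
CORPUS = [
    "Ada Stone works for Acme. Ben Hale works for Acme.",
    "Ben Hale works for Acme. Cara Diaz works for Globex.",
    "Cara Diaz works for Globex. Dan Reyes works for Initech.",
]

# Toy extractor; the real pipeline would use NER plus a relation model.
PATTERN = re.compile(r"([A-Z]\w+ [A-Z]\w+) works for ([A-Z]\w+)")


def search(query: Pair) -> List[str]:
    """Toy retrieval: documents that mention every term of the query tuple."""
    return [doc for doc in CORPUS if all(term in doc for term in query)]


def extract(doc: str) -> Set[Pair]:
    """Extract (person, organization) pairs from one document."""
    return set(PATTERN.findall(doc))


def iterative_set_expansion(seed: Pair, target: int) -> List[Pair]:
    """Grow a tuple set from a seed until `target` tuples are found or progress stalls."""
    found: List[Pair] = [seed]
    used: Set[Pair] = set()
    while len(found) < target:
        pending = [t for t in found if t not in used]
        if not pending:
            break  # every known tuple has been used as a query already
        query = pending[0]
        used.add(query)
        for doc in search(query):
            for pair in sorted(extract(doc)):
                if pair not in found:
                    found.append(pair)
    return found[:target]


if __name__ == "__main__":
    for pair in iterative_set_expansion(("Ada Stone", "Acme"), target=4):
        print(pair)
```

Each newly extracted tuple becomes a future query, which is what lets a single seed reach documents that never mention it.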

Plant Species Identification using fusion techniques

Used transfer learning with AlexNet to identify plant species from images of their leaves, stems and flowers. Improved accuracy by combining the representations obtained individually from leaf, stem and flower images with sum, XOR and multiplicative fusion strategies.
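A minimal sketch of the three fusion strategies on per-organ feature vectors is shown below. The function names and example values are hypothetical, and the XOR variant in particular rests on an assumption: features are binarized against a threshold before being combined, which may differ from what the project did.

```python
import math
from typing import List


def fuse_sum(features: List[List[float]]) -> List[float]:
    """Element-wise sum of the per-organ feature vectors."""
    return [sum(vals) for vals in zip(*features)]


def fuse_product(features: List[List[float]]) -> List[float]:
    """Element-wise (multiplicative) product of the per-organ feature vectors."""
    return [math.prod(vals) for vals in zip(*features)]


def fuse_xor(features: List[List[float]], threshold: float = 0.5) -> List[int]:
    """Element-wise XOR, ASSUMING each feature is first binarized at `threshold`."""
    fused = []
    for vals in zip(*features):
        bit = 0
        for v in vals:
            bit ^= int(v > threshold)
        fused.append(bit)
    return fused


if __name__ == "__main__":
    # Hypothetical 2-dimensional representations for one plant.
    leaf, stem, flower = [0.25, 1.0], [0.75, 0.5], [1.0, 0.0]
    organs = [leaf, stem, flower]
    print(fuse_sum(organs))
    print(fuse_product(organs))
    print(fuse_xor(organs))
```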

Querius: Q/A App

Designed an Android application to facilitate transparency between professors and students in the university. The application used Retrofit 2.0 to expose the HTTP(S) API as a type-safe Java interface. Additionally, it used PHP and MySQL on the backend, with Java for the frontend. The application allowed students to subscribe to topics, post questions, upvote, and share important announcements with peers.

Dining Concierge Chatbot

Designed an intelligent chatbot that provides dining recommendations based on the user's preferences, such as location, cuisine and number of guests. The chatbot was designed using AWS Lex, with the application hosted on S3. The restaurant data was parsed from Yelp.

Contact