Shantanu Jain

MTS 3 at VMware by Broadcom

Columbia University in the city of New York

Hey, I’m Shantanu!

Welcome! I’m a platform infrastructure engineer specializing in designing scalable, resilient software systems for both on-premises and cloud-native environments. Over the past 4.5 years, I’ve tackled challenging infrastructure problems by architecting robust, distributed solutions that drive reliability, efficiency, and operational confidence.

At VMware by Broadcom, I led the development of high-availability observability services and designed automated remediation workflows for telco cloud platforms. My work empowered organizations with unified insights into the health of thousands of endpoints, enhancing system reliability and directly improving operational efficiency.

Earlier, at Cohesity Inc., I worked on secure, efficient backend solutions for data archival services—supporting seamless hot-to-cold storage migration with compression, encryption, and de-duplication. This initiative optimized storage costs for clients and ensured robust data integrity, all while streamlining operations for internal teams.

My academic background from Columbia University (GPA: 4.15) in Machine Learning and Distributed Systems sharpened my expertise in advanced algorithms and systems design. Backed by hands-on experience in Java, C++, container orchestration (Docker, Kubernetes), and test automation, I consistently deliver solutions that blend strong technical foundations with real-world impact.

I enjoy collaborating with diverse teams to transform complex requirements into effective, customer-centric solutions. If you’re interested in cloud infrastructure, intelligent automation, or distributed systems, or simply want to talk about creative problem solving, cricket, or fiction, I’m always up for a conversation!

Interests

  • Cloud Infrastructure
  • Distributed Systems
  • Big Data
  • Deep Learning

Education

  • MS in Computer Science, May 2022

    GPA: 4.15 / 4.00

    Columbia University, New York, NY

  • B.E. in Computer Engineering, July 2019

    GPA: 9.3 / 10

    University of Pune, Pune, India

Experience

Member of Technical Staff 3 (MTS 3)

VMware by Broadcom – Telco Cloud Platform

Jul 2023 – Present Palo Alto, CA
  • Developed and enhanced core cloud orchestration services, enabling robust lifecycle management and automation of virtualized network functions (VNFs) across large-scale Telco cloud environments.
  • Led the design of a fault-tolerant certificate observability and API handshake monitoring service in Java Spring for the Telco Cloud Platform, providing unified health visibility across 20,000+ globally distributed endpoints.
  • Architected and implemented distributed, multi-threaded remediation workflows for automated handling of expired/untrusted CA certificates, invalid credentials, and renewals, empowering users with seamless one-click resolution actions.
  • Built an end-to-end distributed observability framework (MELT: metrics, events, logs, traces) leveraging fluent-bit and otel-collector—enabling fine-grained custom metrics for scalability testing and enhanced audit trails for platform security.

Software Engineer (MTS 2)

Cohesity - MultiCloud Data Platform

Jul 2022 – Jul 2023 San Jose, CA
  • As a member of the data archival team, worked on archival to and restore from cold-tier targets, using efficient compression, encryption, and deduplication strategies.
  • Owned a customer-focused feature that validates API permissions for external archival targets registered on the Cohesity platform (AWS, GCP, Azure, NAS, and QStar), from design through release.
  • Contributed to the development of a transactional system in C++ that synchronized local and cloud data, implementing fallback mechanisms and using intents for consistency. Requests were distributed across nodes, with Paxos and two-phase commit (2PC) used for agreement.

Software Engineering Intern

Cohesity - MultiCloud Data Platform

Jun 2021 – Dec 2021 New York, NY
  • Enhanced Cohesity’s SaaS platform (Helios) by enabling expansion and contraction of Cohesity Clusters deployed on multiple cloud platforms including AWS, GCP and Azure.
  • Ensured multi-tenancy and scalability by using Kafka queues and service worker threads.
  • Responsible for end-to-end design of the feature, from SRS to unit and integration testing.

Graduate Teaching Assistant

Columbia University

Jun 2021 – Aug 2021 New York, NY

  • Served as a teaching assistant for the course "Deep Learning". Gave tutorials on TensorFlow and probabilistic programming libraries.
  • Contributed to the Learning to Learn Math competition by combining GPT-2 with graph neural networks to build a machine learning model intended to score an A grade in the subject.

Software Engineer

Siemens PLM

Jul 2019 – Dec 2020 Pune, India

  • Deployed and maintained the entire backend infrastructure on AWS as part of DevOps in the product research team.
  • Developed a serverless architecture for automated maintenance of the entire MongoDB infrastructure.
  • Led and contributed towards migration of the logging platform from CloudWatch to Fluentd.
  • Migrated the infrastructure to various AWS regions using Terraform with Terragrunt.
  • Created a deployment pipeline using GitLab CI/CD to automatically detect security vulnerabilities in the codebase (SAST, DAST, container-level and dependency-level scanning).

Machine Learning Intern

7Targets

Apr 2019 – Jun 2019 Pune, India
  • Worked on an AI-based assistant that automates conversing with and grouping hot leads through continuous follow-ups.
  • Extracted and segmented information from the business cards of potential clients (leads) through image processing using OpenCV.
  • Saved approximately 60% of the time for the sales team.

Software Intern

Anomaly Solutions

Feb 2018 – Apr 2018 Pune, India
  • Developed an application for identifying diseased plant species based on images of their leaves, flowers, and stems.
  • Employed multi-organ classification rather than single-organ classification for better accuracy.
  • Used a convolutional neural network to reach an accuracy of 81%.

Projects

Flagged Post Analysis: Stack Overflow

Analyzed key metrics of Stack Overflow posts to classify low-quality posts. Identified 18 textual, user-based, and code-based features on a 96 GB dataset. Used an LSTM encoder-decoder for semi-supervised labelling and logistic regression for classification, reaching an accuracy of 73%. Deployed the entire architecture on Spark (GCP Dataproc), with the dataset loaded into BigQuery.

CodeNote: Snippet storage for developers

Developed a web application for easy access, storage, and modification of code snippets. Features include search (by tag, comments, and description), snippet sharing, and linting for Python, C++, and Java. Deployed the app on AWS EC2, with continuous integration and deployment via GitHub Actions. Built with Python and Flask, using S3, DynamoDB, and Elasticsearch for storage, along with Cognito for 2FA.

Cancerous Cell Prediction

Identified cancerous regions within gigapixel pathology images to assist pathologists in identifying tumor cells, as part of the Camelyon-16 Challenge. Used patching to generate a dataset from 21 gigapixel images, along with data augmentation at multiple zoom levels. Applied fine-tuned transfer learning with a two-tower InceptionV3 to achieve an accuracy of 96%.

TeleEasy Patient Portal

Created a distributed, subscription-based teleconsultation platform deployed entirely on Amazon Web Services. Applied the theory behind buffet economics to subsidize consultation costs for patients. Employed best practices for security (2FA) and scalability (SQS, Elasticsearch, and DynamoDB) to ensure a seamless digital experience.

E-Voting using Blockchain

Designed a scalable and secure web application for online voting based on blockchain and smart contracts. Created a custom blockchain with proof-of-work consensus, treating each state as a miner in a P2P network. Blockchain synchronization across the miners was ensured by using Merkle trees. Users could look up their vote using their private key, ensuring security and transparency.
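
A minimal sketch of how Merkle roots can be compared to detect whether two miners hold the same set of votes. This is an illustration only, not the project's code: the SHA-256 hash, the duplicate-last-node rule for odd levels, and the names `merkle_root` and `in_sync` are all assumptions.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Return the raw SHA-256 digest of data."""
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Compute a Merkle root over the given leaves.

    Assumption: when a level has an odd number of nodes, the last
    node is duplicated before pairing.
    """
    if not leaves:
        return sha256(b"")
    # Hash every leaf, then repeatedly hash adjacent pairs until one node remains.
    level = [sha256(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])
        level = [sha256(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


def in_sync(local_votes: list[bytes], peer_votes: list[bytes]) -> bool:
    """Two miners agree on a block's contents if their Merkle roots match."""
    return merkle_root(local_votes) == merkle_root(peer_votes)
```

Comparing a single 32-byte root lets peers detect divergence without exchanging every vote; only when the roots differ do they need to walk the tree to find the mismatching entries.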

Voice controlled photo album

A scalable and secure web application for easy management and quick search of photos using text and voice inputs. The entire application was built on top of AWS, with the frontend hosted on S3. Quick search capabilities were provided by Elasticsearch, with authentication and authorization configured through AWS Cognito. The entire workflow was streamlined by using AWS CodePipeline for continuous integration and deployment.

Information Extraction from Unstructured Web Database

Created a system to extract structured tuples from an unstructured corpus of documents on the internet. Utilised the Iterative Set Expansion algorithm, which starts from a seed tuple. Used spaCy for named-entity tagging and SpanBERT for relation identification from tokenized sentences.

Plant Species Identification using fusion techniques

Used transfer learning with AlexNet to identify plant species from images of their leaves, stems, and flowers. Improved accuracy by fusing the representations obtained individually from leaf, stem, and flower images, using sum, xor, and multiplicative strategies.
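
A minimal sketch of element-wise sum and multiplicative fusion over per-organ feature vectors. It is illustrative only: the function names, the plain-list representation, and the toy values are assumptions, and the xor strategy is omitted because its definition on real-valued features is not given here.

```python
import math


def fuse_sum(features: list[list[float]]) -> list[float]:
    """Element-wise sum of equally sized feature vectors."""
    return [sum(values) for values in zip(*features)]


def fuse_product(features: list[list[float]]) -> list[float]:
    """Element-wise product of equally sized feature vectors."""
    return [math.prod(values) for values in zip(*features)]


# Hypothetical 2-dimensional representations for one plant.
leaf = [1.0, 2.0]
stem = [3.0, 4.0]
flower = [5.0, 6.0]

summed = fuse_sum([leaf, stem, flower])
multiplied = fuse_product([leaf, stem, flower])
```

The fused vector keeps the same dimensionality as each input, so it can be passed to a single classifier in place of three separate per-organ predictions.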

Querius: Q/A App

Designed an Android application to facilitate transparency between professors and students at the university. The application used Retrofit 2.0 to expose the HTTP(S) API as a type-safe Java interface. It used PHP and MySQL on the backend, with Java on the frontend. The application allowed students to subscribe to topics, post questions, upvote, and share important announcements with peers.

Dining Concierge Chatbot

Designed an intelligent chatbot that provides dining recommendations based on the user’s preferences, such as location, cuisine, and number of guests. The chatbot was built using Amazon Lex, with the application hosted on S3. The restaurant data was parsed from Yelp.

Contact