Hello, I'm
Harshit Tripathi
Data Engineer
View My Portfolio
Who I Am and What I Do
IT professional with over 6 years of hands-on experience in Python development, data engineering, and implementing data strategy using Azure cloud and related tools such as PySpark, Azure Databricks, Azure Data Factory, Azure Synapse Analytics, Azure Data Lake, Azure Blob Storage, Azure Analysis Services, Power BI, and SQL. Proficient in designing, developing, and maintaining ETL frameworks, data pipelines, and data integration solutions. Skilled in data transformation, optimizing database performance, ensuring data quality, and automating data processing tasks. Strong background in building complex data processing frameworks, creating real-time dashboards for strategic decision-making, and collaborating within Agile development frameworks.
My Technical Skills
Delta Lake
Azure Data Factory
Azure Analysis Services
Databricks
Function App
Azure Synapse Analytics
NoSQL
Power BI
Python
Bitbucket
Apache Airflow
My Professional Journey
Current role
At Expleo Ltd, I serve as a Data Engineer for Rolls-Royce, where I design, build, and maintain robust data pipelines using Azure Data Factory and SQL. I integrate data from diverse sources, ensuring accuracy and reliability. My role involves tuning database performance and implementing automation with Python and Pandas. I create real-time dashboards with Power BI, manage Databricks workflows, and collaborate with global teams. Additionally, I ensure high data quality in all processes, contribute to data migration projects, and drive initiatives to automate processes and enhance scalability within an Agile framework.
Previous Roles in Data Engineering
At Websoft Technologies Ltd, working on-site for the NHS from May 2022 to September 2023, I analyzed data quality, volume, and complexity to determine migration requirements. I designed and implemented ETL processes using Azure Databricks, facilitating efficient extraction, transformation, and loading of data into Azure-based data repositories. My role involved optimizing data loading processes for performance and scalability, implementing data pipelines to automate data movement, and ensuring high data quality standards through anomaly detection and data cleansing.
From August 2017 to September 2021, at Stepfinity Software Pvt Ltd, I worked for a fintech client, migrating data from on-premises servers to Azure Data Lake using Azure Data Factory. I created schemas, facts, and dimensions using MS SQL, designed ETL processes and data models for Azure SQL Server, and developed Spark code using Scala, Spark SQL, and Spark Streaming. Additionally, I built dashboards in Power BI and orchestrated ETL processes, enhancing data processing efficiency and supporting strategic decision-making.
Projects
Cost Optimization on Azure Cloud
Analytics on both batch and streaming data on the Azure cloud platform, aimed at helping organizations optimize costs and address user and customer issues.
Institution Sales Orders BI
Migrated data from an on-premises server to Azure Data Lake using Azure Data Factory, and processed the received files using Scala, PySpark, and Spark SQL in Databricks.
Full Delta Load from On-Premises to Cloud
Conducted a comprehensive analysis of data sources, business needs, and integration requirements to determine the optimal approach for implementing a full delta load process. Designed a robust architecture for full delta load processes using Azure services. Implemented data extraction pipelines and delta load processes, optimizing performance and resource consumption.
Get in Touch
YouTube Channel
Email: Harshit.herts@gmail.com
Mobile: +44 07824702731
Follow me on social media