Projects with this topic
Configuration and data workflows for an instance of Apache Airflow for the DDRplatform
Updated -
This is a study project built using FastAPI to practice microservice architecture, data normalization techniques, and clean API design.
The service receives raw payloads from different simulated sources and transforms them into a standardized and validated structure.
It centralizes normalization logic and demonstrates how to build a scalable, maintainable, and test-friendly data processing layer.
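The normalization idea described above could be sketched roughly as follows. This is an illustrative sketch, not code from the project: the source names (`source_a`, `source_b`), the payload shapes, and the `NormalizedRecord` fields are all assumptions, and the FastAPI layer is left out so the logic can be run on its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NormalizedRecord:
    """Hypothetical standardized structure shared by all sources."""
    source: str
    user_id: str
    amount_cents: int


def _from_source_a(payload: dict) -> NormalizedRecord:
    # Assumed shape for source A: {"uid": 7, "amount": "12.50"}
    cents = round(float(payload["amount"]) * 100)
    return NormalizedRecord("source_a", str(payload["uid"]), cents)


def _from_source_b(payload: dict) -> NormalizedRecord:
    # Assumed shape for source B: {"user": {"id": "u-9"}, "cents": 300}
    return NormalizedRecord("source_b", str(payload["user"]["id"]), int(payload["cents"]))


# One registry keeps all normalization logic in a single place.
NORMALIZERS = {"source_a": _from_source_a, "source_b": _from_source_b}


def normalize(source: str, payload: dict) -> NormalizedRecord:
    """Convert a raw payload into a validated NormalizedRecord."""
    try:
        normalizer = NORMALIZERS[source]
    except KeyError:
        raise ValueError(f"unknown source: {source!r}") from None
    try:
        record = normalizer(payload)
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid payload from {source}: {exc}") from exc
    if record.amount_cents < 0:
        raise ValueError("amount must be non-negative")
    return record
```

Keeping each source adapter as a small pure function makes it straightforward to unit-test them without starting a web server.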
Updated -
End-to-end solution for data migration and analysis using Python, FastAPI, Kafka, and PostgreSQL. It implements an asynchronous data pipeline and a RESTful API for analytics, all fully containerized with Docker Compose for easy, reproducible deployment.
Updated -
This project/library contains common elements related to ETL processes...
Updated -
Unified project demonstrating both batch analytics and real-time streaming pipelines with Apache Spark:
Batch (PySpark/Jupyter): Processed S&P 500 stock data, applied transformations, and ran distributed computations.
Streaming (Spark + Kafka): Built a streaming pipeline to consume Kafka topics, process messages in real-time, and visualize outputs.
Deployed using Docker and Jupyter for reproducibility.
Updated -
Analyzed decades of historical weather station data (1920–1940) using Hadoop MapReduce. Filtered operable stations, computed descriptive statistics (min, max, mean, median), and produced reports/graphs. Designed modular MRJobs to chain tasks together for scalable processing.
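The filter-then-summarize flow described above can be illustrated with a small in-memory map/reduce sketch. It is not the project's MRJob code: the `station,year,value` line format and the function names are assumptions, and the shuffle step is simulated with a dictionary rather than Hadoop.

```python
from collections import defaultdict
from statistics import mean, median


def map_reading(line: str):
    """Mapper: emit (station, value) for readings inside 1920-1940."""
    # Assumed input format: "station,year,value"
    station, year, value = line.strip().split(",")
    if 1920 <= int(year) <= 1940:
        yield station, float(value)


def reduce_station(station: str, values):
    """Reducer: compute descriptive statistics for one station."""
    values = list(values)
    stats = {
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
    }
    return station, stats


def run(lines):
    """Simulate map, shuffle (group by key), and reduce in memory."""
    groups = defaultdict(list)
    for line in lines:
        for key, value in map_reading(line):
            groups[key].append(value)
    return dict(reduce_station(key, values) for key, values in groups.items())
```

In a real MapReduce job the grouping is done by the framework, and a second chained step could consume these per-station statistics to build reports.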
Updated -
Advanced data synchronization framework.
Updated -
Reporting for MIT Club of Northern California
Updated -
This project contains two assignments written in Python that run chains of tasks (DAGs) in Airflow.
Updated -
Official weather data ETL for wind energy project evaluation in El Calafate, Argentina.
Updated -
target-core is a Singer target intended to work with regular Singer taps. The goal is to use this package as a foundation for building other targets that focus on their core features, reducing the effort spent on maintaining the common parts.
Updated -
tap-core is a Singer tap intended to work with regular Singer targets. The goal is to use this package as a foundation for building other taps that focus on their core features, reducing the effort spent on maintaining the common parts.
Updated -
Quick and dirty pipeline to prepare a dataset for mobility studies, starting from a collection of CDRs.
Updated -
Live data source that can be used for data engineering, data warehouse, and ETL development.
Updated -
A Python extract, transform, load (ETL) pipeline for the CityPulse Smart City dataset.
Updated -
Amaxa is a new data loader and ETL (extract-transform-load) tool for Salesforce, designed to support the extraction and loading of complex networks of records in a single operation.
Primary repo now on GitHub: https://github.com/davidmreed/amaxa
Updated -
Beetle aims to be a comprehensive ETL solution for transforming NoSQL data into a relational structure.
Updated -
Process responsible for loading data from DevOps to SSPO.
Updated