Projects with this topic
Reference project for a modern Lakehouse architecture applied to bank fraud detection.
Simulates a production environment with three heterogeneous data sources (CSV files, a PostgreSQL database, Kafka/Redpanda streaming) ingested continuously into S3-compatible object storage (MinIO).
Tech stack:
Batch ingestion: Apache Spark (PySpark) + Delta Lake
Streaming ingestion: Spark Structured Streaming + Redpanda (Kafka)
Orchestration: Apache Airflow
Transformation: dbt (DuckDB)
Storage: MinIO (S3), Delta Lake (Bronze/Silver), Parquet (Gold)
Exploration: DuckDB / DBeaver
Medallion architecture:
Bronze: raw data, sources kept separate
Silver: cleaned data, cross-source deduplication
Gold: business aggregates (fraud per hour)
The entire stack runs locally via Docker Compose.
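The Bronze/Silver/Gold flow described above can be illustrated with a minimal pure-Python sketch. It is not the project's code, which uses Spark, Delta Lake and dbt; the record fields (`tx_id`, `ts`, `is_fraud`), the sample rows and the function names are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from datetime import datetime

# Bronze: raw rows kept separately per source (hypothetical sample data).
bronze = {
    "csv": [
        {"tx_id": "t1", "ts": "2024-01-01T10:05:00", "is_fraud": True},
        {"tx_id": "t2", "ts": "2024-01-01T10:40:00", "is_fraud": False},
    ],
    "postgres": [
        {"tx_id": "t2", "ts": "2024-01-01T10:40:00", "is_fraud": False},
        {"tx_id": "t3", "ts": "2024-01-01T11:15:00", "is_fraud": True},
    ],
    "kafka": [
        {"tx_id": "t3", "ts": "2024-01-01T11:15:00", "is_fraud": True},
        {"tx_id": "t4", "ts": "2024-01-01T11:50:00", "is_fraud": True},
    ],
}


def to_silver(bronze_by_source):
    """Merge all sources and keep the first record seen for each tx_id."""
    seen = {}
    for rows in bronze_by_source.values():
        for row in rows:
            seen.setdefault(row["tx_id"], row)
    return list(seen.values())


def to_gold(silver_rows):
    """Count fraudulent transactions per hour bucket."""
    counts = Counter()
    for row in silver_rows:
        if row["is_fraud"]:
            hour = datetime.fromisoformat(row["ts"]).strftime("%Y-%m-%d %H:00")
            counts[hour] += 1
    return dict(counts)


silver = to_silver(bronze)
gold = to_gold(silver)
```

In a real deployment the same two steps would more likely be a Spark job writing Delta tables and a dbt model producing the hourly aggregate; the sketch only shows the shape of the data at each layer.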
Updated -
Uses Airflow and Selenium to scrape weather forecasts and send email alerts about extreme weather. Thresholds for extreme weather are defined in an external settings (YAML) file. Documentation was written with Quarto: https://avivfaraj.gitlab.io/weather-alerts/
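As a rough illustration of threshold-based alerting, the sketch below compares one forecast against configured limits. The setting names (`temp_max_c`, `wind_max_kmh`), the forecast fields and the `find_alerts` helper are assumptions, not taken from the project; the dictionary stands in for whatever a parsed settings file would contain.

```python
# Hypothetical thresholds, shaped like the result of parsing a YAML settings file.
settings = {"thresholds": {"temp_max_c": 35.0, "wind_max_kmh": 80.0}}


def find_alerts(forecast, thresholds):
    """Return a message for every forecast value that exceeds its threshold."""
    alerts = []
    if forecast["temp_c"] > thresholds["temp_max_c"]:
        alerts.append(f"Extreme heat: {forecast['temp_c']} C")
    if forecast["wind_kmh"] > thresholds["wind_max_kmh"]:
        alerts.append(f"Extreme wind: {forecast['wind_kmh']} km/h")
    return alerts


hot_day = {"temp_c": 38.0, "wind_kmh": 20.0}
calm_day = {"temp_c": 22.0, "wind_kmh": 10.0}
hot_alerts = find_alerts(hot_day, settings["thresholds"])
calm_alerts = find_alerts(calm_day, settings["thresholds"])
```

Keeping the limits in a settings file rather than in code means they can be tuned without touching the pipeline; an Airflow task could call a function like this and send an email only when the returned list is non-empty.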
Updated -
This project contains two assignments written in Python that implement task chains (DAGs) executed in Airflow.
Updated -
This repository contains two projects that demonstrate working with data in Python and SQL, as well as the use of specialized libraries for statistical calculations and data visualization, in Jupyter Notebook.
Updated -
DluOne Data Composer for Data Engineering and Data Science
A repository of scripts for a quick start with Data Science and Data Engineering on Docker.
I use it personally for my daily work and side projects.
Updated -
A collection of Airflow operators, hooks, and utilities to elevate dbt to a first-class citizen of Airflow.
See the documentation at: https://airflow-dbt-python.readthedocs.io/
Updated -
Project for module 33, Airflow, of the Skillbox course "Basic Data Science".
Updated -
An Airflow pipeline, built mainly for exploration and self-learning, that scrapes LINE Webtoon data and stores it in an external MySQL database. The ingested data is used to build the dashboard at https://ammarchalifah.com/webtoon-insights
Updated -
Airflow test for VLGI Investimentos.
Updated -
Airflow data processing with Ansible and LXD
Updated