spark

Projects with this topic

DataRider v2 / DataRider ETL with Spark

DataRider block for streaming ETL with Spark and Scala

ETL spark

Updated Dec 18, 2025

Cristian Vasu Data Portfolio / Spark for Batch + Streaming - Market Analysis Kafka Pipeline

Unified project demonstrating both batch analytics and real-time streaming pipelines with Apache Spark:

Batch (PySpark/Jupyter): Processed S&P 500 stock data, applied transformations, and ran distributed computations.

Streaming (Spark + Kafka): Built a streaming pipeline to consume Kafka topics, process messages in real-time, and visualize outputs.

Deployed using Docker and Jupyter for reproducibility.

spark pyspark kafka streaming ETL real-time batch-proces... Docker data-pipeline

Updated Sep 04, 2025

ufscar / hpc / Exemplos / spark

Apache Spark in a master-slave setup that works out of the box with OpenHPC and Slurm

spark apache-spark slurm HPC openhpc

Updated Jun 24, 2023

containalytics / containalytics

Cloud container data analytics, statistical modeling, and machine learning on distributed databases. A free, open-source alternative to SPSS, SAS, MATLAB, PowerBI, Tableau and Alteryx. Runs on Linux, Windows, macOS, and in the cloud via containers.

LaTeX statistics sas spss matlab Python R spark cloud gcp Oracle azure Amazon Web S... Kubernetes containers Docker ML machine lear... regression clustering TiDB Yugabyte MySQL MariaDB SQL sparkr pyspark RStudio - KNIME Anal... Apache Spark... PyTorch MXNet Chainer keras gluon Scikit-learn... ONNX MLOps - Anac... NumPy Ipython) StatsModels pytest dask Koalas API -... Tornado - Py... Altair Bokeh Jupyter Voila Plotly/Dash matplotlib Seaborn - C#... SASPy - R: T... ggplot2 shiny dash Sparklyr BlueSky Stat... Jamovi - Int... vs code Vim - Tableau TabPy Tableau Buil... Python) - PL... SQL Developer PostgreSQL MySQL/MariaDB pgAdmin4 dbeaver MySQL Workbench Spark SQL Delta Lake Angular 2+ React .NET Core JavaScript (JS) Typescript (TS) Blazor Razor html5 CSS3 AWS EC2 Servers docker-compose podman Red Hat Ente... Oracle Linux fedora centos Ubuntu (WSL 2) debian Kestrel nginx Apache web s... jira Git Gitlab CI/CD... Code Climate... Ansible helm Terraform Cloudera Dat... nifi blender godot MS Office

Updated May 11, 2022

Matteo / sparkconf-app

This web app finds the best configuration for a Spark application given the cluster's hardware.

spark hadoop

Updated Oct 22, 2020

siddie / Stack Exchange Data Processing with Spark

Stack Exchange releases "data dumps" of all its publicly available content roughly every three months via archive.org.

This project is both an example of and a framework for building ETL pipelines for this data with Apache Spark and Java.

spark big data stackexchange stack exchange Apache Spark big data ETL ETL

Updated Jun 20, 2019

Luis Miguel Mejía Suárez / documents-cluster

This repo presents performance comparisons between serial, MPI-based, and Spark-based implementations of a document clustering algorithm.

bigdata HPC MPI spark Python Scala kmeans clustering

Updated Jun 07, 2017

thomas lörtsch / trice

Analytics suite for Tor Project metrics data

tor metrics analytics spark Zeppelin

Updated Jan 25, 2017
