Balasiva Pindra
Available for opportunities

Sr. Software Engineer (Big Data) | McKinney, TX

What I Do

I specialize in building data infrastructure that processes at scale, stays up, and actually ships.

Real-Time Streaming

Designing and operating production streaming pipelines with Kafka, Flink, and Redpanda — processing millions of events per second with exactly-once semantics.

Kafka, Flink, Redpanda, MSK, WebSocket

Data Platform Architecture

Building enterprise data platforms from the ground up — medallion architectures, lakehouse patterns, and warehouse modeling that scale to thousands of tables.

Databricks, Snowflake, BigQuery, Delta Lake, dbt

Batch ETL & Processing

Engineering large-scale batch processing with Spark and PySpark — data quality frameworks, schema enforcement, and cost-optimized pipelines on EMR and Databricks.

PySpark, EMR, Airflow, Glue, Terraform

AI & RAG Systems

Building production RAG pipelines and AI-powered search — embedding generation, vector databases, semantic retrieval, and LLM re-ranking for enterprise data.

HuggingFace, OpenSearch, LangChain, RAG, LLMs

About Me

I build data infrastructure that doesn't go down. From 500 TB real-time crypto pipelines across 35 exchanges to enterprise data platforms processing millions of events — data engineering is what I do. I've designed end-to-end batch and streaming architectures on Kafka, Flink, Spark, and Databricks at companies like Rivian, Capital One, and Amberdata. I also shipped a RAG-powered search engine and a full SaaS product solo. I own outcomes, not tickets.

Tech Stack

Languages

Python, Go, Java, TypeScript, SQL

Data & Streaming

Kafka, Flink, Spark, Databricks, dbt, Delta Lake

Cloud & Infra

AWS, Kubernetes, Docker, Terraform, Pulumi

AI & Search

LangChain, RAG, OpenSearch, HuggingFace, LLMs

Databases

Snowflake, BigQuery, Redshift, PostgreSQL, Redis

Experience

Sr. Software Engineer

Amberdata

Oct 2024 – Present
Remote
Go, Python, Kafka, Spark Scala, Databricks, Kubernetes, Delta Lake
  • Engineered real-time WebSocket ingestion pipelines across 35 crypto exchanges — sustaining 99.99% uptime and processing 500+ TB of live trade, order book, and ticker data.
  • Owned full migration of legacy JavaScript data-API to Go — zero-downtime cutover with instant rollback capability.
  • Built a Spark Scala data quality framework enforcing custom business rules across TB-scale daily loads.
  • Drove K8s cluster infrastructure decisions — namespace strategy, autoscaling, and CI/CD pipelines.

Sr. Software Engineer

Trepp, Inc.

Apr 2024 – Oct 2024
Dallas, TX
Python, HuggingFace, OpenSearch, RAG, LLM, AWS

Lead Data Engineer

Rivian

Jan 2020 – Apr 2024
Palo Alto, CA
PySpark, Kafka, Flink, Databricks, BigQuery, dbt, Terraform, AWS

Sr. Big Data Engineer

Capital One

Mar 2018 – Jan 2020
Richmond, VA
Java, Kafka, Spark, Snowflake, ECS, Python, ZeroMQ

Data Engineer

Logipro Software

Dec 2014 – Dec 2016
Hyderabad, India
Informatica, Hadoop, Spark, Python, SQL Server

Projects

Professional work, SaaS products, and open source contributions.

professional

Crypto Market Data Infrastructure

Real-time WebSocket ingestion across 35 exchanges, processing 500 TB of trade data at 99.99% uptime. Migrated the legacy JavaScript API to Go with a zero-downtime cutover.

Go, Python, Kafka, Spark Scala, Kubernetes, Delta Lake

professional

RAG Property Search Engine

Replaced keyword search with an intent-aware RAG pipeline: HuggingFace embeddings, an OpenSearch vector DB, and LLM re-ranking. Shipped in 7 months.

Python, HuggingFace, OpenSearch, RAG, LLM, AWS

professional

Rivian Data Platform

Built from scratch: 1,000+ tables, 100+ pipelines, and Kafka/Flink streaming. Cut processing time by 30% and infrastructure costs by 25%.

PySpark, Kafka, Flink, Databricks, BigQuery, dbt, AWS

saas

HostMetrics

Full-stack SaaS for Turo fleet analytics — 28 page routes, 200+ React components, Stripe billing, AWS Lambda background jobs. Built solo, live in production.

Next.js, TypeScript, Supabase, Stripe, Tailwind, Vercel

open source

Real-Time Streaming Pipeline

Confluent Kafka + Apache Flink + Python producers/consumers with Avro/Protobuf serialization on AWS MSK.

Python, Kafka, Flink, AWS MSK, Terraform

open source

Spark Batch Processing

PySpark ETL pipeline with medallion architecture, data quality framework, and Airflow scheduling on AWS EMR.

PySpark, Airflow, AWS EMR, S3, Glue

open source

Databricks Delta Lakehouse

Delta Lake with medallion architecture, Auto Loader, SCD Type 2, and Delta Live Tables on AWS.

Databricks, Delta Lake, PySpark, DLT, Terraform

open source

Snowflake dbt Warehouse

dbt models with star schema, incremental strategies, snapshots, and Snowpipe ingestion from S3.

Snowflake, dbt, SQL, Terraform, AWS S3

open source

Confluent Kafka Ecosystem

Schema Registry, Avro/Protobuf, KSQL, Connect, and transactional exactly-once patterns on AWS MSK.

Kafka, Schema Registry, KSQL, Connect, Python

open source

Python Data API

FastAPI + Flask REST APIs with Kafka consumers, JWT auth, rate limiting, and WebSocket streaming on AWS ECS.

FastAPI, Flask, Kafka, PostgreSQL, Docker, AWS ECS

open source

Dimensional Data Modeling

Star/snowflake schemas, SCD patterns, and ERDs for ecommerce, healthcare, fintech, and supply chain domains.

SQL, PostgreSQL, Redshift, Kimball, Terraform

open source

SQL Masterclass

50+ SQL problems from basics to advanced — window functions, CTEs, optimization, and real-world scenarios.

PostgreSQL, SQL, Athena, Window Functions, CTEs

Let's Connect

Looking for a senior data engineer who ships? Send me a message.

Built with Next.js, Tailwind CSS & Framer Motion

© 2026 Balasiva Pindra