Available for opportunities

Balasiva Pindra

Sr. Software Engineer (Big Data) | McKinney, TX


What I Do

I specialize in building data infrastructure that processes at scale, stays up, and actually ships.

Real-Time Streaming

Designing and operating production streaming pipelines with Kafka, Flink, and Redpanda — processing millions of events per second with exactly-once semantics.

Kafka, Flink, Redpanda, MSK, WebSocket

Data Platform Architecture

Building enterprise data platforms from the ground up — medallion architectures, lakehouse patterns, and warehouse modeling that scale to thousands of tables.

Databricks, Snowflake, BigQuery, Delta Lake, dbt

Batch ETL & Processing

Engineering large-scale batch processing with Spark and PySpark — data quality frameworks, schema enforcement, and cost-optimized pipelines on EMR and Databricks.

PySpark, EMR, Airflow, Glue, Terraform

AI & RAG Systems

Building production RAG pipelines and AI-powered search — embedding generation, vector databases, semantic retrieval, and LLM re-ranking for enterprise data.

HuggingFace, OpenSearch, LangChain, RAG, LLMs

About Me

I build data infrastructure that doesn't go down. From real-time crypto pipelines that have processed 500+ TB across 35 exchanges to enterprise data platforms handling millions of events, data engineering is what I do. I've designed end-to-end batch and streaming architectures on Kafka, Flink, Spark, and Databricks at companies like Rivian, Capital One, and Amberdata. I also shipped a RAG-powered search engine and built a full SaaS product solo. I own outcomes, not tickets.

Years Experience

500+ TB

Data Processed

35

Live Exchanges

99.99%

Uptime

1,000+

Tables Built

Tech Stack

Languages

Python, Go, Java, TypeScript, SQL

Data & Streaming

Kafka, Flink, Spark, Databricks, dbt, Delta Lake

Cloud & Infra

AWS, Kubernetes, Docker, Terraform, Pulumi

AI & Search

LangChain, RAG, OpenSearch, HuggingFace, LLMs

Databases

Snowflake, BigQuery, Redshift, PostgreSQL, Redis

Experience

Sr. Software Engineer

Amberdata

Oct 2024 – Present

Remote

Go, Python, Kafka, Spark Scala, Databricks, Kubernetes, Delta Lake

Engineered real-time WebSocket ingestion pipelines across 35 crypto exchanges — sustaining 99.99% uptime and processing 500+ TB of live trade, order book, and ticker data.
Owned the full migration of the legacy JavaScript data API to Go, with a zero-downtime cutover and instant rollback capability.
Built a Spark Scala data quality framework enforcing custom business rules across TB-scale daily loads.
Drove K8s cluster infrastructure decisions — namespace strategy, autoscaling, and CI/CD pipelines.


Sr. Software Engineer

Trepp, Inc.

Apr 2024 – Oct 2024

Dallas, TX

Python, HuggingFace, OpenSearch, RAG, LLM, AWS


Lead Data Engineer

Rivian

Jan 2020 – Apr 2024

Palo Alto, CA

PySpark, Kafka, Flink, Databricks, BigQuery, dbt, Terraform, AWS


Sr. Big Data Engineer

Capital One

Mar 2018 – Jan 2020

Richmond, VA

Java, Kafka, Spark, Snowflake, ECS, Python, ZeroMQ


Data Engineer

Logipro Software

Dec 2014 – Dec 2016

Hyderabad, India

Informatica, Hadoop, Spark, Python, SQL Server


Projects

Professional work, SaaS products, and open source contributions.

professional

Crypto Market Data Infrastructure

Real-time WebSocket ingestion across 35 exchanges: 500+ TB of trade data at 99.99% uptime. Migrated the legacy JS API to Go with a zero-downtime cutover.

Go, Python, Kafka, Spark Scala, Kubernetes, Delta Lake

professional

RAG Property Search Engine

Replaced keyword search with an intent-aware RAG pipeline: HuggingFace embeddings, an OpenSearch vector database, and LLM re-ranking. Shipped in 7 months.

Python, HuggingFace, OpenSearch, RAG, LLM, AWS

professional

Rivian Data Platform

Built from scratch: 1,000+ tables, 100+ pipelines, and Kafka/Flink streaming. Cut processing time by 30% and infrastructure costs by 25%.

PySpark, Kafka, Flink, Databricks, BigQuery, dbt, AWS

saas

HostMetrics

Full-stack SaaS for Turo fleet analytics — 28 page routes, 200+ React components, Stripe billing, AWS Lambda background jobs. Built solo, live in production.

Next.js, TypeScript, Supabase, Stripe, Tailwind, Vercel

open source