// Python + Yellowstone gRPC

Stream Solana data with Python.

Python remains the most popular language for data analysis, trading bots, and automation. This tutorial shows you how to connect to Yellowstone gRPC from Python using grpcio and protobuf — or skip the complexity entirely with Subglow's JSON output.

Why Python for Solana streaming

Python dominates the data science and quantitative trading landscape for good reason. Libraries like pandas, numpy, and scikit-learn give you a mature ecosystem for analyzing blockchain data the moment it arrives. If you're building a trading bot that uses machine learning to score new token launches, or an analytics pipeline that aggregates swap volumes across Raydium pools, Python lets you prototype and iterate faster than you could in a compiled language. The REPL-driven development workflow means you can experiment with filter parameters and event processing logic interactively, without recompilation cycles.

The counterargument is performance. Python's Global Interpreter Lock (GIL) prevents true parallelism in CPU-bound workloads, and the interpreter overhead makes raw Python significantly slower than Rust or Go for tight processing loops. This matters if you're trying to process every transaction on Solana — the firehose produces 15,000+ events per second, and pure Python will struggle to keep up with deserialization at that rate. But here's the reality that most guides overlook: if you're using server-side filtering, your client only processes the subset of transactions that match your criteria. For a Pump.fun + Raydium filter, that's typically a few hundred events per second — well within Python's comfortable processing capacity.

The real bottleneck in any Solana trading pipeline is data delivery latency, not client-side processing speed. A Python bot that receives pre-parsed JSON events via gRPC in 5ms will outperform a Rust bot that receives raw protobuf in 5ms but then spends 50ms doing Borsh deserialization, account mapping, and instruction parsing locally. The language doesn't matter if the data arrives ready to consume. When the parsing happens at the source — as it does with Subglow — Python becomes a perfectly viable choice for latency-sensitive applications.

Yellowstone gRPC combined with Python gives you real-time blockchain data streamed into the language you already know. You keep your existing toolchain: pandas for analysis, asyncio for concurrent I/O, matplotlib or plotly for visualization, and the entire PyPI ecosystem for everything else. No need to learn Rust to get fast data. You just need the right infrastructure delivering it.

Install dependencies

Connecting to a raw Yellowstone gRPC endpoint from Python requires three packages: grpcio for the gRPC runtime, grpcio-tools for protobuf compilation, and protobuf for the generated message classes. Install them in a virtual environment to keep your project dependencies isolated.

terminal
$ pip install grpcio grpcio-tools protobuf
# Clone the Yellowstone proto definitions
$ git clone https://github.com/rpcpool/yellowstone-grpc.git
$ cd yellowstone-grpc/yellowstone-grpc-proto/proto
# Generate Python stubs (geyser.proto imports solana-storage.proto,
# so compile both; protoc requires the output directory to exist)
$ mkdir -p gen
$ python -m grpc_tools.protoc \
    --proto_path=. \
    --python_out=./gen \
    --grpc_python_out=./gen \
    geyser.proto solana-storage.proto

The protoc compiler reads the geyser.proto file and generates two Python modules: geyser_pb2.py containing the message classes (SubscribeRequest, SubscribeUpdate, etc.) and geyser_pb2_grpc.py containing the client stub. These generated files handle all the protobuf serialization and deserialization — you import them and call methods like any other Python module. One caveat: the generated geyser_pb2_grpc.py imports its sibling with a plain import geyser_pb2, so either put the gen directory on your Python path or adjust that import to be package-relative before using from gen import … style imports.

If you're using Subglow instead of a raw Yellowstone endpoint, you can skip the protobuf generation step entirely. Subglow delivers pre-parsed JSON over standard gRPC, so you only need grpcio — no proto files, no code generation, no compilation. This eliminates the most error-prone part of the setup and means your Python project stays clean of generated code.

Connect and subscribe

With the generated stubs in place, you can establish a gRPC connection and start streaming Solana transactions. The pattern follows standard gRPC conventions: create a channel with SSL credentials, instantiate the client stub, build a subscribe request with your desired filters, and iterate the response stream. Here's a complete working example that subscribes to all transactions and prints each update as it arrives.

stream.py
import grpc

from gen import geyser_pb2, geyser_pb2_grpc

# Establish an SSL-secured gRPC channel
creds = grpc.ssl_channel_credentials()
channel = grpc.secure_channel("grpc.subglow.io:443", creds)

# Create the Geyser client stub
stub = geyser_pb2_grpc.GeyserStub(channel)

# Build the subscription request
request = geyser_pb2.SubscribeRequest(
    transactions={
        "txn_filter": geyser_pb2.SubscribeRequestFilterTransactions(
            vote=False,
            failed=False,
        )
    }
)

# Authenticate and open the stream
metadata = [("x-api-key", "YOUR_SUBGLOW_KEY")]
for update in stub.Subscribe(request, metadata=metadata):
    if update.HasField("transaction"):
        tx = update.transaction
        print(f"Slot {tx.slot}: {tx.transaction.signature.hex()}")

Let's break this down. The ssl_channel_credentials() call creates TLS credentials using the system's root certificate store — this ensures your connection to the gRPC endpoint is encrypted. The secure_channel opens an HTTP/2 connection to the server, which is reused for all subsequent RPC calls on this channel. The stub is a thin wrapper that maps Python method calls to gRPC service methods defined in the proto file.

The SubscribeRequest is where you define what data you want. The transactions field takes a dictionary of named filters — each key is an arbitrary label, and the value is a SubscribeRequestFilterTransactions specifying which transactions to include. Setting vote=False and failed=False excludes validator vote transactions and failed transactions, which together account for the majority of Solana's raw throughput but are irrelevant for most trading applications.

The stub.Subscribe() call returns a generator that yields SubscribeUpdate messages. Each update can contain a transaction, an account update, a slot notification, or other event types. The HasField check ensures you only process the event types you care about. This iterator blocks until the next event arrives, making it a natural fit for Python's synchronous programming model.

Filter transactions by program

The real power of Yellowstone gRPC is filtering. Instead of receiving every transaction on Solana and discarding 99% of them locally, you tell the server exactly which programs you care about. Here's how to target specific programs like Pump.fun.

01

Build the transaction filter

Create a SubscribeRequestFilterTransactions with the account_include field set to the Program IDs you want to track. Every transaction that touches these programs will be forwarded to your stream. For Pump.fun, the program address is 6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P.

02

Exclude noise

Set vote=False and failed=False on your filter to drop validator vote transactions and failed transactions. Vote transactions alone account for roughly 50% of Solana's throughput and are never relevant for DeFi applications. Failed transactions add noise without actionable data — excluding both dramatically reduces your processing load.

03

Parse the transaction data

Each incoming transaction contains the list of account keys, the compiled instructions (with program index, account indices, and instruction data), and the transaction log messages. For raw Yellowstone, you'll need to decode the instruction data using the program's Borsh schema to extract fields like token amounts, buyer addresses, and bonding curve percentages.

filter_pump_fun.py
PUMP_FUN = "6EF8rrecthR5Dkzon8Nwu78hRvfCKubJ14M5uBEwF6P"

request = geyser_pb2.SubscribeRequest(
    transactions={
        "pump_fun": geyser_pb2.SubscribeRequestFilterTransactions(
            account_include=[PUMP_FUN],
            vote=False,
            failed=False,
        )
    }
)

for update in stub.Subscribe(request, metadata=metadata):
    if update.HasField("transaction"):
        info = update.transaction.transaction
        sig = info.signature.hex()
        accounts = [k.hex() for k in info.transaction.transaction.message.account_keys]
        print(f"Pump.fun tx: {sig[:16]}... accounts: {len(accounts)}")
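Step 03's Borsh decoding is where most of the hand-rolled work lives. As a minimal sketch of the idea, the snippet below unpacks a hypothetical instruction layout (an 8-byte discriminator followed by two little-endian u64 fields) with Python's struct module. The layout and field names here are assumptions for illustration; a real decoder must follow the program's actual IDL.

```python
import struct

def decode_trade_ix(data: bytes) -> dict:
    # Hypothetical layout: 8-byte discriminator, then two little-endian u64s.
    # Real Pump.fun layouts must be taken from the program's IDL.
    discriminator = data[:8]
    sol_amount, token_amount = struct.unpack_from("<QQ", data, 8)
    return {
        "discriminator": discriminator.hex(),
        "sol_amount": sol_amount,
        "token_amount": token_amount,
    }

# Synthetic payload: zeroed discriminator plus two u64 fields
payload = bytes(8) + struct.pack("<QQ", 1_500_000_000, 42_000_000)
print(decode_trade_ix(payload)["sol_amount"])  # 1500000000
```

Every program defines its own layouts, discriminators, and account orderings, which is exactly the per-program effort that server-side parsing eliminates.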

For a complete reference of all available filter parameters and combinations, see the Yellowstone gRPC filter guide.

Handle disconnections and errors

gRPC streams are long-lived connections, and long-lived connections will eventually drop. Network partitions, server maintenance windows, load balancer rotations, and transient DNS failures all cause disconnections. A production Python bot that doesn't implement reconnection logic will silently stop receiving data after the first interruption — and silent failures in trading infrastructure are expensive failures.

The standard pattern is a reconnection wrapper with exponential backoff. When the stream drops, you wait a short interval, then attempt to reconnect. If the reconnection fails, you double the wait interval up to a maximum, preventing your client from hammering the server during an outage. You should also track the last processed slot number so that after reconnection you can detect and handle any gap in the data you received.

reconnect.py
import grpc
import logging
import time

log = logging.getLogger("stream")
MAX_BACKOFF = 30
last_slot = 0

def stream_with_reconnect(stub, request, metadata):
    global last_slot
    backoff = 1
    while True:
        try:
            for update in stub.Subscribe(request, metadata=metadata):
                backoff = 1  # reset after every successful event
                if update.HasField("transaction"):
                    last_slot = update.transaction.slot
                    process_transaction(update.transaction)  # your handler
        except grpc.RpcError as e:
            log.warning("Stream disconnected: %s; reconnecting in %ss", e.code(), backoff)
            time.sleep(backoff)
            backoff = min(backoff * 2, MAX_BACKOFF)

The backoff variable resets to 1 second on every successful event, so transient hiccups recover quickly. When a disconnection occurs, the except block logs the gRPC status code — distinguishing UNAVAILABLE (transient) from UNAUTHENTICATED (permanent) helps you decide whether to retry or exit. Tracking last_slot gives you a checkpoint: if after reconnecting the stream resumes from a slot significantly higher than your last processed slot, you know you've missed events and can take appropriate action (querying a backfill API, logging the gap, or alerting).
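As a small sketch of acting on that checkpoint, a helper can compare the first slot seen after reconnecting against the last processed slot. The tolerance parameter below is an illustrative default, not a recommended value.

```python
def slot_gap(last_processed: int, resumed_slot: int, tolerance: int = 1) -> int:
    # Number of slots unaccounted for between the checkpoint and the
    # resume point; 0 means no actionable gap
    missed = resumed_slot - last_processed - 1
    return missed if missed > tolerance else 0

# Resumed at slot 1005 after last processing slot 1000
print(slot_gap(1000, 1005))  # 4
```

A nonzero return is the trigger for whatever recovery you choose: backfill query, gap log entry, or alert.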

For async Python workflows, replace time.sleep() with await asyncio.sleep() and use grpcio's async API (grpc.aio) for non-blocking stream iteration. This lets you run multiple streams concurrently — for example, one for Pump.fun events and another for Raydium pool creates — without blocking either.
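To illustrate that concurrency pattern without a live endpoint, the sketch below uses simulated async generators in place of grpc.aio response iterators; with the real async API, each consumer would instead write async for update in stub.Subscribe(...). The stream names and events are placeholders.

```python
import asyncio

async def fake_stream(name, events):
    # Stand-in for a grpc.aio Subscribe() response iterator
    for e in events:
        await asyncio.sleep(0)  # yield control, like awaiting the next message
        yield f"{name}:{e}"

async def consume(stream, sink):
    # One consumer per subscription; neither blocks the other
    async for event in stream:
        sink.append(event)

async def main():
    sink = []
    # Run two subscriptions concurrently, e.g. Pump.fun and Raydium
    await asyncio.gather(
        consume(fake_stream("pump_fun", ["buy", "sell"]), sink),
        consume(fake_stream("raydium", ["pool_create"]), sink),
    )
    return sink

print(asyncio.run(main()))
```

The same gather-over-consumers shape carries over directly once the fake streams are swapped for real grpc.aio calls.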

Skip the complexity with Subglow

The raw Yellowstone approach works, but it comes with significant operational overhead. You need to compile protobuf stubs, keep them in sync with upstream proto file changes, write Borsh deserializers for every program you want to decode, map account indices to their public keys, parse inner instructions for CPI calls, and handle the dozens of edge cases that arise when decoding real-world Solana transactions. For a single program like Pump.fun, this can take days of development. For multiple programs, it's weeks.

Subglow delivers pre-parsed JSON events over the same gRPC protocol. The entire deserialization, account resolution, and instruction parsing pipeline runs server-side. Your Python client connects with standard grpcio, authenticates with an API key, and receives structured events that are immediately ready for your trading logic. No proto files, no Borsh, no account mapping — just JSON with labeled fields and native Python types.

subglow_stream.py
import grpc
import json

# SubglowStub and SubscribeRequest come from Subglow's client package
channel = grpc.secure_channel("grpc.subglow.io:443", grpc.ssl_channel_credentials())
stub = SubglowStub(channel)
metadata = [("x-api-key", "YOUR_SUBGLOW_KEY")]

for event in stub.Subscribe(
    SubscribeRequest(filters=["pump_fun", "raydium"]),
    metadata=metadata,
):
    data = json.loads(event.parsed)
    print(f"{data['program']} {data['type']}: {data['signature'][:16]}...")

That's the entire client. No generated code, no proto compilation, no Borsh decoding. The event.parsed field contains a JSON string with all the structured data — token amounts, buyer addresses, bonding curve percentages, pool reserves — already extracted and labeled. You parse it with Python's built-in json.loads() and feed it directly into your pandas DataFrame, your trading strategy, or your alerting pipeline.
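As a sketch of what the consuming side can look like, the snippet below parses one event and routes it with a simple predicate. The payload shape and field names (program, type, sol_amount, signature) are assumptions for illustration; check Subglow's event schema for the actual keys.

```python
import json

# Hypothetical pre-parsed event; field names are illustrative assumptions
raw = '{"program": "pump_fun", "type": "buy", "sol_amount": 2.5, "signature": "abc123"}'

def is_large_buy(event: dict, threshold: float = 1.0) -> bool:
    # Route only sizeable Pump.fun buys to the trading logic
    return (
        event.get("program") == "pump_fun"
        and event.get("type") == "buy"
        and event.get("sol_amount", 0) >= threshold
    )

event = json.loads(raw)
if is_large_buy(event):
    print(f"large buy: {event['signature']}")  # prints: large buy: abc123
```

Because the event is already a plain dict of native Python types, the same predicate works unchanged whether the destination is a strategy function, a DataFrame append, or an alert webhook.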

This is the trade-off that makes Python viable for latency-sensitive Solana applications. The parsing that would take your Python client tens of milliseconds happens on Subglow's Rust infrastructure in microseconds. Your bot receives ready-to-use data and starts executing immediately. For pricing and tier details, see the plans page.

Frequently asked questions

Can I use Python with Yellowstone gRPC?

Yes. Install grpcio and generate Python stubs from the Yellowstone proto files. Or use Subglow which delivers JSON, eliminating the need for protobuf.

Is Python fast enough for Solana trading bots?

For most use cases, yes. The bottleneck is data latency, not processing speed. Python with Yellowstone gRPC delivers data in ~5ms. Processing each event takes microseconds even in Python.

What Python libraries do I need?

grpcio, grpcio-tools, and protobuf for raw Yellowstone. If using Subglow, just grpcio.

Does Subglow support Python?

Yes. Subglow works with any gRPC client. Python connects via grpcio and receives pre-parsed JSON events.

Python + real-time Solana data.

Pre-parsed JSON over gRPC. No protobuf compilation. No Borsh decoding. Just pip install and go.