# DataFrame Backends
xbbg is Arrow-native internally. Endpoint helpers return a Narwhals DataFrame by default. When PyArrow is installed, that Narwhals frame is backed by a real `pyarrow.Table`, preserving the legacy Narwhals/PyArrow behavior; otherwise xbbg falls back through the installed dataframe libraries and finally to the minimal `xbbg.ArrowTable` native carrier from `xbbg._core`.

Use `backend="native"` / `Backend.NATIVE` when you want the raw `xbbg.ArrowTable` carrier, or `backend="pyarrow"` / `Backend.PYARROW` for an actual `pyarrow.Table`. If the default Narwhals output has to fall back to the limited native plugin because no PyArrow/pandas/Polars backend is installed, xbbg emits a one-time `RuntimeWarning` with install guidance.
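The one-time warning behavior follows Python's standard `warnings` machinery. A minimal standard-library sketch of the warn-once pattern (the message text and function name here are illustrative, not xbbg's actual code):

```python
import warnings

def emit_fallback_warning(_seen=[]):
    """Warn once that the limited native fallback is in use (illustrative).

    The mutable default argument acts as a module-level "already warned" flag.
    """
    if not _seen:
        _seen.append(True)
        warnings.warn(
            "falling back to the native carrier; install pyarrow for full support",
            RuntimeWarning,
            stacklevel=2,
        )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    emit_fallback_warning()
    emit_fallback_warning()  # second call is silent
print(len(caught))  # 1
```

If the fallback is expected in a deliberately minimal install, the warning can be silenced with `warnings.filterwarnings("ignore", category=RuntimeWarning)` scoped as narrowly as possible.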
## Supported Backends

### Eager Backends

Eager backends return a fully materialized object immediately.
| Backend | Output type | Best for |
|---|---|---|
| `default` / `narwhals` | Narwhals DataFrame | Backwards-compatible dataframe ergonomics over PyArrow when installed, with a warned native fallback for minimal installs |
| `native` | `xbbg.ArrowTable` | Explicit native carrier, PyCapsule interop, memory-conscious workflows |
| `pyarrow` | `pyarrow.Table` | Full PyArrow table functionality and Arrow ecosystem integrations |
| `pandas` | `pd.DataFrame` | Traditional workflows, ecosystem compatibility |
| `polars` | `pl.DataFrame` | High performance, large datasets |
| `modin` | Modin DataFrame | Pandas API with parallel execution |
| `cudf` | cuDF DataFrame | GPU-accelerated processing (requires an NVIDIA GPU) |
### Lazy Backends

Lazy backends defer execution. The query graph is built when you call xbbg functions and evaluated only when you explicitly trigger execution (e.g. `.collect()` for Polars, `.execute()` for DuckDB).
| Backend | Output type | Best for |
|---|---|---|
| `polars_lazy` | `pl.LazyFrame` | Deferred execution, query optimization |
| `narwhals_lazy` | Narwhals LazyFrame | Library-agnostic lazy evaluation through xbbg's native plugin |
| `duckdb` | DuckDB relation | SQL analytics, OLAP queries |
| `dask` | Dask DataFrame | Out-of-core and distributed computing |
| `ibis` | Ibis Table | Unified interface to many backends |
| `pyspark` | Spark DataFrame | Big data processing (requires Java) |
| `sqlframe` | SQLFrame DataFrame | SQL-first DataFrame operations |
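The deferred-execution model these backends share can be illustrated with a minimal stand-in: operations are recorded into a plan when called and nothing runs until an explicit collect. This toy class is illustrative only; it is not how any of these libraries are actually implemented:

```python
class ToyLazyFrame:
    """Records operations as a plan; nothing runs until collect()."""

    def __init__(self, rows):
        self._rows = rows
        self._plan = []  # deferred steps: (name, function) pairs

    def filter(self, predicate):
        self._plan.append(("filter", lambda rows: [r for r in rows if predicate(r)]))
        return self  # chaining extends the plan; no work is done yet

    def select(self, key):
        self._plan.append(("select", lambda rows: [{key: r[key]} for r in rows]))
        return self

    def collect(self):
        # Execution happens here, in one pass over the recorded plan
        rows = self._rows
        for _, step in self._plan:
            rows = step(rows)
        return rows

lf = ToyLazyFrame([{"px": 100.0, "tk": "AAA"}, {"px": 50.0, "tk": "BBB"}])
lf = lf.filter(lambda r: r["px"] > 60).select("tk")
print(lf.collect())  # [{'tk': 'AAA'}]
```

Real lazy engines go further: because the whole plan is visible before execution, they can reorder and fuse steps (predicate pushdown, projection pruning) before touching any data.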
## Selecting a Backend

### Global default

`Backend.NARWHALS` is already the default. Set it explicitly only when you want to restore the default dataframe wrapper after using another backend.
```python
import xbbg
from xbbg import Backend, blp

xbbg.set_backend(Backend.NARWHALS)

# Calls now return a Narwhals DataFrame
df = blp.bdp('AAPL US Equity', 'PX_LAST')
print(type(df))
```

You can also pass a string:

```python
xbbg.set_backend('narwhals')
```

### Per-call override

Pass `backend` as a keyword argument to any data function. This overrides the global default for that call only.
```python
from xbbg import blp

# Native carrier result for this call
table = blp.bdp('AAPL US Equity', 'PX_LAST', backend='native')

# Explicit conversion when a library object is required
df = blp.bdp('AAPL US Equity', 'PX_LAST', backend='pandas')
```

## Checking Availability

Not all optional backends are installed in every environment. Use these utilities to inspect what is available before writing code that assumes a specific backend. The default `narwhals` backend and the explicit `native` carrier are provided by xbbg itself and are always available.
```python
from xbbg import get_available_backends, is_backend_available, print_backend_status

# Print installed backend names
print([backend.value for backend in get_available_backends()])
# ['native', 'narwhals', 'pyarrow', 'pandas', 'polars', ...]

# Check a specific backend
if is_backend_available('polars'):
    print("Polars is installed")

# Print a detailed status table for all backends
print_backend_status()
```
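Availability checks like these usually reduce to probing whether the backing library is importable. A hedged sketch of that idea using only the standard library (this is not xbbg's actual implementation; `backend_importable` is a name invented for this example):

```python
import importlib.util

def backend_importable(module_name: str) -> bool:
    """Return True if the backing module could be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None

# 'json' stands in for a backend's backing module; 'no_such_backend' is fake
print(backend_importable("json"))             # True
print(backend_importable("no_such_backend"))  # False
```

Using `find_spec` rather than a bare `import` avoids paying the import cost (and any import-time side effects) just to answer a yes/no question.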
## Backend Examples

```python
from xbbg import blp, Backend

table = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.NATIVE,
)
# Returns xbbg.ArrowTable
print(table.column_names)
print(table.to_pylist()[:1])
```

```python
from xbbg import blp, Backend

table = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.PYARROW)
# Returns pyarrow.Table
print(type(table))
```

```python
from xbbg import blp, Backend

df = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.PANDAS)
# Returns pd.DataFrame
print(type(df))  # <class 'pandas.core.frame.DataFrame'>
```

```python
from xbbg import blp, Backend

df = blp.bdp('AAPL US Equity', 'PX_LAST', backend=Backend.POLARS)
# Returns pl.DataFrame
print(type(df))  # <class 'polars.dataframe.frame.DataFrame'>
```

```python
from xbbg import blp, Backend

relation = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.DUCKDB,
)
# Returns a DuckDB relation: not yet executed
result = relation.fetchdf()  # trigger execution, returns pd.DataFrame
```

## Arrow PyCapsule Interop

`xbbg.ArrowTable` implements the Arrow PyCapsule stream protocol (`__arrow_c_stream__`) and schema protocol (`__arrow_c_schema__`). `xbbg.ArrowRecordBatch` implements the Arrow PyCapsule array protocol (`__arrow_c_array__`) for raw streaming batches. Arrow-aware libraries can consume these objects directly, without xbbg exposing a separate public backend for each library's Arrow table type.
```python
from xbbg import blp

table = blp.bdp('AAPL US Equity', 'PX_LAST', backend='native')

import polars as pl
polars_df = pl.from_arrow(table)

import duckdb
relation = duckdb.from_arrow(table)
```

Prefer `backend='native'` when downstream code can consume xbbg's native carrier or the Arrow PyCapsule protocol directly. Request `backend='pyarrow'`, `backend='pandas'`, `backend='polars'`, `backend='duckdb'`, or another explicit backend when you want xbbg to materialize that library's object at the boundary.
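These protocols are duck-typed: a consumer checks for the dunder methods rather than for a specific class, which is why any producer can participate. A toy producer sketch (the capsule construction itself is elided; a real producer returns a PyCapsule wrapping an Arrow C stream, and these class/function names are invented for illustration):

```python
def supports_arrow_stream(obj) -> bool:
    """Duck-typed capability check of the kind Arrow-aware consumers perform."""
    return hasattr(obj, "__arrow_c_stream__")

class ToyStreamProducer:
    """Stands in for an object like xbbg.ArrowTable; not a real implementation."""

    def __arrow_c_stream__(self, requested_schema=None):
        # A real producer returns a PyCapsule containing an ArrowArrayStream.
        raise NotImplementedError("toy producer: capsule construction elided")

print(supports_arrow_stream(ToyStreamProducer()))  # True
print(supports_arrow_stream(object()))             # False
```

This is the same mechanism that lets `pl.from_arrow` and `duckdb.from_arrow` accept xbbg's carrier above without either library knowing anything about xbbg.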
## Performance Considerations

**Default / Narwhals** is the compatibility path. It keeps dataframe-style behavior over xbbg's native carrier without requiring pandas or PyArrow as hard dependencies. `df.to_pandas()` imports pandas only when you call it.
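The deferred pandas import just described is a common pattern: the heavy dependency is imported inside the conversion method, not at module import time. A sketch using a standard-library module as the stand-in dependency (illustrative only; this is not xbbg's code, and `colorsys` merely plays the role of pandas):

```python
import importlib
import sys

sys.modules.pop("colorsys", None)  # ensure the stand-in dependency is not loaded

class Result:
    """Carrier whose conversion imports its dependency only when called."""

    def __init__(self, value):
        self.value = value

    def to_rgb(self):
        # 'colorsys' stands in for pandas: imported on first use,
        # so merely creating Result objects never pays the import cost
        colorsys = importlib.import_module("colorsys")
        return colorsys.hsv_to_rgb(*self.value)

r = Result((0.0, 0.0, 1.0))
print("colorsys" in sys.modules)  # False: defining and constructing imported nothing
print(r.to_rgb())                 # (1.0, 1.0, 1.0)
print("colorsys" in sys.modules)  # True: imported by the conversion call
```

The payoff is that users who never convert to pandas never import it, keeping minimal installs fast and light.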
**Native carrier** (`native`) is the lowest-overhead explicit result. Use it when you will inspect columns, call `to_pylist()`, split into `to_batches()`, or hand data to an Arrow-aware library through the PyCapsule protocol.

**PyArrow** is the full Apache Arrow Python table backend. Use `backend='pyarrow'` when code expects `pyarrow.Table` methods or PyArrow ecosystem behavior.

**Polars** is the best choice for pure computation on large datasets. Its columnar engine and lazy execution model handle datasets that would be slow or impractical in pandas. The `polars_lazy` backend lets you chain additional query steps before triggering evaluation.

**DuckDB** is useful when your next step is SQL analytics. Use `backend='duckdb'` for a relation, or pass the native carrier to DuckDB yourself when you want to control connection scope.

**pandas** remains the widest-compatibility option. Use it when integrating with libraries that only accept `pd.DataFrame`, or when working with existing pandas-based pipelines.

**Lazy backends** (DuckDB, Polars lazy, Dask, etc.) are useful when you want to compose queries across multiple xbbg calls before materializing any data, or when the result set is too large to hold in memory.
## Related

- Output Formats: control the shape of returned data (`LONG`, `LONG_TYPED`, etc.)
- API Reference: full function documentation with all parameters