
DataFrame Backends

xbbg is Arrow-native internally. Endpoint helpers return a Narwhals DataFrame by default. When PyArrow is installed, that Narwhals frame is backed by a real pyarrow.Table, preserving the legacy Narwhals/PyArrow behavior; otherwise xbbg falls back through whichever dataframe libraries are installed and, as a last resort, uses the minimal xbbg.ArrowTable native carrier from xbbg._core.

Use backend="native" / Backend.NATIVE when you want the raw xbbg.ArrowTable carrier, or backend="pyarrow" / Backend.PYARROW for an actual pyarrow.Table. If default Narwhals output has to fall back to the limited native plugin because no PyArrow/pandas/Polars backend is installed, xbbg emits a one-time RuntimeWarning with install guidance.

Eager backends return a fully materialized object immediately.

| Backend | Output type | Best for |
| --- | --- | --- |
| default / narwhals | Narwhals DataFrame | Backwards-compatible dataframe ergonomics over PyArrow when installed, with a warned native fallback for minimal installs |
| native | xbbg.ArrowTable | Explicit native carrier, PyCapsule interop, memory-conscious workflows |
| pyarrow | pyarrow.Table | Full PyArrow table functionality and Arrow ecosystem integrations |
| pandas | pd.DataFrame | Traditional workflows, ecosystem compatibility |
| polars | pl.DataFrame | High performance, large datasets |
| modin | Modin DataFrame | Pandas API with parallel execution |
| cudf | cuDF DataFrame | GPU-accelerated processing (requires NVIDIA) |

Lazy backends defer execution. The query graph is built when you call xbbg functions and evaluated only when you explicitly trigger execution (e.g. .collect() for Polars, .execute() for DuckDB).

| Backend | Output type | Best for |
| --- | --- | --- |
| polars_lazy | pl.LazyFrame | Deferred execution, query optimization |
| narwhals_lazy | Narwhals LazyFrame | Library-agnostic lazy evaluation through xbbg’s native plugin |
| duckdb | DuckDB relation | SQL analytics, OLAP queries |
| dask | Dask DataFrame | Out-of-core and distributed computing |
| ibis | Ibis Table | Unified interface to many backends |
| pyspark | Spark DataFrame | Big data processing (requires Java) |
| sqlframe | SQLFrame DataFrame | SQL-first DataFrame operations |
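The defer-then-collect contract these lazy backends share can be illustrated with a small, library-agnostic sketch. This is plain Python for illustration only, not xbbg's implementation; the class and method names are hypothetical:

```python
# Hypothetical sketch of lazy evaluation: steps are recorded, not run,
# until collect() is called explicitly (mirroring .collect()/.execute()).
class LazyPipeline:
    def __init__(self, rows, ops=()):
        self._rows = rows        # source data
        self._ops = list(ops)    # deferred operations, not yet executed

    def filter(self, predicate):
        # Record the step; nothing executes yet.
        return LazyPipeline(self._rows, self._ops + [("filter", predicate)])

    def select(self, key):
        return LazyPipeline(self._rows, self._ops + [("select", key)])

    def collect(self):
        # Execution is triggered explicitly, replaying the recorded steps.
        rows = self._rows
        for kind, arg in self._ops:
            if kind == "filter":
                rows = [r for r in rows if arg(r)]
            elif kind == "select":
                rows = [{arg: r[arg]} for r in rows]
        return rows

rows = [{"px": 100.0, "tkr": "AAPL"}, {"px": 50.0, "tkr": "MSFT"}]
lazy = LazyPipeline(rows).filter(lambda r: r["px"] > 75).select("tkr")
print(lazy.collect())  # [{'tkr': 'AAPL'}]
```

Until collect() runs, constructing the pipeline is essentially free, which is why lazy backends suit composing queries across multiple calls.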

Backend.NARWHALS is already the default. Set it explicitly only when you want to restore the default dataframe wrapper after using another backend.

import xbbg
from xbbg import Backend, blp
xbbg.set_backend(Backend.NARWHALS)
# Calls now return Narwhals DataFrame
df = blp.bdp('AAPL US Equity', 'PX_LAST')
print(type(df))

You can also pass a string:

xbbg.set_backend('narwhals')

Pass backend as a keyword argument to any data function. This overrides the global default for that call only.

from xbbg import blp
# Native carrier result for this call
table = blp.bdp('AAPL US Equity', 'PX_LAST', backend='native')
# Explicit conversion when a library object is required
df = blp.bdp('AAPL US Equity', 'PX_LAST', backend='pandas')
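The interaction between the global default and the per-call keyword can be sketched in a few lines. The names below are hypothetical and only illustrate the override semantics described above, not xbbg's actual dispatch code:

```python
# Hypothetical sketch: a module-level default that a keyword argument
# overrides for a single call, leaving the global setting untouched.
_DEFAULT_BACKEND = "narwhals"

def set_backend(name):
    global _DEFAULT_BACKEND
    _DEFAULT_BACKEND = name

def resolve_backend(backend=None):
    # The keyword wins for this call only; otherwise use the global default.
    return backend if backend is not None else _DEFAULT_BACKEND

set_backend("polars")
print(resolve_backend())                  # 'polars' (global default)
print(resolve_backend(backend="native"))  # 'native' (this call only)
print(resolve_backend())                  # 'polars' (default unchanged)
```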

Not all optional backends are installed in every environment. Use these utilities to inspect what is available before writing code that assumes a specific backend. The default narwhals backend and explicit native carrier are provided by xbbg and are always available.

from xbbg import get_available_backends, is_backend_available, print_backend_status
# Print installed backend names
print([backend.value for backend in get_available_backends()])
# ['native', 'narwhals', 'pyarrow', 'pandas', 'polars', ...]
# Check a specific backend
if is_backend_available('polars'):
    print("Polars is installed")
# Print a detailed status table for all backends
print_backend_status()
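One plausible way such an availability check can work is an import-based probe with importlib.util.find_spec from the standard library. This is an assumption for illustration; the real helpers may detect backends differently:

```python
# Hypothetical sketch of an import-based availability probe.
from importlib.util import find_spec

def backend_installed(module_name):
    # True when the backing library can be imported in this environment.
    return find_spec(module_name) is not None

print(backend_installed("json"))                 # True: stdlib module
print(backend_installed("no_such_backend_xyz"))  # False
```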
Request the native carrier explicitly when you want the raw Arrow-backed result:

from xbbg import blp, Backend
table = blp.bdh(
    'SPX Index',
    'PX_LAST',
    start_date='2024-01-01',
    end_date='2024-12-31',
    backend=Backend.NATIVE,
)
# Returns xbbg.ArrowTable
print(table.column_names)
print(table.to_pylist()[:1])

xbbg.ArrowTable implements the Arrow PyCapsule stream protocol (__arrow_c_stream__) and schema protocol (__arrow_c_schema__). xbbg.ArrowRecordBatch implements the Arrow PyCapsule array protocol (__arrow_c_array__) for raw streaming batches. Arrow-aware libraries can consume these objects directly without xbbg exposing a separate public backend for that library’s Arrow table type.

from xbbg import blp
table = blp.bdp('AAPL US Equity', 'PX_LAST', backend='native')

# Arrow-aware libraries consume the carrier through the PyCapsule protocol
import polars as pl
polars_df = pl.from_arrow(table)

import duckdb
relation = duckdb.from_arrow(table)
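The delegation pattern behind this interop can be sketched without any Arrow library at all: a carrier only needs to forward the protocol methods to an object that produces the capsules, and consumers detect those methods by duck typing. Everything below is a hypothetical illustration, not xbbg's source:

```python
# Hypothetical sketch: a minimal carrier forwarding the Arrow PyCapsule
# stream and schema protocols to a wrapped Arrow-compatible object.
class CarrierSketch:
    def __init__(self, inner):
        self._inner = inner  # any object exposing the PyCapsule protocols

    def __arrow_c_schema__(self):
        # Delegate the schema capsule to the wrapped object.
        return self._inner.__arrow_c_schema__()

    def __arrow_c_stream__(self, requested_schema=None):
        # Delegate the stream capsule; consumers such as pl.from_arrow
        # discover this method by duck typing, not by type checks.
        return self._inner.__arrow_c_stream__(requested_schema)
```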

Prefer backend='native' when downstream code can consume xbbg’s native carrier or the Arrow PyCapsule protocol directly. Request backend='pyarrow', backend='pandas', backend='polars', backend='duckdb', or another explicit backend when you want xbbg to materialize that library’s object at the boundary.

Default / Narwhals is the compatibility path. It keeps dataframe-style behavior over xbbg’s native carrier without requiring pandas or PyArrow as hard dependencies. df.to_pandas() imports pandas only when you call it.
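The deferred-import behavior mentioned above (pandas loaded only at conversion time) follows a common pattern that looks roughly like this sketch. The class is hypothetical, and it assumes pandas is importable when to_pandas() is finally called:

```python
# Hypothetical sketch of the deferred-import pattern: the heavy dependency
# is imported inside the method, so merely holding the frame never loads it.
class FrameSketch:
    def __init__(self, columns):
        self._columns = columns  # dict of column name -> list of values

    def to_pandas(self):
        import pandas as pd  # imported only when conversion is requested
        return pd.DataFrame(self._columns)
```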

Native carrier (native) is the lowest-overhead explicit result. Use it when you will inspect columns, call to_pylist(), split into to_batches(), or hand data to an Arrow-aware library through the PyCapsule protocol.

PyArrow is the full Apache Arrow Python table backend. Use backend='pyarrow' when code expects pyarrow.Table methods or PyArrow ecosystem behavior.

Polars is the best choice for pure computation on large datasets. Its columnar engine and lazy execution model handle datasets that would be slow or impractical in pandas. The polars_lazy backend lets you chain additional query steps before triggering evaluation.

DuckDB is useful when your next step is SQL analytics. Use backend='duckdb' for a relation, or pass the native carrier to DuckDB yourself when you want to control connection scope.

pandas remains the widest-compatibility option. Use it when integrating with libraries that only accept pd.DataFrame, or when working with existing pandas-based pipelines.

Lazy backends (DuckDB, Polars lazy, Dask, etc.) are useful when you want to compose queries across multiple xbbg calls before materializing any data, or when the result set is too large to hold in memory.

  • Output Formats — control the shape of returned data (LONG, LONG_TYPED, etc.)
  • API Reference — full function documentation with all parameters