Read

new Feature Store
Advanced Guide 2024

Hopsworks

Updated on:

August 7, 2023

The first open-source Feature Store and the first with a DataFrame API. Supports most batch and streaming data sources. Ingest features using SQL, Spark, Python, or Flink. The only feature store supporting stream processing for writes. Available as a managed platform and on-premises.

Company:

Hopsworks

Hopsworks Documentation

Vendor

Open source

On-Prem

Vertex AI

Updated on:

August 11, 2023

A centralized repository for organizing, storing, and serving ML features on the GCP Vertex platform. Vertex AI Feature Store supports BigQuery and GCS as data sources. Features engineered in BigQuery are loaded through separate ingestion jobs. The offline store is BigQuery; the online store is Bigtable.

Company:

Google

Vertex AI Documentation

Vendor

SageMaker

Updated on:

August 11, 2023

SageMaker Feature Store integrates with other AWS services, using Redshift and S3 as data sources and SageMaker for serving. It has a feature registry UI in SageMaker and Python/SQL APIs. The online store is DynamoDB; the offline store is Parquet on S3.

Company:

Amazon

AWS Documentation

Vendor

Databricks

Updated on:

August 11, 2023

A Feature Store built around Spark DataFrames. Supports Spark/SQL for feature engineering with a UI in Databricks. The online store is AWS RDS, MySQL, or Aurora; the offline store is Delta Lake.

Company:

Databricks

Databricks Documentation

Vendor

Chronon

Updated on:

August 11, 2023

One of the first feature stores, dating from 2018. It was originally called Zipline and was part of the Bighead ML Platform. Feature engineering uses a DSL that includes point-in-time correct training set backfills, scheduled updates, feature visualizations, and automatic data quality monitoring.

Company:

Airbnb

Zipline: Machine Learning Data Management Platform

In-house

Michelangelo

Updated on:

August 11, 2023

The mother of feature stores. Michelangelo is an end-to-end ML platform, and Palette is its feature store. Features are defined in a DSL that translates into Spark and Flink jobs. The online store is Redis/Cassandra; the offline store is Hive.

Company:

Uber

Uber's Machine Learning Platform

In-house

FBLearner

Updated on:

August 11, 2023

Facebook's internal end-to-end ML platform, which includes a feature store. It provides innovative functionality, such as automatic generation of UI experiences from pipeline definitions and automatic parallelization of Python code.

Company:

Facebook

FBLearner Flow: Facebook’s AI backbone

In-house

Overton

Updated on:

August 11, 2023

Internal end-to-end ML platform at Apple. It automates the life cycle of model construction, deployment, and monitoring by providing a set of novel high-level, declarative abstractions. It has been used in production to support multiple applications in both near-real-time applications and back-of-house processing.

Company:

Apple

Overton: A Data System for Monitoring and Improving Machine-Learned Products

In-house

Featureform

Updated on:

August 11, 2023

Featureform is a virtual feature store platform: you plug in your own offline and online data stores. It supports Flink, Snowflake, Airflow, Kafka, and other frameworks.

Company:

Featureform

Featureform Documentation

Vendor

On-Prem

Open source
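The "plug in your own stores" idea described for Featureform can be illustrated with a minimal sketch. This is not Featureform's API: the `OfflineStore` and `OnlineStore` interfaces, the in-memory implementations, and `VirtualFeatureStore` are all hypothetical names chosen for illustration.

```python
from typing import Dict, List, Optional, Protocol


class OfflineStore(Protocol):
    """Hypothetical interface for a historical (training) store."""

    def append(self, row: Dict) -> None: ...

    def scan(self) -> List[Dict]: ...


class OnlineStore(Protocol):
    """Hypothetical interface for a low-latency serving store."""

    def put(self, key: str, features: Dict) -> None: ...

    def get(self, key: str) -> Optional[Dict]: ...


class ListOfflineStore:
    """In-memory stand-in for a warehouse or data lake table."""

    def __init__(self) -> None:
        self.rows: List[Dict] = []

    def append(self, row: Dict) -> None:
        self.rows.append(row)

    def scan(self) -> List[Dict]:
        return list(self.rows)


class DictOnlineStore:
    """In-memory stand-in for a key-value store."""

    def __init__(self) -> None:
        self.latest: Dict[str, Dict] = {}

    def put(self, key: str, features: Dict) -> None:
        self.latest[key] = features

    def get(self, key: str) -> Optional[Dict]:
        return self.latest.get(key)


class VirtualFeatureStore:
    """Coordinates whichever stores are plugged in; holds no data itself."""

    def __init__(self, offline: OfflineStore, online: OnlineStore) -> None:
        self.offline = offline
        self.online = online

    def ingest(self, key: str, features: Dict) -> None:
        # Keep the full history offline for training,
        # and only the latest values online for serving.
        self.offline.append({"key": key, **features})
        self.online.put(key, features)


store = VirtualFeatureStore(ListOfflineStore(), DictOnlineStore())
store.ingest("user_1", {"clicks_7d": 3})
store.ingest("user_1", {"clicks_7d": 5})
```

Because `VirtualFeatureStore` depends only on the two interfaces, either backend could be swapped for a real database client without changing the ingestion logic.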

Twitter

Updated on:

August 11, 2023

Twitter's first feature store was a set of shared feature libraries and metadata. They have since built their own feature store by customizing Feast for GCP.

Company:

Twitter

Twitter Engineering: Cortex

In-house

Feast

Updated on:

August 11, 2023

Originally developed as an open-source feature store by Go-JEK, Feast has been taken on by Tecton as a minimal, configurable feature store. You can plug in different online/offline data stores, and it can run on any platform. Feature engineering is done outside of Feast.

Company:

Linux Foundation

Feast Documentation

Open source

Tecton

Updated on:

August 11, 2023

Tecton.ai is a managed feature store that uses PySpark or SQL (Databricks or EMR) or Snowflake to compute features and DynamoDB to serve online features. It provides a Python-based DSL for orchestration and feature transformations that are computed as a PySpark job. Available on AWS.

Company:

Tecton

Tecton Documentation

Vendor

Jukebox

Updated on:

August 11, 2023

Spotify built their own ML platform that leverages TensorFlow Extended (TFX) and Kubeflow. They focus on designing and analyzing their ML experiments instead of building and maintaining their own infrastructure, resulting in faster time from prototyping to production.

Company:

Spotify

Spotify Engineering: Machine Learning Infrastructure Through TensorFlow Extended and Kubeflow

In-house

Intuit

Updated on:

August 11, 2023

Intuit has built a feature store as part of its data science platform. It was developed for AWS and uses S3 and DynamoDB as its offline and online feature serving layers, respectively.

Company:

Intuit

Intuit Engineering: A Data Journey

In-house

Iguazio

Updated on:

August 11, 2023

A centralized and versioned feature store built around their MLRun open-source MLOps orchestration framework for ML model management. Uses V3IO as its offline and online feature stores.

Company:

Iguazio

Iguazio Documentation

Vendor

On-Prem

Open source

Abacus.ai

Updated on:

August 11, 2023

The platform lets you build real-time machine learning and deep learning features, upload IPython notebooks, monitor model drift, and set up CI/CD for machine learning systems.

Company:

Abacus.ai

Abacus Publications

Vendor

Doordash

Updated on:

August 11, 2023

An ML platform with an effective online prediction ecosystem. It serves traffic on a large number of ML models, including ensemble models, through their Sibyl Prediction Service. They extended Redis with sharding and compression to work as their online feature store.

Company:

Doordash

Doordash: Building a Gigascale ML Feature Store with Redis

In-house
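The combination of sharding and compression mentioned for the Doordash online store can be sketched in a few lines. This is a toy illustration of the general idea, not DoorDash's implementation: plain dicts stand in for Redis shards, and the `ShardedFeatureStore` class, the CRC32 routing, and the JSON plus zlib encoding are all assumptions made for the example.

```python
import json
import zlib


class ShardedFeatureStore:
    """Toy online store: each dict stands in for one Redis shard (assumption)."""

    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def _shard_for(self, entity_id):
        # A stable checksum sends the same entity to the same shard every time.
        index = zlib.crc32(entity_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def put(self, entity_id, features):
        # Store all features of one entity as a single compressed blob.
        payload = json.dumps(features, sort_keys=True).encode("utf-8")
        self._shard_for(entity_id)[entity_id] = zlib.compress(payload)

    def get(self, entity_id):
        blob = self._shard_for(entity_id).get(entity_id)
        if blob is None:
            return None
        return json.loads(zlib.decompress(blob).decode("utf-8"))


store = ShardedFeatureStore(num_shards=4)
store.put("store_42", {"avg_prep_minutes": 11.5, "orders_1h": 37})
features = store.get("store_42")
```

Packing an entity's features into one value trades per-feature updates for fewer keys and smaller payloads; whether that trade is worthwhile depends on the read and write patterns.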

Salesforce

Updated on:

August 11, 2023

ML Lake is a shared service that provides the right data, optimizes for the right access patterns, and relieves machine learning application developers of having to manage data pipelines, storage, security, and compliance. It is built on an early version of Feast based around Spark.

Company:

Salesforce

Salesforce: AI Technology and Resources

In-house

H2O Feature Store

Updated on:

October 3, 2023

H2O.ai and AT&T co-created the H2O AI Feature Store to store, update, and share the features data scientists, developers, and engineers need to build AI models.

Company:

H2O.ai and AT&T

H2O.ai Documentation

Vendor

On-Prem

Feathr

Updated on:

August 11, 2023

Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.

Company:

Microsoft / LinkedIn

Feathr's Documentation

Open source
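The point-in-time-correct join described for Feathr can be illustrated with a small, library-free sketch. It does not use Feathr's API; the `point_in_time_join` function and the tuple layouts are hypothetical. For each label row it picks the latest feature value observed at or before the label's timestamp, so no future information leaks into training data.

```python
from bisect import bisect_right


def point_in_time_join(labels, feature_rows):
    """Join each label with the latest feature value known at its timestamp.

    labels: list of (entity_id, event_ts, label)
    feature_rows: list of (entity_id, feature_ts, value)
    """
    # Build a per-entity history of feature values, sorted by timestamp.
    history = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = history.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_ts, label in labels:
        times, values = history.get(entity_id, ([], []))
        # Number of feature timestamps that are <= event_ts.
        idx = bisect_right(times, event_ts)
        feature = values[idx - 1] if idx > 0 else None
        joined.append((entity_id, event_ts, feature, label))
    return joined


labels = [("u1", 10, 1), ("u1", 25, 0), ("u2", 5, 1)]
feature_rows = [("u1", 8, 0.5), ("u1", 20, 0.9), ("u1", 30, 1.5), ("u2", 7, 0.1)]
training_rows = point_in_time_join(labels, feature_rows)
```

In this example the value recorded for `u1` at time 30 is never joined to the label at time 25, and `u2` gets no feature value because its only observation arrives after the label.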

Nexus

Updated on:

August 11, 2023

Nexus supports batch, near real-time, and real-time feature computation and serves online and offline features at global scale from Redis and Delta Lake on S3, respectively.

Company:

Disney Streaming

Nexus Feature Store Details

In-house

Qwak

Updated on:

August 11, 2023

Qwak's feature store is a component of the Qwak ML platform, providing transformations, a feature registry and both an online and offline store.

Company:

Qwak

Qwak Product Details

Vendor

Beast

Updated on:

August 11, 2023

Robinhood built their own event-based real-time feature store based on Kafka and Flink.

Company:

Robinhood

Robinhood Engineering Blog

In-house
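An event-based real-time feature of the kind Beast computes can be sketched without Kafka or Flink. The example below is an assumption-laden illustration, not Robinhood's code: `SlidingWindowCounter` is a hypothetical class that keeps a per-entity count of events in a trailing time window, and it assumes events for each entity arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Counts events per entity over a trailing window of window_seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = defaultdict(deque)

    def update(self, entity_id, event_ts):
        """Record one event and return the refreshed feature value."""
        window = self.events[entity_id]
        window.append(event_ts)
        # Evict events that are no longer inside the window ending at event_ts.
        while window and window[0] <= event_ts - self.window_seconds:
            window.popleft()
        return len(window)


counter = SlidingWindowCounter(window_seconds=60)
first = counter.update("u1", 0)
second = counter.update("u1", 30)
third = counter.update("u1", 61)
later = counter.update("u1", 200)
```

A stream processor would additionally handle out-of-order events, state checkpointing, and writing each refreshed value to the online store; those concerns are left out here.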

Feature Byte

Updated on:

August 11, 2023

FeatureByte is a solution that aims to simplify the process of preparing and managing data for AI models, making it easier for organizations to scale their AI efforts.

Company:

Feature Byte

Feature Byte Documentation

Vendor

Fennel

Updated on:

August 14, 2023

Fennel is a fully managed real-time feature platform from an ex-Facebook team. Powered by Rust, it is built from the ground up to be easy to use. It works natively with Python/Pandas, has beautiful APIs, installs in your cloud in minutes, and has best-in-class support for data/feature quality.

Company:

Fennel

Fennel Documentation

Vendor

OpenMLDB

Updated on:

September 21, 2023

An open-source feature computing platform that offers unified SQL APIs and a shared execution plan generator for both offline and online engines, eliminating the need for cumbersome transformation and consistency verification.

Company:

4Paradigm

Documentation

Open source

FEATURE STORE COMPARISON

Columns, in order: Platform; Open Source; Offline; Online; Real Time Ingestion; Feature Ingestion API; Write Amplification; Supported Platforms; Training API; Training Data. Each row lists the platform's non-empty cells in that order.

Hopsworks: AGPL-V3; Hudi/Hive and pluggable; RonDB; Flink, Spark Streaming; (Py)Spark, Python, SQL, Flink; AWS, GCP, On-Prem; Spark; DataFrame (Spark or Pandas), files (.csv, .tfrecord, etc.)

Michelangelo Palette: Hive; Cassandra; Flink, Spark Streaming; Spark, DSL; None; Proprietary; Spark; DataFrame (Pandas)

Chronon: Hive; Unknown KV Store; Flink; DSL; None; Proprietary; Spark; Streamed to models?

Twitter: GCS; Manhattan, Cockroach; Unknown; Python, BigQuery; Yes (ingestion jobs); Proprietary; BigQuery; DataFrame (Pandas)

Iguazio: Parquet; V3IO, proprietary DB; Nuclio; Spark, Python, Nuclio; Unknown; AWS, Azure, GCP, on-prem; No details; DataFrame (Pandas)

Databricks: Delta Lake; MySQL or Aurora; None; Spark, SparkSQL; Unknown; Spark; Spark DataFrames

SageMaker: S3, Parquet; DynamoDB; None; Python; Yes (ingestion jobs); AWS; Aurora; DataFrame (Pandas)

Featureform: Mozilla; Pluggable; None; Spark; AWS, Azure, GCP; Spark; DataFrame (Spark, Pandas)

Jukebox: BigQuery; Bigtable; Scio; Scio, BigQuery; Yes (ingestion jobs); Proprietary; BigQuery; DataFrame (Pandas)

Doordash: Snowflake; Redis; Flink; Unknown; Proprietary; Snowflake; DataFrame (Pandas)

Salesforce: S3, Iceberg; DynamoDB; Unknown; Yes (ingestion jobs); Proprietary; Iceberg; DataFrame (Pandas)

Intuit: GraphQL API, unknown backend; Beam; Spark, Beam; Unknown; Proprietary; Unknown; DataFrame (Pandas)

OLX: Kafka; ksqlDB; Proprietary; ksqlDB; From feature logging

Continual: Snowflake; Coming soon; DBT; Snowflake, more coming; Snowflake; Proprietary

Metarank: Yes; N/A; Redis; Flink; YAML-based; Open-Source; XGBoost, LightGBM; CSV files?

Scribble Enrich: Pluggable; None; Python; AWS, Azure, GCP, on-prem; No details; DataFrame (Pandas)

Feathr: Yes; Pluggable; None; Spark; Azure, AWS; Spark; DataFrames, files (.csv, .parquet, etc.)

Nexus: Delta Lake; Redis; Spark Streaming; Spark; Unknown; Spark; Proprietary


Feature Store Summit 2023

View Recordings

Feature Stores for ML

Join the community

Meetups

FEATURED BLOG POSTS

Streamlining Machine Learning Development with a Feature Store

Feature pipelines and feature stores — deep dive into system engineering and analytical tradeoffs

Make a prediction every day with Serverless Machine Learning
