Feature Stores for ML

Hopsworks

Updated on:
October 18, 2024

The first open-source feature store and the first with a DataFrame API. Supports most batch and streaming data sources. Ingest features using SQL, Spark, Python, or Flink. The only feature store supporting stream processing for writes. Available as a managed platform and on-premises.

Company:
Hopsworks
Vendor
Open source
On-Prem

Vertex AI

Updated on:
August 11, 2023

A centralized repository for organizing, storing, and serving ML features on the GCP Vertex platform. Vertex AI Feature Store supports BigQuery and GCS as data sources. Ingestion runs as separate jobs after feature engineering in BigQuery. The offline store is BigQuery; the online store is BigTable.

Company:
Google
Vendor

SageMaker

Updated on:
August 11, 2023

SageMaker Feature Store integrates with other AWS services, using Redshift and S3 as data sources and SageMaker for serving. It has a feature registry UI in SageMaker and Python/SQL APIs. The online store is DynamoDB; the offline store is Parquet on S3.

Company:
Amazon
Vendor

Databricks

Updated on:
August 11, 2023

A feature store built around Spark DataFrames. Supports Spark/SQL for feature engineering with a UI in Databricks. The online store is AWS RDS/MySQL/Aurora; the offline store is Delta Lake.

Company:
Databricks
Vendor

Chronon

Updated on:
October 18, 2024

One of the first feature stores, dating from 2018, originally called Zipline and part of the Bighead ML Platform. Feature engineering uses a DSL that includes point-in-time correct training set backfills, scheduled updates, feature visualizations, and automatic data quality monitoring.

Company:
Airbnb
In-house

Michelangelo

Updated on:
October 18, 2024

The mother of feature stores. Michelangelo is an end-to-end ML platform and Palette is its feature store. Features are defined in a DSL that translates into Spark and Flink jobs. The online store is Redis/Cassandra; the offline store is Hive.

Company:
Uber
In-house

FBLearner

Updated on:
August 11, 2023

Facebook's internal end-to-end ML platform, which includes a feature store. It provides innovative functionality, like automatic generation of UI experiences from pipeline definitions and automatic parallelization of Python code.

Company:
Facebook
In-house

Overton

Updated on:
August 11, 2023

Internal end-to-end ML platform at Apple. It automates the life cycle of model construction, deployment, and monitoring by providing a set of novel high-level, declarative abstractions. It has been used in production to support multiple applications in both near-real-time applications and back-of-house processing.

Company:
Apple
In-house

Featureform

Updated on:
August 11, 2023

Featureform is a virtual feature store platform: you plug in your own offline and online data stores. It supports Flink, Snowflake, Airflow, Kafka, and other frameworks.

Company:
Featureform
Vendor
On-Prem
Open source

Twitter

Updated on:
August 11, 2023

Twitter's first feature store was a set of shared feature libraries and metadata. They have since built their own feature store by customizing Feast for GCP.

Company:
Twitter
In-house

Feast

Updated on:
August 11, 2023

Originally developed as an open-source feature store by Go-JEK, Feast has been taken on by Tecton as a minimal, configurable feature store. You can plug in different online/offline data stores, and it can run on any platform. Feature engineering is done outside of Feast.

Company:
Linux Foundation
Open source

Tecton

Updated on:
August 11, 2023

Tecton.ai is a managed feature store that uses PySpark or SQL (Databricks or EMR) or Snowflake to compute features and DynamoDB to serve online features. It provides a Python-based DSL for orchestration and feature transformations that are computed as a PySpark job. Available on AWS.

Company:
Tecton
Vendor

Jukebox

Updated on:
August 11, 2023

Spotify built their own ML platform that leverages TensorFlow Extended (TFX) and Kubeflow. They focus on designing and analyzing their ML experiments instead of building and maintaining their own infrastructure, resulting in faster time from prototyping to production.

Company:
Spotify
In-house

Intuit

Updated on:
August 11, 2023

Intuit built a feature store as part of its data science platform. It was developed for AWS and uses S3 as its offline store and DynamoDB as its online feature serving layer.

Company:
Intuit
In-house

Iguazio

Updated on:
August 11, 2023

A centralized and versioned feature store built around their MLRun open-source MLOps orchestration framework for ML model management. Uses V3IO as its offline and online feature stores.

Company:
Iguazio
Vendor
On-Prem
Open source

Abacus.ai

Updated on:
August 11, 2023

The platform lets you build real-time machine learning and deep learning features, upload IPython notebooks, monitor model drift, and set up CI/CD for machine learning systems.

Company:
Abacus.ai
Vendor

Doordash

Updated on:
August 11, 2023

An ML platform with an effective online prediction ecosystem. It serves traffic on a large number of ML models, including ensemble models, through their Sibyl Prediction Service. They extended Redis with sharding and compression to work as their online feature store.

Company:
Doordash
In-house

Salesforce

Updated on:
August 11, 2023

ML Lake is a shared service that provides the right data, optimizes the right access patterns, and relieves the machine learning application developer of having to manage data pipelines, storage, security, and compliance. Built on an early version of Feast based around Spark.

Company:
Salesforce
In-house

H2O Feature Store

Updated on:
October 3, 2023

H2O.ai and AT&T co-created the H2O AI Feature Store to store, update, and share the features data scientists, developers, and engineers need to build AI models.

Company:
H2O.ai and AT&T
Vendor
On-Prem

Feathr

Updated on:
August 11, 2023

Feathr automatically computes your feature values and joins them to your training data, using point-in-time-correct semantics to avoid data leakage, and supports materializing and deploying your features for use online in production.

Company:
Microsoft / LinkedIn
Open source

Nexus

Updated on:
August 11, 2023

Nexus supports batch, near real-time, and real-time feature computation and serves online and offline features at global scale from Redis and Delta Lake on S3, respectively.

Company:
Disney Streaming
In-house

Qwak

Updated on:
August 11, 2023

Qwak's feature store is a component of the Qwak ML platform, providing transformations, a feature registry and both an online and offline store.

Company:
Qwak
Vendor

Beast

Updated on:
August 11, 2023

Robinhood built their own event-based real-time feature store based on Kafka and Flink.

Company:
Robinhood
In-house

FeatureByte

Updated on:
August 11, 2023

FeatureByte is a solution that aims to simplify the process of preparing and managing data for AI models, making it easier for organizations to scale their AI efforts.

Company:
FeatureByte
Vendor

Fennel

Updated on:
October 18, 2024

Fennel is a fully managed real-time feature platform from an ex-Facebook team. Powered by Rust, it is built from the ground up to be easy to use. It works natively with Python/Pandas, has beautiful APIs, installs in your cloud in minutes, and has best-in-class support for data/feature quality.

Company:
Fennel
Vendor

OpenMLDB

Updated on:
September 21, 2023

An open-source feature computing platform that offers unified SQL APIs and a shared execution plan generator for both offline and online engines, eliminating the need for cumbersome transformation and consistency verification.

Company:
4Paradigm
Open source

Chalk

Updated on:
October 18, 2024

Chalk is a platform for building real-time ML applications that includes a real-time feature store.

Company:
Chalk AI
Vendor

FEATURE STORE COMPARISON

| Platform | Open Source | Offline | Online | Real Time Ingestion | Feature Ingestion API | Write Amplification | Supported Platforms | Training API | Training Data |
|---|---|---|---|---|---|---|---|---|---|
| Hopsworks | AGPL-V3 | Hudi/Hive and pluggable | RonDB | | | No | AWS, GCP, On-Prem | Spark | DataFrame (Spark or Pandas), files (.csv, .tfrecord, etc) |
| Michelangelo Palette | No | Hive | Cassandra | | | None | Proprietary | Spark | DataFrame (Pandas) |
| Chronon | No | Hive | Unknown KV Store | | | None | Proprietary | Spark | Streamed to models? |
| Twitter | No | GCS | Manhattan, Cockroach | | | Yes. Ingestion Jobs | Proprietary | BigQuery | DataFrame (Pandas) |
| Iguazio | No | Parquet | V3IO, proprietary DB | | | Unknown | AWS, Azure, GCP, on-prem | No details | DataFrame (Pandas) |
| Databricks | No | Delta Lake | MySQL or Aurora | | | Unknown | Unknown | Spark | Spark DataFrames |
| SageMaker | No | S3, Parquet | DynamoDB | | | Yes. Ingestion Jobs | AWS | Aurora | DataFrame (Pandas) |
| Featureform | Mozilla | Pluggable | Pluggable | | | No | AWS, Azure, GCP | Spark | DataFrame (Spark, Pandas) |
| Jukebox | No | BigQuery | BigTable | | | Yes. Ingestion Jobs | Proprietary | BigQuery | DataFrame (Pandas) |
| Doordash | No | Snowflake | Redis | | | Unknown | Proprietary | Snowflake | DataFrame (Pandas) |
| Salesforce | No | S3, Iceberg | DynamoDB | | | Yes. Ingestion Jobs | Proprietary | Iceberg | DataFrame (Pandas) |
| Intuit | No | S3 | GraphQL API, unknown backend | | | Unknown | Proprietary | Unknown | DataFrame (Pandas) |
| OLX | No | Kafka | Kafka | | | No | Proprietary | KSQLdb | From feature logging |
| Continual | No | Snowflake | Coming soon | | | No | Snowflake, more coming | Snowflake | Proprietary |
| Metarank | Yes | N/A | Redis | | | No | Open-Source | XGBoost, LightGBM | CSV files? |
| Scribble Enrich | No | Pluggable | Pluggable | | | No | AWS, Azure, GCP, on-prem | No details | DataFrame (Pandas) |
| Feathr | Yes | Pluggable | Pluggable | | | No | Azure, AWS | Spark | DataFrames, files (.csv, .parquet, etc) |
| Nexus | No | Delta Lake | Redis | | | Unknown | Unknown | Spark | Proprietary |


Feature Ingestion API: What APIs and languages are supported for writing features to the feature store?
Write Amplification: Do you write your features more than once, e.g., write to stable storage first and then run a separate job to ingest features?
Training API (PIT Join Engine): When you create training data from reusable features, you need to join the feature values together. What compute engine is used to perform this point-in-time JOIN?
Training Data: How is the training data made available to machine learning frameworks? As dataframes or files?
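
To make the point-in-time JOIN mentioned under Training API more concrete, here is a minimal sketch in plain Python. It is not taken from any of the platforms listed above; the function name `point_in_time_join`, the tuple layouts, and the sample data are all assumptions made for illustration. For each label event, it picks the most recent feature value recorded at or before the event time, so no future feature values leak into the training data.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_join(labels, feature_rows):
    """Attach to each label the latest feature value seen at or before its event time.

    labels:       iterable of (entity_id, event_time, label)       -- hypothetical layout
    feature_rows: iterable of (entity_id, feature_time, value)     -- hypothetical layout
    """
    # Build per-entity timelines: parallel lists of timestamps and values, sorted by time.
    timelines = {}
    for entity_id, ts, value in sorted(feature_rows, key=lambda r: (r[0], r[1])):
        times, values = timelines.setdefault(entity_id, ([], []))
        times.append(ts)
        values.append(value)

    joined = []
    for entity_id, event_time, label in labels:
        times, values = timelines.get(entity_id, ([], []))
        # Index of the first feature timestamp strictly after the event time.
        idx = bisect_right(times, event_time)
        # Everything before idx is at or before the event; take the latest one, if any.
        feature = values[idx - 1] if idx > 0 else None
        joined.append((entity_id, event_time, feature, label))
    return joined


# Illustrative data only.
feature_rows = [
    ("user_1", datetime(2024, 1, 1), 10.0),
    ("user_1", datetime(2024, 1, 5), 20.0),
    ("user_2", datetime(2024, 1, 3), 7.0),
]
labels = [
    ("user_1", datetime(2024, 1, 4), 1),  # only the Jan 1 value is visible
    ("user_1", datetime(2024, 1, 6), 0),  # the Jan 5 value is now visible
    ("user_2", datetime(2024, 1, 2), 1),  # no feature value exists yet
]

training_rows = point_in_time_join(labels, feature_rows)
for row in training_rows:
    print(row)
```

In the platforms compared above, this join is typically delegated to an engine such as Spark, BigQuery, or Snowflake rather than done in application code; the sketch only shows the semantics a Training API is expected to preserve.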

Build or Buy

Developing software can be extremely costly and time-consuming, so reusing existing systems is often a reasonable solution. Even so, the number of companies building their own feature store is on the rise.

Learn more about the industry's conundrum by watching the relevant panel discussion from the first Feature Store Summit.
