Our foundation model

The industry-leading foundation model made by pathologists for pathologists

"RudolfV: A Foundation Model by Pathologists for Pathologists"

Jonas Dippel, Barbara Feulner, Tobias Winterhoff, Timo Milbich, Stephan Tietz, Simon Schallenberg, Gabriel Dernbach, Andreas Kunft, Simon Heinke, Marie-Lisa Eich, Julika Ribbat-Idel, Rosemarie Krupar, Philipp Anders, Niklas Prenißl, Philipp Jurmeister, David Horst, Lukas Ruff, Klaus-Robert Müller, Frederick Klauschen, Maximilian Alber

Our histopathology foundation model was developed using a “pathologist-in-the-loop” data curation approach that increases model robustness. The model improves the scalability and performance of a variety of downstream tasks.

“Pathologist-in-the-loop” approach

Foundation models require large amounts of training data to be adaptable and versatile. Aignostics hypothesized that structuring and curating the training data with pathologist expertise would yield a more efficient foundation model that is both robust and representative of the entire landscape.

Curated data: the training dataset for RudolfV consisted of data from labs across the EU and US

  • ~133k slides

  • ~58 tissue types

  • ~1.2B image patches

  • ~6 scanner types

  • ~129 staining types

  • 15+ laboratories

  • ~34k cases

  • FF & FFPE preparations

Pathologist expertise & AI training: in-house pathologists and computational scientists collaborated to stratify images based on clinically relevant features prior to model training. The process we used was as follows:

Slide level grouping

Slides were assigned to one of 31 groups based on metadata such as lab of origin, tissue type, and staining modality. Assignments were made to maximize homogeneity within groups and heterogeneity across groups.

Tissue level clustering

Tissue patches were assigned to one of 9 morphologically similar clusters based on pathologist review (e.g., solid growth pattern, mononuclear infiltrates). Together, these 9 clusters cover the 1.2 billion image patches extracted from the training dataset.

Data sampling

Slide groups and tissue clusters were sampled equally during model training, which helped to downsample overrepresented visual features and upsample less common ones to better account for the diversity present within real-life pathology cases.
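To make the idea concrete, here is a minimal sketch of balanced sampling on toy data. It is not the RudolfV training code: the field names `slide_group` and `tissue_cluster`, the helper functions, and the choice to treat each (slide group, tissue cluster) pair as one equally likely stratum are all assumptions made for illustration.

```python
import random
from collections import defaultdict

def balanced_sample(patches, n, seed=0):
    """Draw n patches so every (slide_group, tissue_cluster) stratum is equally likely."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for patch in patches:
        strata[(patch["slide_group"], patch["tissue_cluster"])].append(patch)
    keys = sorted(strata)  # fixed order keeps the draw reproducible
    # Pick a stratum uniformly, then a patch uniformly within that stratum.
    return [rng.choice(strata[rng.choice(keys)]) for _ in range(n)]

def random_sample(patches, n, seed=0):
    """Baseline: every patch is equally likely, so common strata dominate."""
    rng = random.Random(seed)
    return rng.choices(patches, k=n)

def rare_share(sample):
    """Fraction of the sample that comes from the rare tissue cluster (id 1)."""
    return sum(p["tissue_cluster"] == 1 for p in sample) / len(sample)

# Toy data (hypothetical): 950 patches of a common pattern, 50 of a rare one.
patches = (
    [{"slide_group": 0, "tissue_cluster": 0} for _ in range(950)]
    + [{"slide_group": 1, "tissue_cluster": 1} for _ in range(50)]
)
balanced = balanced_sample(patches, 2000)
uniform = random_sample(patches, 2000)
print(f"rare share, balanced: {rare_share(balanced):.2f}")
print(f"rare share, random:   {rare_share(uniform):.2f}")
```

In this toy setup the rare pattern makes up 5% of the pool, so random sampling sees it rarely, while the balanced draw returns it about half the time. That is the upsampling and downsampling effect described above.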

The image below compares a sample drawn with our balanced approach against one drawn by random sampling. The balanced sample contained 4 different staining types, and every patch contained relevant tissue (e.g., carcinoma, mucosa, lymphoid tissue, necrosis). The random sample contained only 1 discernible stain (H&E) and, apart from carcinoma and mucosa, consisted of areas with artifacts or no tissue.

Balanced sampling based on tissue and slide clusters

Default option: random sampling

This example shows that the balanced approach results in a more diverse sample, while the random sample favors more common images with similar pathologies.

Our foundation model RudolfV was trained by adapting the DINOv2 framework to sample training data from a specific distribution derived from these slide groups and tissue clusters. This framework was chosen due to its proven performance and wide adoption, as well as to enable clear comparisons to other published models.

Applications

We have already integrated our foundation model into all of our histopathology work with clients. Check out our product page to learn more about the products and services we offer.

We are continually updating our model with more images and data modalities, and will publish regular updates on our progress and benchmarking results. Check back here for the latest version of our paper and results.

Below is an example of how our foundation model was applied to anomaly detection for rare diseases. Click below and fill out the form to see the full case study.

Interested in learning how our foundation model enhances the generalizability of a cell classification model? Click below and fill out the form to see the full case study.

Interested in learning how our foundation model improves model prediction performance and requires less training data to reach peak performance? Click below and fill out the form to see the full case study.

Questions & answers

What is a foundation model?

Trained on very large data sets, foundation models serve as a starting point for developing high performing machine learning models in a quick and cost-effective way.

Foundation models can be quickly fine-tuned for a wide range of downstream tasks and represent a major leap forward from traditional supervised learning approaches.

How was our foundation model developed?

Our foundation model was developed by training it to reconstruct masked image data. In doing so, the model learns to understand images and their context, e.g., that immune cells are less likely to appear in a dense tumor.
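As a rough illustration of the masking step only, the sketch below hides a random subset of square tiles in a toy image. The tile size, the mask ratio, the use of 0 as the placeholder value, and the function name `mask_tiles` are assumptions for this example. The training objective that then predicts the hidden content is not shown.

```python
import random

def mask_tiles(image, tile=4, ratio=0.4, seed=0):
    """Hide a random subset of non-overlapping tile x tile squares.

    `image` is a 2D list of pixel values. Returns a masked copy and the set
    of (row, col) tile coordinates that were hidden.
    """
    rows, cols = len(image) // tile, len(image[0]) // tile
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    rng = random.Random(seed)
    hidden = set(rng.sample(coords, round(ratio * len(coords))))
    masked = [row[:] for row in image]  # copy so the input stays untouched
    for r, c in hidden:
        for y in range(r * tile, (r + 1) * tile):
            for x in range(c * tile, (c + 1) * tile):
                masked[y][x] = 0  # 0 stands in for a "mask" placeholder
    return masked, hidden

image = [[1] * 16 for _ in range(16)]  # toy 16x16 single-channel image
masked, hidden = mask_tiles(image)
print(f"hidden tiles: {len(hidden)} of 16")
```

A model trained on such inputs has to infer the hidden tiles from the visible ones, which is how it picks up contextual regularities of the kind mentioned above.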

How does our model impact downstream tasks?

In internal analyses, our foundation model:

  • Reduces the amount of training data/annotations needed to optimize model performance by up to 90%

  • Increases the balanced accuracy of cell classification tasks by an average of ~10% across cell types

  • Is robust across a wide range of scanners and stains
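For readers unfamiliar with the metric, balanced accuracy is commonly defined as the mean of per-class recall, which keeps a frequent class from dominating the score. The sketch below uses that standard definition. The cell type labels and predictions are made up for illustration and are not results from our analyses.

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        correct[truth] += truth == pred
    return sum(correct[c] / total[c] for c in total) / len(total)

# Hypothetical labels: 8 tumor cells, 2 lymphocytes; one lymphocyte is missed.
y_true = ["tumor"] * 8 + ["lymphocyte"] * 2
y_pred = ["tumor"] * 8 + ["tumor", "lymphocyte"]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy:    {plain:.2f}")                              # 0.90
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.2f}")  # 0.75
```

Plain accuracy looks high here because tumor cells dominate, while balanced accuracy exposes the weaker recall on the rarer cell type.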

Inquiries

Interested in our foundation model? Reach out to learn more!

Contact us

Products and services

Discover our offerings to deliver novel insights for precision medicine.

Reach out below to learn more about how we transform drug development and improve patient outcomes.

See our Products