Last edition: Sep 14, 2022



Before you can download the dataset, please read the Rules section. 

Download links:

HEROHE Grand Challenge 1-2

HEROHE Grand Challenge 2-2


This dataset is available for research purposes under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported (CC BY-NC-ND 3.0) license.

Below is a summary of (and not a substitute for) the license. (Disclaimer)

Under the license terms:

Attribution:  You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

NonCommercial: You may not use the material for commercial purposes.

NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.

No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.

If you use this dataset or any part of it, you must cite the corresponding publication:

  1. Conde-Sousa et al. HEROHE Challenge: Predicting HER2 Status in Breast Cancer from Hematoxylin–Eosin Whole-Slide Imaging (2022) J. Imaging, 8(8), 213; DOI: 10.3390/jimaging8080213

Optionally, the following publication can also be cited:

  1. La Barbera, D., Polónia, A., Roitero, K., Conde-Sousa, E., Della Mea, V. Detection of HER2 from Haematoxylin-Eosin Slides Through a Cascade of Deep Learning Classifiers via Multi-Instance Learning (2020) J. Imaging, 6(9), 82; DOI: 10.3390/jimaging6090082

Biological Information

HER2 is a transmembrane protein receptor with tyrosine kinase activity that is amplified and/or overexpressed in approximately 15–20% of breast cancers (BC). HER2 overexpression and/or amplification is associated with aggressive clinical behavior but also with a high probability of response to HER2-targeted therapy. Correct identification of HER2-positive BC therefore selects the patients expected to benefit from targeted therapy, which makes HER2 a helpful marker for therapy decisions in patients with BC.

Ground Truth

The dataset contains 360 training cases (144 positive and 216 negative) and 150 test cases (60 positive and 90 negative). Cases were classified according to the latest American Society of Clinical Oncology/College of American Pathologists (ASCO/CAP) classification of breast cancer (Focused Update 2018). The ground truth is provided in the accompanying Excel files.
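As a quick sanity check on the case counts above, the short sketch below computes the size and class balance of each split. Only the four counts come from this page; the names `SPLITS` and `positive_fraction` are illustrative and not part of any challenge tooling.

```python
# Case counts as stated in the Ground Truth section.
SPLITS = {
    "training": {"positive": 144, "negative": 216},
    "testing": {"positive": 60, "negative": 90},
}


def positive_fraction(counts):
    """Return the share of HER2-positive cases in one split."""
    total = counts["positive"] + counts["negative"]
    return counts["positive"] / total


if __name__ == "__main__":
    for name, counts in SPLITS.items():
        total = counts["positive"] + counts["negative"]
        print(f"{name}: {total} cases, {positive_fraction(counts):.0%} positive")
```

Both splits have the same proportion of positive cases, so a classifier that always predicts "negative" would reach the same accuracy on either split. This is worth keeping in mind when interpreting accuracy figures.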

Data Format

Whole slide images were scanned with a 3D Histech Pannoramic 1000 digital scanner and saved in the MIRAX format. Each whole slide image is stored as one .mrxs file plus a folder (with the same basename) containing many .dat files.
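Because each slide is only usable when the .mrxs file and its companion folder are both present, it can help to verify the layout after downloading and unpacking. The following is a minimal sketch, assuming all slides sit directly in one directory. The function name `find_incomplete_slides` and the directory name `herohe_data` are hypothetical and not part of the dataset.

```python
from pathlib import Path


def find_incomplete_slides(root):
    """Return names of .mrxs files that lack a same-named folder with .dat files."""
    incomplete = []
    for mrxs in sorted(Path(root).glob("*.mrxs")):
        data_dir = mrxs.with_suffix("")  # same basename, extension removed
        if not data_dir.is_dir() or not any(data_dir.glob("*.dat")):
            incomplete.append(mrxs.name)
    return incomplete


if __name__ == "__main__":
    # "herohe_data" is a placeholder; point it at the unpacked download.
    missing = find_incomplete_slides("herohe_data")
    if missing:
        print("Slides with a missing or empty data folder:", missing)
    else:
        print("Every .mrxs file has a matching data folder.")
```

The check only looks at file names and does not validate the contents of the .dat files, so it catches incomplete extractions but not corrupted data.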

3D Histech provides two free software tools for viewing the files and exporting them to commonly used formats, such as TIFF or JPEG. We recommend that participants use CaseViewer, the more recent of the two (download the version with Converter to be able to export the files to other formats). Alternatives are 3D Histech’s Pannoramic Viewer, an older viewer that also provides export options, or other image analysis software that can open 3D Histech files directly, such as QuPath or Arivis. Participants are free to choose whichever software they prefer and to decide whether to convert the files to another format beforehand.