TFM4MedIA

Transferring Foundation Models for Multi-center Real-World Medical Image Analysis

Motivation

While foundation models like the Segment Anything Model (SAM) have demonstrated efficacy in various medical image analysis tasks, their performance on real-world data remains underexplored. Specifically, these models are typically trained on large, regular targets such as the liver and lungs, yet real-world data often originates from different centers with diverse modalities. Furthermore, applying foundation models to complicated Regions of Interest (ROIs), such as lesions or scars, poses additional challenges because these targets are small and irregularly shaped. Hence, developing effective and efficient transfer learning approaches that fully utilize foundation models for real-world medical image segmentation is of great value.

Task

In this track, we encourage participants to design transfer learning approaches that efficiently exploit the knowledge in existing foundation models and make them more effective on four segmentation tasks: Myocardial Pathology Segmentation in the MyoPS++ track, Liver Segmentation in the LiQA track, Whole Heart Segmentation in the WHS++ track, and Left Atrial and Scar Segmentation in the LAScarQS++ track.

Note: To address this task, participants are encouraged to leverage external data.

Data

Multi-center datasets are provided for four sub-tasks. More detailed data information can be found here for Myocardial Pathology Segmentation, Liver Segmentation, Whole Heart Segmentation and Left Atrial and Scar Segmentation.

Training Dataset

1). Myocardial Pathology Segmentation

Center	Num. patients	Sequences	Manual labels
1	81	LGE	Scar, left ventricle and myocardium
2	50	LGE, T2 and bSSFP	Scar, edema, left ventricle, myocardium and right ventricle
3	45	LGE, T2 and bSSFP	Scar, edema, left ventricle, myocardium and right ventricle
5	07	LGE and bSSFP	Scar, left ventricle, myocardium and right ventricle
6	09	LGE and bSSFP	Scar, left ventricle, myocardium and right ventricle
7	08	LGE and bSSFP	Scar, left ventricle, myocardium and right ventricle

2). Liver Segmentation

Vendor	Center	Num. studies	Num. Annotations
A	A	100	10
B	B1	100	10
B	B2	50	10

3). Whole Heart Segmentation

Center	Num. patients	Modalities
A	20	CT
B	20	CT
C/D	20	MRI
E	26	MRI

4). Left Atrial and Scar Segmentation

Center	Modality	Num. task1	Num. task2
A	LGE MRI	60	130

Validation Dataset

1). Myocardial Pathology Segmentation

Center	Num. patients	Sequences	Manual labels
4	25	LGE, T2 and bSSFP	Scar, edema, left ventricle, myocardium and right ventricle

2). Liver Segmentation

Vendor	Center	Num. studies	Num. Annotations
A	A	10	10
B	B1	10	10
B	B2	10	10

3). Whole Heart Segmentation

Center	Num. patients	Modalities
A	20	CT
B	10	CT
C/D	20	MRI

4). Left Atrial and Scar Segmentation

Center	Modality	Num. task1	Num. task2
A	LGE MRI	10	10
C	LGE MRI	0	10

Test Dataset

1). Myocardial Pathology Segmentation

Center	Num. patients	Sequences	Manual labels
4	25	LGE, T2 and bSSFP	Scar, edema, left ventricle, myocardium and right ventricle

2). Liver Segmentation

The 160 test cases comprise 120 new cases from the vendors represented in the training set and 40 additional cases from an unseen center (C), which are used to test model generalizability.

Vendor	Center	Num. studies
A	A	40
B	B1	40
B	B2	40
C (new)	C	40

3). Whole Heart Segmentation

Center	Num. patients	Modalities
A	20	CT
B	14	CT
C/D	20	MRI
F	16	MRI

4). Left Atrial and Scar Segmentation

Center	Modality	Num. task1	Num. task2
A	LGE MRI	24	14
B	LGE MRI	0	20
C	LGE MRI	0	10

Prompts

Only points or bounding boxes are acceptable as prompts. For the training dataset, participants can generate prompts themselves from the segmentation ground truth.
For validation and testing, the organizers provide no more than 5 points and 1 bounding box per class as prompts. An example is shown in Figure 2.
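As an illustration of generating training prompts from ground truth, the following is a minimal sketch rather than the organizers' procedure. The function name `prompts_from_mask` is hypothetical, and it assumes that x indexes rows, y indexes columns, and the box stores inclusive minimum and maximum coordinates. Points are picked at evenly spaced positions among the foreground pixels; random sampling would work equally well.

```python
def prompts_from_mask(mask, label, max_points=5):
    """Derive one box and up to `max_points` points for `label` from a 2D mask.

    `mask` is a list of rows of integer labels. Returns None when the label
    does not occur in the slice.
    Assumption: x indexes rows (first axis) and y indexes columns.
    """
    coords = [(x, y)
              for x, row in enumerate(mask)
              for y, value in enumerate(row)
              if value == label]
    if not coords:
        return None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Assumption: the box holds inclusive min/max coordinates.
    box = [min(xs), min(ys), max(xs), max(ys)]
    # Take evenly spaced foreground pixels so the choice is deterministic.
    n = min(max_points, len(coords))
    step = len(coords) / n
    points = [list(coords[int(i * step)]) for i in range(n)]
    return [box, points]


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 2],
]
print(prompts_from_mask(mask, 1))
```

The returned pair follows the same box-then-points order as the per-label entries of the provided prompt files, so training prompts can be stored in the same layout.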

Metrics & Ranking

Metrics

Dice Similarity Coefficient (DSC), Hausdorff Distance (HD).
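For reference, both metrics can be sketched in a few lines of Python. This is an illustrative implementation on sets of foreground voxel coordinates, not the official evaluation code: the organizers' script may differ, for example by using surface voxels only, physical voxel spacing, or a percentile variant of HD. Treating two empty masks as perfect agreement is also an assumption.

```python
import math


def dice(pred, truth):
    """DSC = 2|P n T| / (|P| + |T|) for two sets of foreground coordinates."""
    if not pred and not truth:
        return 1.0  # assumption: two empty masks count as perfect agreement
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def hausdorff(pred, truth):
    """Symmetric Hausdorff distance between two non-empty coordinate sets."""
    def directed(a, b):
        # Largest distance from a point in `a` to its nearest point in `b`.
        return max(min(math.dist(p, q) for q in b) for p in a)
    return max(directed(pred, truth), directed(truth, pred))


pred = {(0, 0), (0, 1)}
truth = {(0, 1), (0, 2)}
print(dice(pred, truth), hausdorff(pred, truth))
```

The brute-force distance search is quadratic in the number of voxels, so it is only practical for small masks; it is meant to make the definitions concrete.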

Ranking method

The overall performance across all sub-tasks is considered for ranking: 1) First, for each sub-task, the results are ranked by Dice score on the in-distribution (seen centre) and out-of-distribution (OOD, unseen centre) datasets separately. 2) Then, the rankings from all sub-tasks are averaged to obtain the final rank.

This ranking approach encourages participants to develop foundation-model-based methods that are consistently effective across all tasks as well as on OOD datasets.
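The two ranking steps can be sketched as follows. This is one plausible reading, not the official scoring script: it assumes the in-distribution and OOD ranks of every sub-task are averaged with equal weight and that tied teams share the better rank. The team names and Dice values are made up for illustration.

```python
def rank_descending(scores):
    """Map team -> rank (1 = best); tied teams share the better rank."""
    ordered = sorted(scores.values(), reverse=True)
    return {team: ordered.index(score) + 1 for team, score in scores.items()}


def final_ranking(dice_scores):
    """dice_scores[subtask][split][team] is a mean Dice score.

    Returns (team, mean_rank) pairs sorted from best to worst.
    """
    ranks = {}
    for splits in dice_scores.values():
        for scores in splits.values():
            for team, rank in rank_descending(scores).items():
                ranks.setdefault(team, []).append(rank)
    mean_rank = {team: sum(r) / len(r) for team, r in ranks.items()}
    return sorted(mean_rank.items(), key=lambda item: item[1])


# Illustrative values only, not challenge results.
example = {
    "MyoPS++": {"id": {"A": 0.80, "B": 0.75, "C": 0.70},
                "ood": {"A": 0.60, "B": 0.65, "C": 0.55}},
    "LiQA": {"id": {"A": 0.93, "B": 0.92, "C": 0.85},
             "ood": {"A": 0.88, "B": 0.80, "C": 0.70}},
}
print(final_ranking(example))
```

In this toy example team B wins one OOD split but still finishes behind team A, because the final position depends on the mean rank over every sub-task and split rather than on any single score.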

Rules

Publicly available data and pretrained models are allowed.

Registration

Please register here to participate in the challenge and get access to the dataset!

Submission Guidance

Model Submission

After registration, we will assign participants an account to log in to our TFM4MedIA evaluation platform. Participants can upload their predictions on the validation data (in NIfTI format) directly via the website. Note that each team may evaluate on the validation data up to 10 times per task. For fair comparison, the test dataset will remain unseen, so participants need to submit their models as Docker images for testing.

Paper Submission

Please refer to our paper submission guidance.

Docker Submission (New!!!)

Since the aim of TFM4MedIA is one model for all tasks, we anonymize the names, directories, and paths of each test task so that this information cannot be used to select a task-specific model (e.g., nnU-Net). For each task, we copy all test images into the directory “:/input/images/” and save the corresponding prompts to the path “:/input/prompts.json”. For convenience, the JSON structure is the same as in the validation phase. For example, the structure of “:/input” is

:/input
    |-- prompts.json
    |-- images
        |-- Case ID1.nii.gz
        |-- Case ID2.nii.gz
        ...
        |__ Case IDN.nii.gz

Note that for MyoPS++, each case has three sequences, i.e., “Case ID_C0.nii.gz”, “Case ID_LGE.nii.gz”, and “Case ID_T2.nii.gz”.
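Because the task name is hidden at test time, a submission may need to detect multi-sequence cases from the file names alone. The sketch below groups files by case using the three suffixes listed above; the function name `group_sequences` and the `"image"` key for single-image cases are hypothetical choices made for this example.

```python
def group_sequences(filenames, sequences=("C0", "LGE", "T2")):
    """Group image files by case ID, keyed by sequence suffix."""
    extension = ".nii.gz"
    cases = {}
    for name in filenames:
        stem = name[:-len(extension)] if name.endswith(extension) else name
        case_id, _, suffix = stem.rpartition("_")
        if case_id and suffix in sequences:
            cases.setdefault(case_id, {})[suffix] = name
        else:
            # No recognised sequence suffix: treat as a single-image case.
            cases.setdefault(stem, {})["image"] = name
    return cases


files = ["Case 1_C0.nii.gz", "Case 1_LGE.nii.gz", "Case 1_T2.nii.gz",
         "Case 2.nii.gz"]
print(group_sequences(files))
```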

The structure of “:/input/prompts.json” is

{
    "Case ID1":
    {
        "Slice Id1": null,               #set to null if no/tiny foreground 
        ...
        "Slice IDk": 
        {
            "Label ID1": 
            [
                [x, y, x + h, y + w],     #bounding box prompt
                [[x1, y1],[x2, y2], ...]  #point prompt, the number of points ranges in [1,5]
            ]
            ...
        }
        ...
    }
    ...
}
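Reading this structure takes only the standard `json` module. The sketch below walks a parsed prompt dictionary and yields one record per prompted label, skipping slices whose entry is null. The generator name `iter_prompts` and the sample keys and coordinates are made up for illustration, and the real file is assumed to be plain JSON without the `#` annotations shown above.

```python
import json


def iter_prompts(prompts):
    """Yield (case_id, slice_id, label_id, box, points) per prompted label."""
    for case_id, slices in prompts.items():
        for slice_id, labels in slices.items():
            if labels is None:  # no or tiny foreground on this slice
                continue
            for label_id, (box, points) in labels.items():
                yield case_id, slice_id, label_id, box, points


# Made-up example with the same nesting as the structure above.
example = json.loads("""
{
  "Case1": {
    "0": null,
    "1": {"1": [[10, 12, 30, 40], [[15, 20], [22, 31]]]}
  }
}
""")
for record in iter_prompts(example):
    print(record)
```

JSON object keys are always strings, so slice and label identifiers arrive as `str` and need an explicit conversion before they are used as array indices or label values.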

Orientations: when applying prompts to images, the images should be loaded with the orientation set to axcodes="PLS".

Your docker should read the input test images from the directory “:/input/images”, load the input prompts from the path “:/input/prompts.json”, and output the corresponding predictions into the directory “:/output”. The recommended structure of the directory “:/output” is

:/output
    |-- Case ID1_pred.nii.gz
    |-- Case ID2_pred.nii.gz
    ...
    |__ Case IDN_pred.nii.gz

The output segmentation should have the same size as its original image/case. For slices with a null prompt, label all areas as Label ID=0. For slices with prompts, the segmentation labels must match the label IDs given in the prompts ("Label ID"). Note that inconsistent labels will lead to incorrect evaluation.
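A lightweight self-check before writing predictions can catch naming and label mistakes. The helpers below are a sketch with hypothetical names; they assume that label IDs in the prompt file are integers stored as strings and that a predicted slice is available as rows of integers. They do not cover how a multi-sequence MyoPS++ case should be named.

```python
def output_name(image_name):
    """Map 'Case ID1.nii.gz' to 'Case ID1_pred.nii.gz'."""
    extension = ".nii.gz"
    if not image_name.endswith(extension):
        raise ValueError(f"expected a {extension} file, got {image_name!r}")
    return image_name[:-len(extension)] + "_pred" + extension


def allowed_labels(slice_prompts):
    """Labels a predicted slice may contain, given its prompts entry."""
    if slice_prompts is None:
        return {0}  # null prompt: the whole slice must be background
    # Assumption: label IDs are integer strings such as "1" or "2".
    return {0} | {int(label_id) for label_id in slice_prompts}


def slice_is_consistent(pred_slice, slice_prompts):
    """True if every value in a 2D prediction (rows of ints) is allowed."""
    allowed = allowed_labels(slice_prompts)
    return all(value in allowed for row in pred_slice for value in row)


print(output_name("Case ID1.nii.gz"))
print(slice_is_consistent([[0, 1], [0, 0]], None))
```

Running such a check on the validation predictions first is cheap, since the validation prompts use the same JSON structure.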

Timeline

The schedule for this track is as follows. All deadlines (DDLs) are at 23:59 Pacific Standard Time.

Training Data Release	May 10, 2024
Validation Phase	~~June 10, 2024 to July 7, 2024 (DDL)~~ July 1, 2024 to September 25, 2024 (DDL)
Test Phase	~~July 7, 2024 to August 7, 2024 (DDL)~~ August 15, 2024 to September 15, 2024
Abstract Submission	~~July 15, 2024 (DDL) July 25, 2024 (DDL)~~ August 22, 2024 (DDL)
Paper Submission	~~August 15, 2024 (DDL)~~ September 1, 2024 (DDL)
Notification	September 15, 2024
Camera Ready	September 25, 2024 (DDL)
Workshop (Half-Day)	October 10, 2024

Citations

Please cite these papers when you use the data for publications:

@article{Wu2023SemiSL,
  author={Wu, Fuping and Zhuang, Xiahai},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Minimizing Estimated Risks on Unlabeled Data: A New Formulation for Semi-Supervised Medical Image Segmentation}, 
  year={2023},
  volume={45},
  number={5},
  pages={6021-6036},
}

@article{GAO2023BayeSeg,
  title = {BayeSeg: Bayesian modeling for medical image segmentation with interpretable generalizability},
  journal = {Medical Image Analysis},
  volume = {89},
  pages = {102889},
  year = {2023},
  author = {Shangqi Gao and Hangqi Zhou and Yibo Gao and Xiahai Zhuang},
}

Contact

If you have any questions about this track, please contact Dr. Fuping Wu or Dr. Shangqi Gao.