TASKS
AVAILABLE TASKS
We provide three tasks that can be performed either independently or in conjunction with each other, namely “Surgical Procedure Phase Recognition”, “Surgical Instrument Keypoint Estimation”, and “Surgical Instrument Instance Segmentation”. An overview of these tasks is given below.
Please note:
It is not necessary to take part in all three tasks. Each task can be completed independently of the others, i.e., you can also take part in just one or two tasks.
Surgical Procedure Phase Recognition
For each individual frame of a video, we indicate the corresponding phase of the intervention. We adopt the phases defined in the Cholec80 dataset [1], resulting in a total of seven different categories. We extend the annotation procedure described in [1] by explicitly annotating sequences between two phases, i.e., the transition from a previous to a subsequent phase (indicated, e.g., by the absence of instruments), and by not assigning these sequences to any of the seven predefined categories. Participants are free to decide how to handle these phase transitions; for the evaluation on the test data, the associated frames are not taken into account when calculating the metrics.
Please note: For the evaluation of the phase recognition task, only frames up to and including the current frame may be used to classify the current frame, i.e., this is an online setting and you do not have access to future frames, i.e., frames with a higher frame number than the frame to be classified.
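The following sketch illustrates this online constraint: the prediction for a frame may only depend on frames already seen. The model interface (model.predict) is a hypothetical placeholder standing in for any participant's method:

```python
# Minimal sketch of the online (causal) setting: the prediction for frame t
# may only use frames 0..t. The model interface is a placeholder.
def classify_video_online(frames, model):
    predictions = []
    history = []                        # frames seen so far (past + current)
    for frame in frames:
        history.append(frame)
        phase = model.predict(history)  # no access to frames after the current one
        predictions.append(phase)
    return predictions
```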
The classification of the surgical phases is evaluated using the Balanced Accuracy (BA) as the multi-class counting metric and the F1-score, the harmonic mean of precision and recall, as the per-class counting metric. The ranking is determined individually for each metric; these per-metric ranks are then averaged to obtain the final rank of a team.
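As an illustration, the two counting metrics can be computed as sketched below. The macro-averaging of the per-class F1-scores and the use of -1 as a placeholder label for the excluded phase-transition frames are assumptions of this sketch, not part of the official evaluation code:

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, f1_score

TRANSITION = -1  # assumed placeholder label for unannotated phase-transition frames

def phase_metrics(y_true, y_pred):
    """Balanced Accuracy and macro F1 over the seven phases,
    ignoring frames that fall into a phase transition."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    keep = y_true != TRANSITION          # transition frames are not scored
    ba = balanced_accuracy_score(y_true[keep], y_pred[keep])
    f1 = f1_score(y_true[keep], y_pred[keep], average="macro")
    return ba, f1
```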
The results are as follows:
Team | F1-score [%] | Rank | BA [%] | Rank | Averaged rank | Overall rank |
uniandes24 | 69.12 | 1. | 84.20 | 1. | 1.0 | 1. |
jmees_inc. | 65.46 | 2. | 82.20 | 2. | 2.0 | 2. |
yipingli | 58.01 | 3. | 79.02 | 4. | 3.5 | 3. |
smartlab_hkust | 57.34 | 4. | 80.31 | 3. | 3.5 | 3. |
ryze | 48.32 | 5. | 73.78 | 5. | 5.0 | 5. |
hanglok | 28.83 | 6. | 62.21 | 6. | 6.0 | 6. |
augi | 17.46 | 7. | 57.97 | 7. | 7.0 | 7. |
The award money is as follows:
Team | Final Ranking | Award Money |
uniandes24 | 1st Place | 500€ |
jmees_inc. | 2nd Place | 300€ |
yipingli | 3rd Place | 150€ |
smartlab_hkust | 3rd Place | 150€ |
Surgical Instrument Keypoint Estimation
At an interval of one frame per second, between two and four keypoints are annotated per instrument, depending on the instrument type, covering the relevant positions of the instrument. For instruments that consist of an instrument shaft and a two-part instrument tip, as is the case, for example, with scissors or the clip applicator, the four keypoints are the end point (the intersection of the instrument shaft with the image border), the shaft point (the transition between the instrument shaft and the tip), and the left and right clasper points, which correspond to the two parts of the instrument tip.
For tools that cannot be opened and therefore have a single tip rather than a split tip, such as the coagulation probe, the number of keypoints is reduced to three, as there is only one annotation for the instrument tip.
Surgical tools without a distinction between shaft and tip, such as the palpation probe, have only two keypoints, corresponding to the end point and the tip point of the instrument.
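To make the three annotation schemas concrete, the sketch below lists the keypoints per instrument family; the dictionary keys and keypoint names are illustrative placeholders, not the official annotation keys:

```python
# Illustrative keypoint schemas per instrument family (names are placeholders).
KEYPOINT_SCHEMAS = {
    "two_part_tip": ["end_point", "shaft_point", "left_clasper", "right_clasper"],  # e.g. scissors, clip applicator
    "single_tip":   ["end_point", "shaft_point", "tip_point"],                      # e.g. coagulation probe
    "shaft_only":   ["end_point", "tip_point"],                                     # e.g. palpation probe
}
```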
The evaluation of the keypoint accuracy is analogous to the calculation of the COCO mAP, whereby the object keypoint similarity (OKS) is used in place of the IoU [4]. The OKS is based on the Euclidean distance between a predicted and a ground-truth point, passed through an unnormalized Gaussian whose standard deviation corresponds to the square root of the object's segmentation area multiplied by a per-keypoint constant. We use the tuned version of the OKS proposed by COCO, which is based on a per-keypoint standard deviation with respect to the object scale and an adjusted constant; a more detailed description of the OKS and its tuned version is given in [4]. The ranking is determined individually for each metric; these per-metric ranks are then averaged to obtain the final rank of a team.
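For reference, a minimal implementation of the OKS in this standard COCO formulation might look as follows; the per-keypoint constants k are instrument-specific and not specified here, so they are passed in as an argument:

```python
import numpy as np

def oks(pred, gt, visible, area, k):
    """Object Keypoint Similarity in the standard COCO formulation [4].

    pred, gt : (N, 2) arrays of predicted and ground-truth keypoint coordinates
    visible  : (N,) boolean mask of annotated keypoints
    area     : segmentation area of the ground-truth object (so scale s = sqrt(area))
    k        : (N,) per-keypoint constants controlling the Gaussian falloff
    """
    d2 = np.sum((pred - gt) ** 2, axis=1)               # squared Euclidean distances
    s2 = area                                            # s**2 = area, since s = sqrt(area)
    e = d2 / (2.0 * s2 * k ** 2 + np.finfo(float).eps)  # exponent of the unnormalized Gaussian
    return np.mean(np.exp(-e)[visible])                  # average over annotated keypoints
```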
The results are as follows:
Team | mAP_OKS | Rank |
sds-hd | 32.15 | 1. |
alvaro | 16.43 | 2. |
The award money is as follows:
Team | Final Ranking | Award Money |
sds-hd | 1st Place | 500€ |
alvaro | 2nd Place | 300€ |
Surgical Instrument Instance Segmentation
For this task, we provide pixel-accurate instance segmentations of surgical instruments together with their instrument types for a total of 19 different instrument categories, also at an interval of one frame per second. The annotations consist of color-coded segmentation masks in which the individual instruments within a frame are separated by different colors. The red (R) and green (G) channels define the class of the instrument, and the blue (B) channel describes the instance of an object within an instrument class. For example, if there are two objects in a frame corresponding to one instrument class, the values of the R and G channels of both objects are identical, and the value of the B channel is different.
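A minimal sketch for splitting such a color-coded mask into a class map and an instance map is given below. The way the R and G values are combined into a single class ID is an assumption for illustration; the actual value-to-category mapping is defined in the dataset documentation:

```python
import numpy as np

def decode_instance_mask(mask_rgb):
    """Split a color-coded annotation into class IDs and instance IDs.

    mask_rgb : (H, W, 3) uint8 array with channels (R, G, B).
    R and G together encode the instrument class; B separates instances
    within that class.
    """
    r = mask_rgb[..., 0].astype(np.uint16)
    g = mask_rgb[..., 1].astype(np.uint16)
    class_map = (r << 8) | g          # one ID per (R, G) combination (illustrative encoding)
    instance_map = mask_rgb[..., 2]   # instance index within the class
    return class_map, instance_map
```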
A more detailed description regarding the structure of the provided dataset and the annotations as well as the applied labeling instructions can be found in the respective document on the “DATA” page of our website: https://phakir.re-mic.de/data/.
For the instance segmentation of the surgical instruments, three metrics are employed: the Dice score as a multi-instance, multi-class overlap metric for localization, the area under the precision-recall curve (AUC PR) as a multi-threshold metric, and the 95% Hausdorff distance (95% HD) as a boundary-based metric. Predictions are assigned to ground-truth segmentations using the Hungarian maximum matching algorithm. The ranking is determined individually for each metric; these per-metric ranks are then averaged to obtain the final rank of a team.
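The assignment step can be illustrated as follows: predicted and ground-truth instance masks are matched with the Hungarian algorithm (scipy's linear_sum_assignment) on a pairwise Dice cost matrix. This is only a sketch of the matching, not the full evaluation pipeline:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def dice(a, b):
    """Dice overlap between two boolean masks."""
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum() + 1e-8)

def match_instances(pred_masks, gt_masks):
    """Assign predicted to ground-truth instances via Hungarian matching
    on a pairwise Dice cost matrix (assignment step only)."""
    cost = np.zeros((len(pred_masks), len(gt_masks)))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            cost[i, j] = -dice(p, g)      # maximizing Dice = minimizing negative Dice
    rows, cols = linear_sum_assignment(cost)
    return list(zip(rows, cols))          # (prediction index, ground-truth index) pairs
```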
The results are as follows:
Team | Dice [%] | Rank | AUC PR [%] | Rank | 95% HD | Rank | Averaged rank | Overall rank |
augi | 31.38 | 2. | 32.24 | 2. | 66.19 | 2. | 2.0 | 1. |
jmees_inc. | 33.74 | 1. | 34.63 | 1. | 76.06 | 4. | 2.0 | 1. |
uniandes24 | 30.91 | 3. | 31.23 | 3. | 55.25 | 1. | 2.3 | 3. |
kist_harilab | 24.72 | 4. | 25.30 | 4. | 167.87 | 6. | 4.7 | 4. |
oluwatosin | 20.66 | 6. | 19.64 | 6. | 74.54 | 3. | 5.0 | 5. |
sk | 21.95 | 5. | 24.41 | 5. | 254.79 | 8. | 6.0 | 6. |
hanglok | 18.63 | 7. | 19.64 | 7. | 249.19 | 7. | 7.0 | 7. |
goncalo | 17.52 | 8. | 18.15 | 8. | 153.97 | 5. | 7.0 | 7. |
joaormanesco | 12.64 | 9. | 13.71 | 9. | 329.42 | 9. | 9.0 | 9. |
The award money is as follows:
Team | Final Ranking | Award Money |
augi | 1st Place | 450€ |
jmees_inc. | 1st Place | 450€ |
uniandes24 | 3rd Place | 200€ |
References:
[1] A. P. Twinanda, S. Shehata, D. Mutter, J. Marescaux, M. de Mathelin, and N. Padoy, “EndoNet: A Deep Architecture for Recognition Tasks on Laparoscopic Videos”, IEEE Transactions on Medical Imaging, vol. 36, no. 1, pp. 86-97, 2017, doi: https://doi.org/10.1109/TMI.2016.2593957.
[2] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common Objects in Context”, European Conference on Computer Vision, vol. 8693, 2014, doi: https://doi.org/10.1007/978-3-319-10602-1_48.
[3] Common Objects in Context – Detection Evaluation. https://cocodataset.org/detection-eval. Accessed: 14 November 2023.
[4] Common Objects in Context – Keypoint Evaluation. https://cocodataset.org/keypoints-eval. Accessed: 14 November 2023.