This study presents the Sequence-based Transformer for Robust And Parcelised Training Of echo Recordings (STRAPTOR), an AI framework that uses self-supervised learning for automated semantic segmentation of echocardiogram sequences. STRAPTOR combines a Vision Transformer (ViT), trained with the self-DIstillation with NO labels (DINO) method, for feature extraction with a specialized Robust And Parcelised Training Of echo Recordings (RAPTOR) head for segmentation refinement. Our approach segments the left ventricle by identifying and aggregating anatomically relevant subregions across cardiac phases. Validation on the EchoNet-Dynamic and CAMUS datasets shows robust performance, with accuracy above 75%. These results highlight the potential of leveraging general datasets for domain-specific tasks and offer a scalable and interpretable solution for cardiac diagnostics.
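
As a rough illustration of the aggregation idea described above, the following minimal sketch merges a set of subregion (parcel) labels into a single left-ventricle mask for one frame and scores it against a reference mask. It is not the STRAPTOR implementation: the function names `aggregate_parcels` and `dice`, the toy 4x4 parcel map, the choice of parcel IDs, and the use of the Dice overlap as the score are all assumptions made for illustration. Plain Python lists keep the sketch dependency-free.

```python
def aggregate_parcels(parcel_map, lv_parcels):
    """Return a binary mask: 1 where a pixel's parcel ID is in lv_parcels, else 0."""
    return [[1 if p in lv_parcels else 0 for p in row] for row in parcel_map]


def dice(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    inter = sum(p & t for pr, tr in zip(pred, target) for p, t in zip(pr, tr))
    total = sum(map(sum, pred)) + sum(map(sum, target))
    # Two empty masks are treated as a perfect match.
    return 2.0 * inter / total if total else 1.0


# Hypothetical per-pixel parcel IDs for a single 4x4 frame.
parcel_map = [
    [0, 0, 1, 1],
    [0, 2, 2, 1],
    [0, 2, 3, 1],
    [0, 0, 3, 1],
]

# Assumption: parcels 2 and 3 are the ones judged to lie inside the left ventricle.
lv_parcels = {2, 3}

mask = aggregate_parcels(parcel_map, lv_parcels)

# Hypothetical reference annotation for the same frame.
reference = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]

score = dice(mask, reference)
```

For a sequence, the same aggregation would be applied to each frame's parcel map, so the selected parcels trace the ventricle across cardiac phases.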