This paper argues that synchronising spatial frames of reference is necessary for effective multiparty communication in collaborative virtual environments. Synchronising nonverbal behaviour across modalities is an important step towards simulating face-to-face interaction, in which all nonverbal cues are available. Such synchronisation also provides an effective basis for building multimodal interfaces, especially those to be deployed for multiparty communication. It is argued that common spatial reference frames help coordinate different points of attention and facilitate work by serving as a springboard for joint attention among team members. Consequently, it is desirable to aim for such common ground, rather than merely coordinating disjointed virtual spaces, in order to facilitate decision-making by reducing the perceived effort of collaboration. Implementing the synchronisation of spatial reference frames in modern technologies thus serves a dual purpose: achieving common ground in communication while maintaining the autonomy of each member. Towards this end, the paper proposes the concept of a decentred egocentric frame, whose origin is one’s own body and in which the spatial relation between two objects is defined with respect to that origin. This frame appears to be important for separating each member’s focus from their own body (self/activities), and it also helps coordinate one’s focus with that of others, whether interacting verbally or nonverbally. This is an important conceptual development, as the proposed classification is hypothesised to function in a similar manner across different sensory modalities. The paper concludes with a discussion of implementation issues and directions for further conceptual development.