A method and apparatus for inserting a variable audio delay during video conferencing to balance the conflicting goals of lip-sync and interactive conversation. In one embodiment, an apparatus includes a processor and memory including computer program code. Information is transmitted to an intermittently operating node by a handshaking protocol in which the intermittently operating node indicates that it is ready to receive.
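The handshaking step described above can be illustrated with a minimal sketch. It assumes an in-process model in which the sender queues information until the node announces readiness. The names `Sender`, `IntermittentNode`, `on_ready`, `wake`, and `deliver` are hypothetical and do not come from the source, which does not specify message formats, transport, or timing.

```python
from collections import deque


class Sender:
    """Hypothetical sender that holds information until the node is ready."""

    def __init__(self):
        self.pending = deque()  # information waiting for a ready signal

    def send(self, message):
        # Never transmit immediately; the node may not be operating.
        self.pending.append(message)

    def on_ready(self, node):
        # Called by the node as its half of the handshake.
        # Flush everything queued so far, in order.
        while self.pending:
            node.deliver(self.pending.popleft())


class IntermittentNode:
    """Hypothetical node that operates only part of the time."""

    def __init__(self):
        self.awake = False
        self.received = []

    def wake(self, sender):
        # On resuming operation, indicate readiness to receive.
        self.awake = True
        sender.on_ready(self)

    def sleep(self):
        self.awake = False

    def deliver(self, message):
        if not self.awake:
            raise RuntimeError("node is not ready to receive")
        self.received.append(message)


if __name__ == "__main__":
    sender = Sender()
    node = IntermittentNode()
    sender.send("frame-1")
    sender.send("frame-2")
    node.wake(sender)  # ready signal triggers delivery of queued items
    print(node.received)
```

In this sketch the node, not the sender, initiates each transfer, so nothing is sent while the node is not operating. A real embodiment would carry the ready indication over a network link and would need timeouts and acknowledgements, which are omitted here.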