The Technical Side of Embedding Video Calls into Telehealth Apps

by Sophia Turol, Alex Khizhniak, and Jon Capistrano
May 1, 2021

With telemedicine supporting 70% of consultations during the pandemic, implementing virtual care may require additional effort from medical institutions.

The disruptive effect of COVID-19 on healthcare

The worldwide healthcare system was not designed to deal with the COVID-19 crisis that emerged in 2020. It is still a large-scale health challenge that requires the mobilization of all the available resources. Due to the outbreak, health services and treatments were disrupted. According to the World Health Organization (WHO), in 94% of the countries, most medical and health professionals were reassigned to cope with COVID-19 and support treatment.

The care for chronic disease patients, especially the elderly, disabled, and people with reduced capabilities, was also interrupted due to the lack of medical staff and a decrease in public transport traffic. A study by Fight Cancer reveals that 51% of participants experienced consultation cancellations, while 25% noted more than two weeks of treatment delay.

In response to social distancing requirements and the shortage of healthcare resources, the use of video conferencing tools skyrocketed. In particular, the number of daily Zoom meeting participants increased from 10 million in December 2019 to 300 million at the peak of the pandemic in April 2020. A study by McKinsey revealed that overall telehealth visits increased 50–175 times in 2020. According to the CDC, 69% of patients used telemedicine platforms during the initial pandemic period.

The shift in virtual care during COVID-19 (Image credit)

What were the main drivers behind telemedicine adoption by patients? Medical Economics reports online scheduling (47%), immediate appointment availability (47%), and other convenience factors.

Across all the scenarios, video conferencing is a central part of the communication between a patient and a doctor. In this article, we explore the technical challenges associated with adding video components to a telehealth system and how to address these issues from an engineering perspective.

Architectural approaches to video conferencing

When enabling video conferencing to serve an increasing number of patients, it’s vital to think through a scalable architecture and build upon a technology stack that ensures high-quality video/audio, secure storage, management of calls, etc.

For those building this kind of architecture on the AWS stack, Amazon offers numerous options tailored to healthcare and life science projects. Various modules may include functionality for remote real-time health tracking, secure patient data storage, event management, audio/video transcription, dashboards, calls, etc.

A sample telemedicine workflow suggested by Amazon (Image credit)

For instance, in the example above, raw audio/video files could be stored in an S3 bucket, with patient data hosted in DynamoDB. Meanwhile, the files could be converted into text transcriptions, so that doctors do not have to listen through the whole video call. To further facilitate diagnostics, an institution implementing the system could analyze transcripts using Amazon Comprehend Medical and automatically extract patient conditions, medications, treatments, etc. Elasticsearch and QuickSight dashboards help physicians to search and retrieve necessary data and generate reports.

Similar architectures are offered by almost all major vendors, such as Microsoft, Google, and others. A video conferencing platform takes a core place in any of the suggested architectural solutions as the primary means of communication between a doctor and a patient. Numerous platforms are available for integration via APIs (such as Zoom, Skype, Microsoft Teams, Google Meet, etc.) or even as custom solutions adjusted to the needs of healthcare. (Check out the Teams docs or Zoom for Healthcare, for instance.)

Common technical challenges to address

With all the technological advancements, there are still some challenges to address when implementing such architectures.

For video streaming, performance and high availability are an absolute necessity, as is scalability to serve thousands of patients in real time. Downtime has an immediate impact on care delivery, leaving doctors unable to diagnose, prescribe, or continue treatment.
Security has to be ensured across infrastructure, platform, and app layers. This will help to prevent malicious attacks and data manipulations, as well as restrict access for unauthorized parties. A notorious example of a security breach happened to Zoom back in April 2020—hackers got a hold of 500,000 passwords and e-mail addresses.
Since personal data is shared during video sessions, compliance with the industry regulations is a must. In the US, the Health Insurance Portability and Accountability Act (HIPAA) and the Health Information Technology for Economic and Clinical Health Act (HITECH) are the regulations stipulating the measures to put in place. In the European Union, the General Data Protection Regulation (GDPR) serves the purpose.
Integration with existing infrastructure and processes can become a hurdle. For instance, updating an electronic health record (EHR) with new patient information revealed during a video session or securely sharing this data with third parties will require building bridges between apps. A typical healthcare organization can have hundreds of those.

Diagnostic collaboration between doctors using video conferencing (Image credit)

Healthcare institutions may want to store a video recording of each virtual session, so doctors can listen to the patient’s complaints or symptoms again when needed. This means the system may need capacity for hundreds of terabytes or even petabytes of data, with all the security and compliance mechanisms enabled. It is also important to prevent conflicts between the different video file formats supported by different platforms, such as .webm, .wmv, .flv, .ogv, etc.
Along with the recordings may come the necessity to deliver and process transcripts for doctors to have a quick look at or search through text-based information. Manual transcription is inefficient and time-consuming, especially amid the pandemic. At the same time, complex medical terminology and precise diagnostics do not leave room for misinterpretation or error in automated transcription. Because transcripts contain confidential information, data security is, yet again, on the agenda.
Customization may introduce discrepancies across patches and upgrades. Laying a foundation for future maintenance will save a lot of effort and dramatically reduce development costs.
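
To get a feel for the recording-storage challenge listed above, here is a back-of-the-envelope estimate in Python. It is only a sketch: the session count, call length, bitrate, and retention period are hypothetical inputs, not figures from any study cited here, and the function name is our own.

```python
def recording_storage_tb(sessions_per_day: int,
                         avg_minutes: float,
                         bitrate_mbps: float,
                         retention_days: int) -> float:
    """Estimate raw recording storage in decimal terabytes (10**12 bytes).

    Ignores replication, backups, and transcoded copies, which would
    multiply the result.
    """
    seconds_per_session = avg_minutes * 60
    # Megabits per second -> bytes per second, then bytes per session.
    bytes_per_session = bitrate_mbps * 1_000_000 / 8 * seconds_per_session
    total_bytes = bytes_per_session * sessions_per_day * retention_days
    return total_bytes / 1e12


if __name__ == "__main__":
    # Hypothetical provider: 2,000 calls a day, 20 minutes each,
    # recorded at 1.5 Mbps and kept for one year.
    print(f"{recording_storage_tb(2000, 20, 1.5, 365):.2f} TB")
```

With these assumed inputs, a single year of recordings already lands in the hundreds-of-terabytes range before any redundancy is added.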

So, how to address everything on the list?

Best practices to consider

Enabling high availability and scalability

Choosing appropriate architectural approaches and an optimal technology stack gets you halfway to a highly available system. One may consider a microservices-based architecture, which allows for building and maintaining a system as a set of independent components. These components can be written in different programming languages if necessary, and the microservices model helps you avoid a single point of failure and achieve near-zero downtime during updates, deployments, etc.

Whether developing a video conferencing platform from scratch or integrating a ready-made solution, it’s wise to invest in networking that will sustain peak traffic loads and properly distribute resources. Enabling system redundancy or backups provides excess capacity to accommodate high throughput, prevent performance decline, and ensure availability. Multi-AZ infrastructures/data centers and content delivery networks (CDN) will help to minimize response times.
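
The effect of redundancy on availability can be illustrated with a little arithmetic. The sketch below assumes that replicas fail independently of each other (a simplification that rarely holds exactly in practice) and uses hypothetical availability figures.

```python
def combined_availability(single: float, replicas: int) -> float:
    """Availability of N redundant replicas, assuming independent failures.

    The service is down only when every replica is down at once.
    """
    return 1 - (1 - single) ** replicas


def downtime_hours_per_year(availability: float) -> float:
    """Expected downtime per year for a given availability."""
    return (1 - availability) * 365 * 24


if __name__ == "__main__":
    for n in (1, 2, 3):
        a = combined_availability(0.99, n)
        print(f"{n} replica(s): {a:.6f} available, "
              f"{downtime_hours_per_year(a):.3f} h/year down")
```

Under these assumptions, a second replica of a 99%-available component cuts expected downtime from roughly 87.6 hours a year to under one hour.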

Enhancing security and achieving compliance

It would not be an exaggeration to say that almost all top vendors (including Amazon, Google, and Microsoft) have HIPAA-compliant cloud offerings, which also aim to achieve high availability and scalability. These offerings embrace a variety of measures to comply with the required standards (e.g., identity and access management, virtual private networks, SSH, logging/monitoring, end-to-end data encryption, role-based access control, private clouds, etc.).

Securing access to web servers for better HIPAA compliance (Image credit)

If working with EU citizens’ data, medical organizations should comply with the GDPR—in this case, they may also need to change the processes of collecting, storing, processing, and sharing data.

Regardless of the market served, one of the simplest things to consider when implementing a solution for end users is establishing multi-factor authentication (MFA). Passwords are easy targets for hackers, and MFA requires at least two pieces of evidence to authenticate the identity of a user before providing access to the system. Biometric tools, such as fingerprint and retina scanners, can also be used.
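
One common second factor is a time-based one-time password (TOTP, RFC 6238), the kind generated by authenticator apps. The sketch below implements it with the Python standard library only, to show how the mechanism works. The function names and the one-step tolerance window are our own choices, and a production system would more likely rely on a vetted library or an identity provider.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) using HMAC-SHA1."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # dynamic truncation offset
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Time-based one-time password (RFC 6238): HOTP over a time counter."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify_totp(secret: bytes, submitted: str, at: Optional[float] = None,
                step: int = 30, window: int = 1) -> bool:
    """Accept codes from the current step and `window` steps either side."""
    now = time.time() if at is None else at
    counter = int(now // step)
    candidates = (counter + d for d in range(-window, window + 1))
    return any(
        hmac.compare_digest(hotp(secret, c), submitted)
        for c in candidates if c >= 0
    )
```

The server stores the shared secret per user and calls `verify_totp` after the password check succeeds, so a stolen password alone is not enough to log in.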

Health professionals should also follow strict rules in saving documents to personal devices (smartphones, tablets, or USBs) and sending confidential documents via personal e-mail. Healthcare facilities can consider separating patient populations according to their medical needs (e.g., chronic and COVID-19 cases), as well as provision the appropriate remote diagnostics and equipment.

It’s worth investing time into creating privacy policies and training programs for those involved in both operating and using the telehealth system. Regular audits of hardware and software can also contribute to improving security. These evaluations should be implemented as recurrent strategic activities, rather than one-time procedures.

Facilitating integration

Most video conferencing platforms, such as Skype or Zoom, have an API to integrate with in-house systems and workflows. In some cases, it may take additional tweaking to enable secure data sharing or compliance with regulations. On the market, there is a variety of APIs for the purpose, already compatible with standards such as FHIR, HL7, and DICOM. To name a few, take a look at Azure API for FHIR by Microsoft, Cloud Healthcare API by Google, or Amazon HealthLake by AWS.
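
To give a sense of what such an integration exchanges, here is a minimal sketch that builds a FHIR Encounter resource for a finished video visit as a plain Python dictionary. It assumes FHIR R4 field names. The function name and the identifiers are hypothetical, and a real EHR will typically require additional fields and a server-specific profile.

```python
import json
from datetime import datetime, timezone


def build_virtual_encounter(patient_id: str, practitioner_id: str,
                            start: datetime, end: datetime) -> dict:
    """Build a minimal FHIR R4 Encounter describing a virtual visit."""
    return {
        "resourceType": "Encounter",
        "status": "finished",
        # "VR" is the HL7 v3 ActCode for a virtual encounter.
        "class": {
            "system": "http://terminology.hl7.org/CodeSystem/v3-ActCode",
            "code": "VR",
            "display": "virtual",
        },
        "subject": {"reference": f"Patient/{patient_id}"},
        "participant": [
            {"individual": {"reference": f"Practitioner/{practitioner_id}"}}
        ],
        "period": {"start": start.isoformat(), "end": end.isoformat()},
    }


if __name__ == "__main__":
    encounter = build_virtual_encounter(
        "example-patient", "example-doctor",  # hypothetical IDs
        datetime(2021, 5, 1, 9, 0, tzinfo=timezone.utc),
        datetime(2021, 5, 1, 9, 20, tzinfo=timezone.utc),
    )
    print(json.dumps(encounter, indent=2))
```

The resulting JSON is the kind of payload that would be posted to a FHIR server once a call ends.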

A conceptual health architecture by Microsoft (Image credit)

In the architecture above, a telemedicine system can contain a video conferencing module implemented with Microsoft Teams—supporting video, voice, and chat tooling—integrated with the EHR Connector.

In the course of integrations, consider building ETL pipelines to prevent duplicates and outdated information, as well as to apply the necessary transformations to the data.
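
As a minimal sketch of the transform step in such a pipeline, the function below normalizes patient identifiers and keeps only the most recent record per patient. The field names (`patient_id`, `updated_at`) and the normalization rule are assumptions made for illustration. Real EHR exports will differ.

```python
from datetime import datetime
from typing import Dict, List


def latest_records(records: List[dict]) -> List[dict]:
    """Deduplicate records by patient ID, keeping the newest version.

    Each record needs a `patient_id` string and an ISO 8601 `updated_at`
    timestamp. IDs are trimmed and upper-cased so that " p-001 " and
    "P-001" are treated as the same patient.
    """
    latest: Dict[str, dict] = {}
    for rec in records:
        key = rec["patient_id"].strip().upper()
        stamp = datetime.fromisoformat(rec["updated_at"])
        current = latest.get(key)
        if current is None or stamp > datetime.fromisoformat(current["updated_at"]):
            latest[key] = {**rec, "patient_id": key}
    return sorted(latest.values(), key=lambda r: r["patient_id"])
```

Running this before loading data into the target system means a stale phone number or address from an older export cannot overwrite a newer one.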

Storing the calls and providing transcripts

To ensure scalable storage of ever-growing video data, one may opt for cloud systems like Amazon S3, Google Cloud Storage, or Microsoft Azure Storage. If the amount of data is enormous and regular cloud providers have issues storing information with required latencies, explore NoSQL databases (which can be provided “as a service”) or distributed systems, such as HDFS.

To achieve better precision in call transcription, you may utilize machine learning libraries and tools. Neural networks are the main drivers behind speech recognition and its conversion to text (speech-to-text). However, training a neural network is far from a trivial task, requiring sufficient data sets, experience with AI tools (like TensorFlow or Keras), fine-tuning, etc. Alternatively, you can have a look at out-of-the-box solutions, such as Amazon Transcribe Medical, the Speech SDK by Microsoft, or Speech-to-Text API by Google.
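
As an illustration of the out-of-the-box route, the sketch below assembles the request parameters for an Amazon Transcribe Medical batch job. It only builds a dictionary, so it runs without AWS credentials. The helper name, bucket names, and the choice of two speaker labels (doctor and patient) are our assumptions, and the parameters should be checked against the current API reference before use.

```python
def medical_transcription_request(job_name: str, media_uri: str,
                                  output_bucket: str) -> dict:
    """Build parameters for Transcribe's StartMedicalTranscriptionJob call."""
    if not media_uri.startswith("s3://"):
        raise ValueError("media_uri must point to an object in S3")
    return {
        "MedicalTranscriptionJobName": job_name,
        "LanguageCode": "en-US",
        "Media": {"MediaFileUri": media_uri},
        "OutputBucketName": output_bucket,
        "Specialty": "PRIMARYCARE",
        "Type": "CONVERSATION",  # a dialogue rather than dictation
        # Assumption: label two speakers, the doctor and the patient.
        "Settings": {"ShowSpeakerLabels": True, "MaxSpeakerLabels": 2},
    }


if __name__ == "__main__":
    params = medical_transcription_request(
        "visit-0001",                          # hypothetical job name
        "s3://example-recordings/visit-0001.mp4",  # hypothetical object
        "example-transcripts",                 # hypothetical bucket
    )
    # With boto3 installed and credentials configured, this would be sent as:
    #   boto3.client("transcribe").start_medical_transcription_job(**params)
    print(params)
```

Keeping the parameter assembly separate from the network call makes it easy to unit test and to review which settings are applied to patient recordings.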

The workflow behind Google’s video transcription model (Image credit)

In April 2021, Microsoft announced a deal to acquire Nuance, a provider of a clinical voice-to-text platform. This should facilitate the transcription of video recordings for healthcare experts using Microsoft software (Office 365, Teams, etc.). Nuance claims that their products are adopted by more than 55% of physicians and 75% of radiologists in 77% of US hospitals.

Accelerating customization

By building continuous integration/delivery (CI/CD) pipelines, it is possible to ensure zero-downtime upgrades. Tools like Jenkins or Concourse, built into a CI/CD pipeline, help to automate deployments. Kubernetes distributions can improve stability and automate operations, enabling faster feedback on customer-facing features and more frequent deployments. (E.g., Airbnb deploys 125,000 times per year with multicluster Kubernetes.) The blue-green deployment method enables painless rollback and reduces server downtime.
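
The blue-green idea can be shown with a toy model: two environments exist side by side, traffic points at one of them, and a release is a pointer swap that happens only if the idle environment passes a health check. The classes below are purely illustrative and hypothetical. In practice the swap is performed by a load balancer, DNS, or a Kubernetes Service selector rather than by application code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Environment:
    name: str
    version: str
    healthy: Callable[[], bool]  # health probe for this environment


class BlueGreenRouter:
    """Tracks which of two environments currently receives traffic."""

    def __init__(self, live: Environment, idle: Environment) -> None:
        self.live = live
        self.idle = idle

    def promote(self) -> bool:
        """Switch traffic to the idle environment if it is healthy."""
        if not self.idle.healthy():
            return False  # keep serving from the current live environment
        self.live, self.idle = self.idle, self.live
        return True

    def rollback(self) -> None:
        """Point traffic back at the previously live environment."""
        self.live, self.idle = self.idle, self.live
```

Because the previous version keeps running in the idle slot after a promotion, a rollback is just another swap rather than a redeployment.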

The success of virtual care?

To sum up, introducing video conferencing into healthcare workflows should not be a goal in itself—it has to be a part of an overall business, technology, and treatment strategy. A 2020 study by McKinsey outlines some additional recommendations to ensure the success of future virtual care implementations:

Defining a virtual health roadmap with a data-driven view
Delivering user-centric experience—e.g., digital front-door features
Developing scenarios for virtual care beyond emergencies like the pandemic

McKinsey predicts that about 20% of all emergency visits and 24% of healthcare office visits could be delivered virtually in the post-COVID era, with an additional 9% “near-virtually.” The analysts also think that “up to 35% of regular home health attendant services could be virtualized, and 2% of all outpatient volume could be shifted to the home setting, with tech-enabled medication administration.” EY provides similar estimates. Video conferencing will be an integral part of all these solutions.

The percentage of in-person visits to a doctor that can be virtualized (Image credit)

The adoption of video conferencing across healthcare scenarios will depend on the success of existing telemedicine implementations. If they are justified by a positive ROI and improved efficiencies, more institutions will allocate budgets for investing in virtual care. At the moment, it looks like they will. Deloitte reports that average patient waiting times were down by 50% in New York City and 75% in San Francisco thanks to virtual consultations, which is a substantial improvement. As a result, the analyst predicts 400 million video visits to doctors globally in 2021.

From the technological perspective, the adoption of video conferencing in healthcare will also be driven by the further evolution of 5G, thanks to faster networks, improved Internet connections, and higher-frequency radio bands. (Based on Ericsson’s Mobility Report, 5G is projected to cover 65% of the worldwide population by the year 2025.) This may significantly change the current telehealth landscape in five years, too, just like COVID-19 did in 2020.
