Artificial Intelligence will change nearly every aspect of our lives, and every technology will evolve along with it. Television is no exception: first introduced roughly a hundred years ago, it will also evolve with AI. This article looks at that evolution from several angles. First, we will discuss AI and ML advancements, then content creation with AI, and lastly AI-powered presenters.
AI and ML Advancements for Television
Artificial Intelligence (AI) refers to programs that use a body of data to learn a task and to predict or correct results. Machine Learning (ML), a key component of AI, is the ability of a computer system to gradually get better at solving a particular problem, not through changes to the software, but by independently evaluating the results of previous attempts.
AI and ML become relevant when working with large volumes of data, and they need access to that data to work correctly. In the context of television, this could be, for example, an archive of images or a large collection of video recordings with attached QoS (video quality) metrics. ML and AI can be used to identify images in videos, generate speech and text translations, create subtitles, find optimal templates for processing content, and perform many other functions.
Let’s take a look at some of the most famous recent projects implemented with the help of AI and ML.
Mobile Application “Who is who”
The application was developed by Sky News specifically for its broadcast of the wedding of Prince Harry and Meghan Markle. Its purpose was to automatically recognize guests as they arrived at the celebration. As each guest appeared on screen, the application displayed a card with additional information about that person. The application used Amazon Rekognition, the image recognition service from AWS.
AWS (Amazon Web Services) is one of the largest cloud platforms and offers many services for video processing. Amazon Rekognition is a tool with a wide range of applications, but to work correctly it must first be configured for a specific task. For the royal wedding, the system was trained to identify the figures and faces of expected guests. During the wedding itself, the application was able to recognize the invitees it had been shown during training.
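The enrol-then-match workflow described above can be illustrated with a minimal sketch. It does not call Amazon Rekognition. It simply assumes that some model has already turned each face into a numeric vector (an embedding), and the guest names, vectors, and the 0.9 threshold are all hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(face, enrolled, threshold=0.9):
    """Return the best-matching enrolled name, or None if nobody is close enough."""
    best_name, best_score = None, threshold
    for name, reference in enrolled.items():
        score = cosine_similarity(face, reference)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name

# Hypothetical embeddings of guests enrolled before the event.
enrolled_guests = {
    "Guest A": [0.9, 0.1, 0.3],
    "Guest B": [0.1, 0.8, 0.5],
}

print(identify([0.88, 0.12, 0.31], enrolled_guests))  # close to Guest A
print(identify([0.5, 0.5, -0.7], enrolled_guests))    # not enrolled
```

A face that was never enrolled falls below the threshold and yields no match, which mirrors the point above: the system recognizes only the people it was shown in advance.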
Customization of Catalog Navigation with AI in TV
According to Netflix estimates, subscribers choosing what to watch are guided most of all by the thumbnails that illustrate the content of a film. Netflix creative director Nick Nelson finds that 82% of video selections are determined by thumbnails.
Based on these findings, Accedo partnered with AWS and ITV to run an A/B test to determine which images persuade subscribers to pay for a viewing. According to Accedo Senior Vice President of Product Frederik Andersen, users choose a movie based on the emotions they feel when looking at the thumbnails.
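How such an A/B test might be evaluated can be sketched with a standard two-proportion z-test. This is only an illustration of the statistical idea, not the method Accedo, AWS, or ITV used, and the click and view counts are hypothetical.

```python
import math

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Compare the click-through rates of thumbnails A and B.

    Returns the z statistic and the two-sided p-value.
    """
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical results: each thumbnail was shown 2000 times.
z, p_value = two_proportion_z(clicks_a=120, views_a=2000,
                              clicks_b=165, views_b=2000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests that the difference between the two thumbnails is unlikely to be due to chance alone.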
AI in TV will result in Content Delivery Optimization
Media supply chain company SDVI used AI and ML to speed up the assessment of content against the varying requirements of different regions. This task is particularly relevant for the company’s large clients, such as Discovery, who need to localize their content for different regions of the world. According to Discovery’s head of product Simon Eldridge, it used to take the broadcaster’s editors about 2 hours to edit an hour-long show to meet regional requirements, but now it takes 10 minutes. He clarifies that editing is still done manually, but the AI highlights the fragments that need to be corrected.
The SDVI platform is based on tools from AWS and Google Cloud Platform that perform object recognition, audio transcription, and filtering of adult content. Their output is detailed metadata synchronized with the video, which makes it possible to identify scenes that include violence, smoking, or nudity. As Simon Eldridge points out, tools in the large public clouds are often already trained to perform the required tasks. Google Cloud Platform, for example, is constantly working on object recognition for videos hosted on YouTube.
Currently, all Discovery content is run through object recognition, transcription, and automatic video quality assessment tools. This has made it possible to eliminate delays in the formation of localized versions, which previously regularly slowed down the company’s production processes.
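The idea of time-coded metadata that points editors to the fragments needing attention can be sketched as follows. This is a simplified illustration under stated assumptions, not SDVI's implementation: the `Tag` structure, the region names, and the rule sets are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Tag:
    start: float  # seconds from the start of the show
    end: float
    label: str    # what the recognition tool detected

# Hypothetical regional rules: labels that need an editor's attention.
REGION_RULES = {
    "region_a": {"violence", "nudity"},
    "region_b": {"smoking", "nudity", "violence"},
}

def segments_to_review(tags, region):
    """Return merged (start, end) ranges that contain restricted content."""
    restricted = REGION_RULES[region]
    hits = sorted((t.start, t.end) for t in tags if t.label in restricted)
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            # Overlaps the previous range, so extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

tags = [
    Tag(12.0, 18.5, "smoking"),
    Tag(17.0, 25.0, "violence"),
    Tag(300.0, 310.0, "car"),
    Tag(1200.0, 1215.0, "nudity"),
]
print(segments_to_review(tags, "region_a"))
print(segments_to_review(tags, "region_b"))
```

The editor still makes every cut by hand, but only has to look at the returned ranges instead of watching the whole show.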
Individual Optimization of Compression Profiles for Different Videos
Netflix has also introduced the practice of selecting compression profiles individually for each video title, taking its characteristics into account. Without machine learning tools, however, this work means hundreds of test encodes and a brute-force enumeration of options. That is acceptable for Netflix, which has significant computing power and a relatively small catalog, but it is not feasible for most projects.
To address this, Mux adopted a third-party development that trains a computer to optimize the selection of profiles, drawing on experience gained from similar types of videos. The solution is based on a neural network. It evaluates low-level attributes of a video to determine its class, and then applies the optimal parameters for that class, which were determined during training. The attributes evaluated are the dynamics of the picture, its degree of clarity, and its overall complexity.
According to Mux founder Jon Dahl, the selection process, which previously took hundreds of hours, now requires several seconds of computer processing. At the same time, Dahl clarifies, the neural network does not necessarily understand the meaning of its decisions. That is, it does not reason from general patterns, which it may not know, but simply learns over many iterations to obtain optimal results for the given parameters.
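The classify-then-look-up flow described above can be shown with a much simpler stand-in. The sketch below uses a nearest-centroid rule instead of a neural network, and the class names, feature values, and bitrate ladders are hypothetical rather than taken from Mux.

```python
import math

# Hypothetical class centroids over (motion, sharpness, complexity), each 0..1.
CLASS_CENTROIDS = {
    "animation": (0.3, 0.9, 0.2),
    "talk_show": (0.1, 0.5, 0.3),
    "sports": (0.9, 0.6, 0.8),
}

# Hypothetical bitrate ladders in kbit/s, assumed to have been found in training.
CLASS_PROFILES = {
    "animation": [400, 900, 1800],
    "talk_show": [500, 1200, 2500],
    "sports": [1200, 3000, 6000],
}

def classify(features):
    """Pick the class whose centroid is closest to the measured features."""
    return min(CLASS_CENTROIDS,
               key=lambda name: math.dist(features, CLASS_CENTROIDS[name]))

def select_profile(features):
    """Return the video class and the encoding profile stored for it."""
    video_class = classify(features)
    return video_class, CLASS_PROFILES[video_class]

# A fast-moving, fairly complex video.
print(select_profile((0.85, 0.55, 0.75)))
```

Because the expensive search for good parameters happens once per class during training, choosing a profile for a new video reduces to a quick measurement and a lookup.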
AI in TV will result in Video Quality Enhancement
Ssimwave has offered a solution that lets distributors of TV channel packages automatically determine the optimal video source for delivery in a given environment. As Ssimwave executive director Abdul Rehman notes, the same content is often available with different parameters. For example, a CNN channel is available from one source with parameters 1080@29.97i, MPEG-2, 40 Mbit/s, and from another with parameters 720p60, H.264, 22 Mbit/s. It is not at all obvious that the higher-resolution option will give the subscriber higher-quality video. The final result is influenced by compression and color transfer formats, dynamic range, transcoding procedures, delivery technologies, and versions of subscriber players.
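The selection logic can be sketched in a few lines, assuming each candidate source has already been given a perceptual quality score by some measurement tool. The scores below are hypothetical, chosen only to show that the winner need not be the source with the highest resolution or bitrate. This is not Ssimwave's algorithm.

```python
from dataclasses import dataclass

@dataclass
class Source:
    name: str
    video_format: str
    codec: str
    bitrate_mbps: float
    quality_score: float  # hypothetical measured score, 0..100

def best_source(sources):
    """Highest measured quality wins; a lower bitrate breaks ties."""
    return max(sources, key=lambda s: (s.quality_score, -s.bitrate_mbps))

sources = [
    Source("feed_1", "1080@29.97i", "MPEG-2", 40.0, 78.0),
    Source("feed_2", "720p60", "H.264", 22.0, 84.0),
]
print(best_source(sources).name)
```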
AI and Neural Networks in Television
Television companies are adopting the latest technologies, including AI and neural networks. In marketing, AI is used to process large amounts of data and gather basic information, which specialists then verify. AI is also used to find and process the underlying visual elements of on-air graphics. For instance, some of the graphics in the Antimagic project were drawn using neural networks. When interacting with potential scammers, photographs of “people” generated by a neural network are used for ethical reasons. These generated portraits are also useful for casting: directors can set the parameters they need for a character’s appearance, which simplifies the work of casting directors. Another example is the series “Sidorovs” produced by STS, whose script was written entirely by a neural network.
AI Solutions for Content Delivery or Viewing
Some online cinemas are now actively implementing AI-based content search. These algorithms help build a personalized selection of content that matches each user’s tastes and preferences. Yet once viewers reach the limit of how many choices they want to make, services return to the principle of broadcast television and build a program grid. Even services whose main competitive advantage was letting viewers pick content themselves are now testing channels compiled using AI. Essentially, these are a traditional television program grid.
AI-Powered TV Presenters
Indian TV channels have increasingly begun to feature AI presenters who deliver news, horoscopes, weather forecasts, and more. Odisha TV launched India’s first AI-powered regional news anchor, named Lisa. Lisa is multilingual and speaks several languages, including Odia and English. The company plans to improve Lisa’s skills so that she can communicate with people more easily.
Lisa is not the first AI presenter in India. Before her, there was Sana, who appeared in March 2023 on India Today. One of her creators, the chairman of India Today, described Sana as bright, gorgeous, ageless, tireless, multilingual, and completely under his control.
AI Presenters in Other Countries
The use of AI presenters in television is still in its early stages, but there are already several examples. In 2018, China’s state news agency Xinhua launched the world’s first virtual news presenter, created by Sogou using deep learning technology. The presenter, An Zi, can imitate human speech, facial expressions, and gestures. A year later, a female version of the AI presenter was launched. In 2021, CCTV and Baidu in China created a presenter who communicates in sign language.
Kuwait introduced its own virtual news anchor, Fedha, who “speaks” Arabic. Almost simultaneously, AI presenters appeared in Taiwan, Indonesia, and Malaysia. They all speak in much the same formulaic way, and their voices still sound emotionless.
In Russia, on the channel “SvoeTV”, weather forecasts are presented by the Russian virtual presenter Snezhana Tumanova. Internet users are divided about the presenter. Some believe she turned out well, while others argue that this is actually a person, not artificial intelligence. Still others say that the gestures give away the work of AI.
How It Works
AI presenters look like realistic 3D models of people and imitate the gestures, facial expressions, and intonations of a real person. The work of such presenters is based on the technology of computer graphics and facial animation, as well as neural networks that make it possible to synthesize natural human speech.
IT companies such as DeepBrain AI use neural networks to create AI avatars for a wide range of tasks. Within companies, such presenters can be useful for recording training videos.
All in all, AI is making television far more capable, not only for consumers but also for businesses.