Your smart TV may be crawling the web for AI

This is Lowpass by Janko Roettgers, a newsletter on the ever-evolving intersection of technology and entertainment, syndicated exclusively for The Verge subscribers once a week.

These days, if you sign up for a new streaming service, you generally have two options: either pay a hefty premium for an ad-free experience, or endure all the covert tracking that comes with frequent commercial breaks and ad targeting.

Web data aggregator Bright Data is introducing streaming service operators to an alternative approach for apps running on Samsung’s Tizen and LG’s webOS platforms — one that comes without ads and exorbitant fees. To unlock the new revenue stream, all publishers have to do is integrate the company’s Bright SDK into their TV apps and convince viewers to join Bright’s monetization network.

“We don’t do any kind of tracking,” Bright Data chief product officer Ariel Shulman explained during a webinar for streaming industry insiders two years ago. “We work quietly and completely anonymously in the background. Users don’t actually see or feel anything.”

The catch? With Bright’s SDK, a viewer’s smart TV becomes part of a vast global proxy network that crawls and scrapes the web. The company claims to operate 150 million such residential proxies worldwide, including through apps running on desktop PCs and mobile devices. Together, these devices collect petabytes of public web data from a wide range of locations and IP addresses. This approach allows the company to capture localized versions of websites, and it also helps Bright avoid blocklists aimed at web crawlers. The collected data is then sold to companies that use it to train AI models, among other things.

Here’s how Bright’s smart TV partnership works: When a consumer downloads and installs a participating app, they’ll see an opt-in screen asking them to confirm their desire to participate in Bright’s proxy network. For example, for an app called Petflix that was available on the Roku App Store until recently, the note reads:

“In order to enjoy Petflix for free with fewer ads, you are allowing Bright Data to occasionally use your device’s free resources and IP address to download public web data from the Internet. Bright Data will only use your IP address for approved business-related use cases. No personal information about you is accessed or collected except your IP address. Period.”

“Our network is based on individual participation through consensus,” explains Bright Data spokesperson Jennifer Burns. “All users can opt-out at any time through a fast two-click process.”

Once a consumer joins Bright Data’s network, their smart TV begins downloading publicly available webpages as well as audio and video data, which is then sent to Bright’s cloud servers. The company claims to do this only when it does not affect the device’s bandwidth or processing capabilities, with Shulman saying that individual devices download only 50 MB of data per day. However, users have no way of knowing whether the SDK is downloading web data at any given time.

In some cases, your smart TV may even crawl the web for Bright as soon as you turn it on. “On some operating systems, […] our SDK is given permission by the user to run in the background,” Shulman explained during his webinar. “This means our monetization continues even if the app itself is not running.” All consumers have to do is run the app once and opt in to Bright’s network, and the device will continue to crawl the web every day until they opt out again or uninstall the app.

Bright Data is not the only company operating a residential proxy network, and some of its competitors have faced criticism for unsavory business practices. Last month, Google took action against the IPIDEA network, which Google’s Threat Intelligence Group called “the world’s largest proxy network.” IPIDEA worked with several SDK providers to distribute its code through third-party apps, including smart TV apps.

Once devices were enrolled in its network, IPIDEA’s operators allegedly rented those resources to hacking groups in China, North Korea, Iran, and Russia. “We […] IPIDEA is being leveraged by a wide range of threat actors ranging from espionage, crime, and information operations,” Google’s Threat Intelligence Group wrote in a January blog post.

To be clear: Google’s security researchers have not drawn any connection between IPIDEA and Bright Data, and Bright goes to great lengths to differentiate itself from bad actors. “Our SDK, along with all of our technology, is reviewed by AppEsteem, Google, McAfee and others, and most recently, regularly audited by PwC,” says Burns. “Bright SDK applies rigorous partner selection criteria and vets every application through strict compliance processes.”

Yet the company has still been caught up in a broader backlash against residential proxy services. Google has adopted policies against proxy SDKs running in the background, and is now telling developers that they are allowed to use proxy services only “in apps where it is the primary, user-facing core purpose of the app.” Amazon has added a provision to its developer policies that “completely bans apps that provide proxy services to third parties.” Roku also blocks developers from using the Bright SDK and similar proxy services.

All of those changes have made it more difficult to figure out how widespread use of the SDK on smart TVs really is. A few dozen Fire TV apps still mention the SDK on Amazon’s app store, but no longer use it. I was able to download some apps from Roku’s store that were still using the SDK, including the aforementioned Petflix app. However, after I contacted Roku for this story, they disappeared from the store.

The new restrictions against proxy SDKs have a direct impact on Bright’s addressable market in the smart TV sector. The company used to offer its solution to Roku, Android TV, and Fire TV app developers, but Burns told me it no longer supports these platforms. Bright still lists Samsung’s Tizen OS and LG’s webOS as supported smart TV platforms, and has published more than 200 first-party apps on LG’s App Store alone. LG spokesperson Li Li told me that the Bright SDKs “are not officially supported by LG, and their operation on the webOS platform is not guaranteed.” Samsung did not respond to multiple requests for comment.

There are arguably many legitimate use cases for web crawling. “Our network specifically serves legitimate purposes, supporting journalists, nonprofits, academic researchers, cybersecurity companies, and other leading businesses around the world,” says Burns.

The problem is that consumers don’t know whether those legitimate purposes align with their own personal values. Case in point: Bright Data supports a number of nonprofits, including some that use its proxy network to track hate speech on social media. However, the company also works with the AMCHA Initiative. The group maintains an “Anti-Zionist Faculty Barometer,” and its anti-Zionist incident tracker includes statements by students and faculty against Israel’s war in Gaza, as well as calls for schools to divest from the country.

With AI companies facing scrutiny over their environmental impact, their treatment of intellectual property, and their potential to replace human labor, some consumers may also feel uneasy about having their TVs collect data that is used to train AI models.

Now, some consumers may decide that such concerns are exaggerated, and may willingly opt in to Bright’s network if it means they get to see fewer ads or pay less for their streaming services. I, for one, would rather sit through an extra ad break or two.
