Data Markets and Data Valuation for Internet of Things

Petar Popovski
Sep 1, 2023

A canonical example of an IoT (Internet of Things) device is a sensor, equipped with wireless connectivity, that provides data about a physical phenomenon or event. This data can be used in three principal ways: (1) to make an inference or a decision; (2) to train a Machine Learning (ML) model; (3) to simply store it for potential future use. Regardless of how the data is actually used, the IoT device that provides it is usually not a direct economic beneficiary of that use. For instance, weather data used in transport, logistics, or energy may be available free of charge to multiple parties and lead to decisions with high economic stakes, yet bring no direct economic benefit to the devices that actually provide the data. At the other extreme is privately held IoT data that brings value only to its owner.

As connectivity advances towards the Internet of Everything and the intelligence in devices and network nodes increases, one can speculate that some IoT devices will evolve to act as autonomous economic agents. For instance, a weather station, a surveillance camera, or a satellite-connected IoT sensor may interact directly with other actors in data markets and derive economic value from its data. This raises the question: what is the value and the price of a certain portion of data? Free IoT data has a price of zero, while fully private IoT data can be treated as data with an infinite price. But how should one determine the price and value that lie between these two extremes? For example, the pricing of data in ML tasks can be based on the utility that the data brings in terms of ML model accuracy. In another important work, the value of the data is related to the correlation between this data and the data from another provider. For instance, if IoT devices A and B are acoustic sensors (microphones) placed relatively close to each other, then their data is correlated. If device A offers its data at a given price on the market, device B can depress that price by offering correlated data at a lower price that likely leads to the same insights.
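The microphone example can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions, not the pricing mechanism of the works mentioned above: two simulated microphones observe the same source with independent noise, the buyer estimates the source by averaging the streams it has purchased, and the hypothetical `utility` function measures value as the reduction in mean squared error relative to having no data at all. The names `utility`, `mic_a`, and `mic_b`, as well as the noise levels, are invented for the example, and the ground-truth source is available only because this is a simulation.

```python
import math
import random


def mse(estimate, truth):
    """Mean squared error between an estimate and the true signal."""
    return sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def utility(purchased, truth):
    """Assumed utility: error reduction compared with a buyer that owns no
    data and can only guess the mean of the true signal."""
    if not purchased:
        return 0.0
    mean_truth = sum(truth) / len(truth)
    baseline = mse([mean_truth] * len(truth), truth)
    # The buyer's estimator is assumed to be a plain average of the streams.
    estimate = [sum(samples) / len(samples) for samples in zip(*purchased)]
    return baseline - mse(estimate, truth)


rng = random.Random(0)  # fixed seed so the run is repeatable
source = [rng.gauss(0.0, 1.0) for _ in range(2000)]   # the acoustic event
mic_a = [s + rng.gauss(0.0, 0.2) for s in source]     # device A
mic_b = [s + rng.gauss(0.0, 0.2) for s in source]     # device B, placed nearby

standalone_b = utility([mic_b], source)
marginal_b = utility([mic_a, mic_b], source) - utility([mic_a], source)

print(f"correlation between A and B: {pearson(mic_a, mic_b):.3f}")
print(f"value of B on its own:       {standalone_b:.3f}")
print(f"value of B once A is bought: {marginal_b:.3f}")
```

In this toy setting the data from B is worth far less to a buyer that already holds the data from A than it is on its own, which is the effect that lets one device depress the price of the other.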

Motivated by these observations, our group and collaborators have recently published two papers towards enabling future IoT devices to act as autonomous economic agents on the data market.

Devices with correlated data and coalition formation when interacting with the platform (learner). The bold grey lines between the devices and the platform indicate the interaction interface.

The first work establishes a market for trading IoT data that is used to train machine learning models. The data is supplied to the data market platform through a network, and its price is set based on the value it brings to the machine learning model, taking into account the adverse effect of correlation among the data of different devices. The key proposal is to reinforce collaboration opportunities between devices with correlated data in order to limit information leakage. This approach relies on the strong assumption that IoT devices know the correlation of their data before deciding to enter a coalition. The difficulty is that this correlation cannot be found by simply sharing the data with the other devices, as they may take economic advantage of the data before entering the coalition. This problem is addressed in the second paper, which devises multiparty computation protocols to compute the similarity of two data sets based on correlation, while offering controllable privacy guarantees.
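To give a feel for how two devices could learn the correlation of their data without exchanging the raw samples, here is a minimal sketch of a textbook two-party inner-product protocol based on additive masking with pre-shared randomness (a Beaver-style triple). It is not the protocol from the second paper and does not reproduce its controllable privacy guarantees. The trusted `dealer`, the modulus `P`, the fixed-point `SCALE`, and all function names are assumptions made for the illustration, and both parties are simulated inside one process.

```python
import math
import secrets

P = 2**61 - 1   # assumed prime modulus for all arithmetic
SCALE = 1000    # assumed fixed-point scale (three decimal digits)


def standardize(xs):
    """Shift to zero mean and scale to unit (population) variance."""
    n = len(xs)
    mean = sum(xs) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / std for x in xs]


def encode(xs):
    """Map real values to fixed-point integers modulo P."""
    return [round(x * SCALE) % P for x in xs]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v)) % P


def dealer(n):
    """Assumed trusted dealer: hands out random masks a, b and additive
    shares c_a + c_b = <a, b> (mod P)."""
    a = [secrets.randbelow(P) for _ in range(n)]
    b = [secrets.randbelow(P) for _ in range(n)]
    c_a = secrets.randbelow(P)
    c_b = (dot(a, b) - c_a) % P
    return (a, c_a), (b, c_b)


def private_correlation(data_a, data_b):
    """Pearson correlation of two local data sets, computed so that each
    party only ever sees a masked version of the other party's vector."""
    n = len(data_a)
    x = encode(standardize(data_a))   # stays with device A
    y = encode(standardize(data_b))   # stays with device B
    (a, c_a), (b, c_b) = dealer(n)

    d = [(xi - ai) % P for xi, ai in zip(x, a)]   # message from A to B
    e = [(yi - bi) % P for yi, bi in zip(y, b)]   # message from B to A

    share_a = (dot(a, e) + c_a) % P               # computed by A
    share_b = (dot(d, e) + dot(d, b) + c_b) % P   # computed by B

    total = (share_a + share_b) % P               # equals <x, y> mod P
    if total > P // 2:                            # decode negative values
        total -= P
    return total / (SCALE * SCALE * n)


r = private_correlation([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 8.0, 9.7])
print(f"correlation learned by both devices: {r:.3f}")
```

Each masked vector is uniformly random modulo `P`, so it reveals nothing about the underlying samples on its own; only the final correlation value is disclosed, up to fixed-point rounding. A practical protocol would also have to remove the trusted dealer and bound what the revealed similarity itself leaks, which this sketch does not attempt.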

In conclusion, data markets and the evolution of IoT devices towards autonomous economic agents may require a change in the assumptions made about the capabilities of those devices. For instance, these devices may need to be capable of significant privacy-related computations as well as computations related to data valuation. Finally, the type of traffic associated with these IoT devices may be significantly different from uplink-oriented data provision, as economic transactions would require two-way interaction between the device and the infrastructure.
