
Organic: A Case for Democratizing Data in AI

Steven Krone
Jun 3, 2024

What does it mean to democratize data? Why is it important for businesses to understand this trend? To approach this topic, we must understand the current trends in technology and AI development. In 2024, as has been the case for nearly a decade, there is massive buzz regarding Artificial Intelligence and Machine Learning.

To provide a brief definition from Google:

artificial intelligence – “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

The crucial distinction is that these technologies are designed to emulate or replace human capabilities. While emulating such behavior with traditional programming techniques can be incredibly difficult, more sophisticated approaches have been developed, such as Neural Networks.

A Neural Network is a machine learning method of processing data inspired by how neurons work in our brains. These were first proposed in 1943 by Warren McCulloch and Walter Pitts but did not find immediate use, because they required far more computing power than was available at the time. However, consistent hardware improvements, particularly in Graphics Processing Units (GPUs), have made it possible to process more data than ever before. But what is the “data” being processed? This leads us to an industry trend known as “Big Data.”

For our purposes, Big Data refers to the vast volume of structured and unstructured data generated by various sources, including digital platforms, sensors, social media, and more. With the growth of the Internet, the variety, volume, and velocity of data have grown exponentially [1]. From here, we have all the ingredients for scaling these Neural Network techniques, and these factors have contributed to the take-off of Artificial Intelligence in the last decade. While the techniques vary, we can describe them under the umbrella term “Deep Learning.”
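To make the “neuron” idea above a bit more concrete, here is a minimal sketch of a single artificial neuron in plain Python. Everything in it (the inputs, weights, and the choice of a sigmoid) is illustrative only; real networks stack enormous numbers of these units and learn their weights from exactly the kind of large datasets described above.

```python
import numpy as np

# A single artificial "neuron": a weighted sum of inputs passed through a
# nonlinearity. All numbers below are made up purely for illustration.

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

inputs = np.array([0.5, 0.8, 0.2])    # e.g. features derived from raw data
weights = np.array([0.4, -0.6, 0.9])  # normally learned during training
bias = 0.1

activation = sigmoid(np.dot(inputs, weights) + bias)
print(activation)  # a value between 0 and 1: the neuron's "firing" strength

# A neural network stacks many such neurons in layers, and training adjusts
# the weights until the outputs match known examples -- which is why both
# large datasets and fast hardware (GPUs) matter so much.
```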

With this surge in development come curiosity, fear, interest, ambition, and many questions. People wonder what these emerging technologies mean for us. Since AI is designed to mimic human capabilities, concerns arise about AI tools replacing human workers.

________________________________________________________

These questions drove me to engineering in the first place. If machines can start taking over tasks and actions in the human domain, what types of activities would people focus on? I personally don’t believe there is an issue with technology helping simplify or streamline our work, tasks, or chores. It is not inherently wrong, for example, to use a shovel to dig a hole. One is simply using a tool to do something that would otherwise be done by hand. The only obvious loss, to me, is having softer hands.

This is an oversimplification, as AI tools are much more complex. However, in an abstract sense, they are still tools. It is important not to lose sight of this and speak more to reality rather than the science-fiction potential. I could also discuss the misuse and abuse of AI tools, but the point is that I could do the same regarding a shovel. My recommendation for the shovel would be for making gardens, rather than anything else, so I’ll leave it there.

To me, it’s much more important to discuss how these tools are developed and invented. Continuing with the shovel analogy, it is easy to understand who gets credit. If I sourced the materials, crafted the shovel, and sold it to a store at a fair price, I could continue to support my shovel-making business. However, as we described earlier, AI is developed using an accumulation of data. By participating in a company’s system, we are also contributing to their data advantage. Once the technology has been developed, it is hard to put the genie back in the bottle.

________________________________________________________

The most profitable and innovative technology companies of the 21st century have developed the tools and platforms of the modern Internet era: a blend of hardware and software used by nearly every modern citizen. By becoming pioneers in this space, they have positioned themselves to build the infrastructure for nearly every person’s digital presence in the modern world.

By leveraging the data they collect, they can improve their products and services and continue innovating. This is a massive competitive advantage that they will want to maintain. The next step on this innovation curve is to develop general-purpose AI tools, built on this data, for customers to use. While I believe these companies are wonderfully innovative and creative, little credit or reimbursement is given to those whose data is included in these AI models.

This differs from having rights to your own data. Regulations such as the European Union’s General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) have been enacted to support consumers’ rights to their data. These steps set a precedent for data ownership and give users leverage to petition corporations. With these regulations, we have a legal framework for requesting that tech corporations remove or share our data.

Getting access to our personal data is one issue. The focus of this article goes beyond that. It is important to understand that if our data is used to train an AI model, it becomes vastly more difficult to control or reverse how an AI model has been trained by that data.

A more focused example of this is DALL-E, a notable AI model developed by OpenAI that generates images from textual descriptions. The name “DALL-E” is a combination of “Dalí,” the surrealist artist, and “WALL-E,” the Pixar character [2]. Using the same Deep Learning techniques we discussed earlier, this model can create a wide range of images based solely on the text prompts provided by the user.
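For a sense of how a user actually interacts with such a model, here is a hedged sketch using the OpenAI Python SDK’s image endpoint. The model name, parameters, and response fields reflect my understanding of the v1.x SDK and may differ from what OpenAI currently offers; treat it as an illustration rather than a reference.

```python
# Illustrative sketch: generating an image from a text prompt.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

response = client.images.generate(
    model="dall-e-3",  # text-to-image model
    prompt="a lighthouse at dusk, painted in a surrealist style",
    n=1,               # number of images to generate
    size="1024x1024",
)

print(response.data[0].url)  # URL of the generated image
```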

There are already examples of artists having their unique style replicated. One such artist, Greg Rutkowski, has taken issue with this [3]. He has pointed out that users can prompt the AI with his name to generate new art in his style. Despite the obvious implications of plagiarism, there is no clear legal precedent. If an exact copy of the artist’s work is not being sold commercially, and the output is obscured by the technical complexity of the AI, it is challenging to prove any sort of copyright violation.

This is where I see the most potential for disenfranchisement: people having their data, and therefore their works, usurped by machines that can cheaply create a similar item. But by acknowledging how the tools are developed, we can understand that people and their data play a crucial role in the very development of AI.

This emphasizes how integral humans are to the AI development process: not just as consumers of the AI product, but as the very source material AI hopes to imitate. With this knowledge, it is easier to dispel fears about how AI will “take our jobs.” If anything, the opposite is true: we need people and their participation more than ever. The feedback loop of people, data, and algorithms should stay balanced; if corporate platforms and AI tools don’t pay it forward to the people in this equation, it can easily become unsustainable.

Given the current setup, in which companies house user data, it is challenging to give users a clear understanding of the value and utilization of their information. Ideally, we could build this into the systems themselves, all while ensuring customer security and privacy. Distributed Ledger Technology (DLT), such as the blockchain that supports Bitcoin, is one example of a technical solution that could meet these needs.
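To show what traceability at the data level could look like, here is a toy, hash-linked ledger of data-usage records in Python. It is only a sketch of the chaining idea behind DLT, under the assumption that each entry records whose data was used and how; real systems add consensus, replication, and cryptographic signatures, and every field name below is hypothetical.

```python
import hashlib
import json
import time

# Toy sketch of an append-only, hash-linked ledger of data-usage records.
# This only demonstrates tamper-evidence, not a full distributed ledger.

ledger = []

def record_hash(record: dict) -> str:
    return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

def append_entry(user_id: str, action: str) -> dict:
    prev_hash = ledger[-1]["hash"] if ledger else "0" * 64
    entry = {
        "user_id": user_id,      # whose data is involved (hypothetical ID)
        "action": action,        # e.g. "included in training run 42"
        "timestamp": time.time(),
        "prev_hash": prev_hash,  # links this entry to the previous one
    }
    entry["hash"] = record_hash(entry)
    ledger.append(entry)
    return entry

append_entry("user-123", "consented to model training")
append_entry("user-123", "data included in training run 42")

# Because each entry embeds the hash of the one before it, quietly rewriting
# history would change every later hash -- which is what makes data usage
# traceable and accountable.
```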

Corporations do not have an obvious incentive to move their platform to a distributed ledger. Blockchain experienced a massive hype cycle starting in 2017, but that wave has not lasted. I believe this is because the desire for traceability and accountability related to data will not come from corporations; it needs to originate from consumers choosing to engage with platforms where their data is respected and prioritized.

________________________________________________________

To me, this sentiment echoes a topic that may resonate more personally: Organic foods.

Consumers can advocate for a certain level of food quality through their purchasing choices. While businesses might not naturally lean towards producing organic food due to the added costs, consumer demand creates a market sector for these products. Consumers can influence the production trends by supporting goods and services they believe in.

In my view, a similar thing needs to happen with AI. Trusting an AI’s results becomes challenging when we are unaware of the underlying data. When ChatGPT provides an answer, I can’t pinpoint an exact reference or source. Even leading AI experts cannot expose the exact heuristics behind a decision [4]. Understanding how an AI makes its decisions primarily stems from comprehending the data we input.

Taking a long-term perspective, prioritizing and advocating for customer data ownership can foster loyalty and trust in the platforms they engage with. For AI tools to remain relevant and valuable, we require users and customers willing to utilize these tools and share their data. Without trust, the feedback loop of user-provided data can break.

The exact nature of these solutions, partnerships, or systems remains uncertain. Now more than ever, we must be innovative in defining and distributing the value these models generate to encourage participation from all stakeholders. Granting users greater autonomy and ownership over their data, I believe, represents a step in the right direction.


References:

  1. Oracle
  2. DALL-E
  3. Artists say AI image generators are copying their style to make thousands of new images — and it’s completely out of their control
  4. Even the scientists who build AI can’t tell you how it works

About the author

Senior Consultant | USA
Steven is a passionate technology advocate, eager to discover ways to transform businesses and their processes. Guided by a user-centered approach, he seeks to understand the essential problems facing the customers, businesses, and users of a system.
