Data bloat happens. With so many modern IT systems running enterprise applications and data services across the planet's cloud networks, data growth is, by most measures, running out of control. We use data lakes (information resources designed to hold unstructured data flows that cannot initially be processed or put to broad functional use) and data warehouses (information resources holding the structured data we have traditionally been able to use). Although we have “coping mechanisms” (such as applying some order to how we store information), we still live in a data-bloated world.
Many agree that the current data overload is further amplified by the proliferation of large language models (LLMs) built to serve generative artificial intelligence. Data volumes have exploded in recent years, driven by data protection and retention policies, video streaming, online gaming and more. Cloud-based storage services are currently relatively inexpensive alongside on-premises data stores, but data centers still require physical space and large amounts of energy.
Is it time for a data diet?
Data protection and recovery vendor Cohesity is one company highlighting today's data overload. The company has compiled industry figures that point to data center energy problems and suggest that efficiency gains alone will not keep pace with data growth unless organizations start losing some data. Losing data? So... it might be time to shed a few pounds and go on a data diet.
The International Bureau of Weights and Measures is the world's oldest international scientific organization. Since its founding in 1875, its mission has been to promote a globally standardized system of units. At its most recent quadrennial meeting (held every four years, this is not a forum for quick fixes or snap decisions), attended by representatives of its 62 member states, the organization decided that, given the rapidly increasing amount of data, two new prefixes were needed. For the first time since 1991, new units of data were introduced: the ronnabyte and the quettabyte.
A ronnabyte has 27 zeros and a quettabyte has as many as 30. Written out, the latter looks like this: 1,000,000,000,000,000,000,000,000,000,000.
Storing a single quettabyte on modern smartphones would require so many devices that, lined up end to end, they would stretch roughly 93 million miles, which according to Cohesity is approximately the distance from the Earth to the sun. The company reminds us that the reason for the huge new units is the rapid growth in global data volumes: in 2010, people around the world generated just under 2 zettabytes of data, and by 2022 that figure had grown to nearly 104 zettabytes.
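As a quick aside, the growth implied by those figures is easy to check. The short Python sketch below simply does the arithmetic on the numbers quoted above (2 zettabytes in 2010, roughly 104 zettabytes in 2022); it is not based on any additional Cohesity data.

```python
# Back-of-envelope check of the growth figures quoted above:
# 2 zettabytes generated in 2010 versus ~104 zettabytes in 2022.
data_2010_zb = 2
data_2022_zb = 104
years = 2022 - 2010

# Implied compound annual growth rate over those 12 years.
cagr = (data_2022_zb / data_2010_zb) ** (1 / years) - 1
print(f"Implied compound annual growth rate: {cagr:.1%}")  # roughly 39% per year
```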
Environmental impact
Mark Molyneux, chief technology officer, EMEA, at Cohesity said: “This (admittedly somewhat cheeky) term refers to companies using the latest data classification and application analytics techniques to protect mission-critical data at appropriate levels of security and compliance, while other, residual information is more directly singled out and removed from the ingestion streams that businesses open themselves up to. Using data management processes powered by modern artificial intelligence (AI) engines, organizations can act now, before the situation worsens and the data backbone needs something closer to gastric bypass surgery.”
Molyneux speaks of a “deteriorating situation,” yet for now, at least, the environmental impact of data proliferation remains limited. According to the International Energy Agency, the amount of data held in data centers more than tripled between 2015 and 2021, while data center energy consumption stayed roughly constant. This is primarily due to significant efficiency gains and a move to more modern hyperscale data centers.
“Data centers have become more efficient, but we are just about at the optimal efficiency level that can be achieved,” warns Cohesity's Molyneux. “Only small efficiency gains remain. Estimates suggest that the planet's current fleet of data centers will generate a total of 496 million tons of carbon dioxide in 2030 with current forms of energy generation. That will be more than France's total emissions in 2021.”
AI is a big side order
If we follow the company's calorie-counting analogy, we can certainly expect AI to add a ton of additional data to the consumption pile. A 2019 study from the Massachusetts Institute of Technology (MIT) concluded that training a single neural network produces as much carbon dioxide as five internal combustion engine cars over their entire life cycle. A 2021 study by Google and the University of California, Berkeley estimated that training GPT-3, the AI model behind the original version of ChatGPT, consumed 1,287 megawatt hours of electricity and emitted 502 tons of carbon dioxide. That is roughly equivalent to the electricity consumption of 120 American households for one year.
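That household comparison can be sanity-checked with simple arithmetic. The sketch below assumes an average US household consumes roughly 10,700 kWh of electricity per year (an approximate EIA figure, not taken from the study itself).

```python
# Rough sanity check of the household comparison above.
gpt3_training_kwh = 1_287_000        # 1,287 MWh expressed in kWh
household_kwh_per_year = 10_700      # assumed average US household consumption

households = gpt3_training_kwh / household_kwh_per_year
print(f"Equivalent annual consumption of ~{households:.0f} US households")  # ~120
```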
“We have very little control over our digital footprint,” Molyneux argues. “Companies often leave behind huge troves of ‘dark’ data, much of which is no longer needed, yet they still don't delete it. This is often because the data has never been classified; in many cases, companies don't even know what data is sitting on their servers. The concept of a data diet describes an attitude change that organizations can adopt to reduce the overall amount of data they store. This change encourages enterprises to take a more proactive approach to how they index, classify and accumulate data throughout the data management lifecycle. It also means taking proactive steps to consolidate an organization's data store workloads onto a single common platform.”
While no Atkins Diet methodology is provided here, the Cohesity team does offer some proven practices that it claims can help trim the data intake that corporate organizations take on on a daily, weekly and indeed yearly basis.
Atkins for data?
The aforementioned process of indexing data as accurately as possible through a data management platform can help businesses identify data streams that have become obsolete, redundant, orphaned or simply stale. Alongside this activity, deduplication tools applied at the data platform level can reduce the data storage load to an astonishing degree: a reduction of up to 97 percent, depending on the “type” of data in question, although that number may be debatable.
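For illustration only, here is a minimal Python sketch of the general idea behind block-level deduplication (fixed-size chunks identified by a content hash, with duplicates stored only once). It is not Cohesity's implementation, and the chunk size and sample data are assumptions chosen to show how repetitive data collapses.

```python
import hashlib

def dedupe_ratio(blobs, chunk_size=4096):
    """Split each blob into fixed-size chunks, keep only one copy of each
    unique chunk (identified by its SHA-256 hash) and report the share of
    storage saved."""
    total_bytes = 0        # bytes before deduplication
    unique_chunks = {}     # chunk hash -> chunk length
    for blob in blobs:
        for i in range(0, len(blob), chunk_size):
            chunk = blob[i:i + chunk_size]
            total_bytes += len(chunk)
            unique_chunks.setdefault(hashlib.sha256(chunk).digest(), len(chunk))
    stored_bytes = sum(unique_chunks.values())
    return 1 - stored_bytes / total_bytes if total_bytes else 0.0

# Highly repetitive data (think many near-identical backups or VM images)
# deduplicates extremely well; genuinely unique data barely shrinks at all.
backups = [b"A" * 1_000_000 + bytes([i]) for i in range(20)]
print(f"Storage saved: {dedupe_ratio(backups):.1%}")
```

Real products typically use variable-size, content-defined chunking and compression on top, so the achievable savings depend heavily on how repetitive the data actually is.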
“There are significant efficiency opportunities here that organizations across all industries should grasp. By cleaning up an organization's data stores, companies can reap benefits across four major ‘food’ groups. This approach a) reduces an organization's carbon footprint through more precise use of cloud resources, b) reduces the risk of litigation related to outdated personally identifiable information (PII) residing in the enterprise data layer, c) ensures that the company's approach to AI is built on the leanest and most accurate information resources across the organization, and d) will likely help the IT team lose weight too, by making them more agile and less burdened by late-night data builds over takeout pizza,” Cohesity's Molyneux concluded.
A data diet may simply be a cute idea designed to get us thinking about streamlining and managing information in new ways. Yes, it is a concept proposed by a data protection and data recovery vendor. But think about it: we don't add extra salt when our sodium intake is already too high, so perhaps we should be just as careful with our data.
Pass the salt-free seasoning.