Discussion about legacy in IT often revolves around the challenges associated with legacy systems, including security and compliance risks and maintenance costs.
But legacy is not only about IT systems — legacy data also poses serious challenges that organisations need to understand and address.
What is legacy data and why is it a problem?
Legacy data is content that is out of date or is no longer accessed or used. This data presents business, cybersecurity, and compliance challenges.
First, legacy data can be a serious business risk. If there are multiple copies of data or files that are not well organised, users can struggle to find what they need, hurting their productivity. Moreover, if data isn’t maintained, it can become incomplete or inaccurate, which can result in suboptimal or even incorrect business decisions.
Legacy data is also a cybersecurity risk. Often, it is not encrypted or otherwise protected from improper access, making it vulnerable to both malicious attacks and accidental misuse, such as an employee sending confidential information to the wrong recipients. Data volume and security risk go hand in hand: the more you store, the larger your attack surface.
There are also compliance challenges. On the one hand, regulations often require that some data be kept for a certain amount of time, especially in sectors such as healthcare and finance. But the opposite is also true: regulations like the General Data Protection Regulation (GDPR) require companies to erase personal data when there is no longer a legal basis to hold it. As a result, it can be difficult to decide whether a given dataset must be kept or should be discarded. A recent incident in which a retailer was fined $300,000 for storing nearly 20 years' worth of payment card data on its e-commerce server illustrates how legacy data can compound the consequences of a breach.
How to identify legacy data
To identify legacy data, start by inventorying the data you have and classifying it according to its sensitivity and value. Be sure to document when and why it was generated or collected, how often it is used, and when it was last updated. Note that this is not a one-time event — since data is constantly being created and collected, you should make data discovery and classification a regular process.
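As a rough illustration of the inventory step, the sketch below walks a directory tree, records basic metadata for each file, and flags files that have not been modified within a chosen window. The names FileRecord, inventory and find_stale, and the 365-day threshold, are assumptions made for this example, not part of any particular product. It relies on modification time only, because access times are not reliably updated on every file system.

```python
import os
import time
from dataclasses import dataclass


@dataclass
class FileRecord:
    path: str
    size_bytes: int
    last_modified: float  # seconds since the epoch
    last_accessed: float  # informational only; may not be maintained


def inventory(root):
    """Walk `root` and record basic metadata for every file found."""
    records = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            try:
                st = os.stat(full_path)
            except OSError:
                # Unreadable file or broken link: skip it in this sketch.
                continue
            records.append(
                FileRecord(full_path, st.st_size, st.st_mtime, st.st_atime)
            )
    return records


def find_stale(records, max_age_days, now=None):
    """Return records not modified within the last `max_age_days` days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return [r for r in records if r.last_modified < cutoff]


if __name__ == "__main__":
    # Hypothetical usage: list files untouched for more than a year.
    for record in find_stale(inventory("."), max_age_days=365):
        print(record.path, record.size_bytes)
```

Running something like this on a schedule, rather than once, matches the point above that discovery should be a regular process.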
For many organisations, this process is difficult to perform manually because of the sheer volume of data in play. Automated data discovery and classification software can provide quick visibility into what data you have and where it is located. It also tends to deliver more consistent results than manual review, which becomes error-prone at that scale.
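To give a feel for what automated classification does, here is a deliberately simplified, rule-based sketch. It tags text as containing personal data if it finds an email address, and as containing payment card data if it finds a 13 to 19 digit number that passes the Luhn checksum. The patterns, label names and the classify function are assumptions for illustration; real classification tools use far richer rules and should not be replaced by a snippet like this.

```python
import re

# Simplified patterns, assumed for this example only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,19}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text):
    """Return a set of sensitivity labels found in `text`."""
    labels = set()
    if EMAIL_RE.search(text):
        labels.add("personal_data")
    for match in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if 13 <= len(digits) <= 19 and luhn_valid(digits):
            labels.add("payment_card")
            break
    return labels or {"unclassified"}
```

The Luhn check is there to cut down on false positives: many long digit strings, such as order references, look like card numbers but fail the checksum.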
How to protect legacy data
After you’ve classified your data, you should organise, secure and store it in the most efficient and cost-effective way. First, create a logical and manageable folder structure. That way, employees won’t waste time searching for or re-creating content.
Next, secure the data. Identify and address any incorrectly stored sensitive files and educate employees on security rules to prevent the problem from happening again. Ensure that only authorised individuals can access sensitive information and conduct regular authorisation reviews to prevent privilege sprawl.
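An authorisation review can be thought of as comparing who currently has access with who is approved to have it. The sketch below assumes both lists have already been exported into simple mappings of resource name to user names; the review_access function and the sample data are hypothetical.

```python
def review_access(granted, approved):
    """Return, per resource, the users who hold access without approval.

    Both arguments map a resource name to an iterable of user names.
    Resources with no excess access are omitted from the result.
    """
    findings = {}
    for resource, users in granted.items():
        excess = set(users) - set(approved.get(resource, ()))
        if excess:
            findings[resource] = sorted(excess)
    return findings


if __name__ == "__main__":
    # Hypothetical sample data for illustration.
    granted = {
        "hr/payroll": {"alice", "bob", "carol"},
        "public/handbook": {"alice"},
    }
    approved = {
        "hr/payroll": {"alice"},
        "public/handbook": {"alice"},
    }
    print(review_access(granted, approved))
```

Anything the review reports is a candidate for revocation, subject to confirmation by the data owner.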
Remember that some information must be retained for a specific period for regulatory, financial or legal reasons. Locate that data, archive it securely, and establish a schedule to delete it once the retention period ends.
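A retention schedule can be expressed as a simple rule: given a data category and the date a record was created, decide whether it should be retained, deleted, or sent for manual review. The sketch below shows one way to do that. The categories and retention periods are placeholders invented for this example, not legal guidance; actual periods depend on the regulations that apply to you.

```python
from datetime import date

# Placeholder retention periods in years (assumptions, not legal advice).
RETENTION_YEARS = {
    "financial_record": 7,
    "support_ticket": 2,
}


def deletion_due(created, years):
    """Return the date on which a record created on `created` falls due."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February in a year that is not a leap year: use 28 February.
        return created.replace(year=created.year + years, day=28)


def retention_action(category, created, today):
    """Return 'retain', 'delete', or 'review' for a single record."""
    years = RETENTION_YEARS.get(category)
    if years is None:
        # Unknown category: leave the decision to a person.
        return "review"
    return "delete" if today >= deletion_due(created, years) else "retain"
```

Routing unknown categories to review rather than deletion is a cautious default, since deleting data that must be kept is as much a compliance problem as keeping data that must be erased.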
Benefits of data classification
Classifying your data helps you strengthen security and improve compliance. Plus, by empowering you to retain only what you need, it reduces storage costs. Even though disk space gets cheaper by the day, constantly buying more data storage is nevertheless a drain on your budget. And if you store data in the cloud, you can easily see the cost efficiencies of getting rid of unneeded content.
In addition, data classification helps your workforce be efficient because it reduces how much data they must wade through to find what they need and helps ensure that they are using current information for decision-making.
Overall, legacy data poses a range of security, compliance, and productivity risks. You can mitigate those risks by identifying and classifying your legacy data.