
What do hackers want? If your answer is "money," you are on the safe side, because it is almost always true. According to Verizon’s Data Breach Investigations Report (DBIR), financial gain remains the attackers’ stated goal in more than 75% of the security incidents investigated.
The better answer, however, is that hackers are after data from corporate file systems – sensitive data and, of course, data that can be monetized. Usually this means unencrypted user-generated files (internal documents, presentations or Excel spreadsheets) that are part of the everyday work environment. Sometimes it is also files exported from structured databases, where customer accounts, financial data, revenue forecasts and quite a few other corporate internals are stored.
The demand for such data has grown enormously, as have the resources to store vast amounts of it. Nearly 90% of the global data volume was generated in the past two years alone, and by 2020 data volumes are expected to grow by a further 4,300%.
Data security challenges
Unfortunately, the relatively simple tools (often supplied with the operating system) that IT administrators use to manage corporate content are not necessarily designed to identify and protect data. Most of the time, companies rely on the support of third-party vendors. However, there is rarely a general consensus on how best to ensure such data protection. Of course, virtual gateways can be protected by firewalls and intrusion prevention systems.
Or you can be reasonably realistic and assume that hackers will somehow manage to get into the system in the end.
True security comes from within
However, successful attacks on well-protected companies in recent years have taught us one thing: there is no such thing as a completely intrusion-proof system for data. Why?
Hackers are literally invited and still walk straight through the front door into a network. Their favorite scam: phishing emails. Here, the attacker pretends to work at a well-known company (such as FedEx, UPS or Microsoft) and sends an email to an employee. The file attachment ostensibly contains an invoice or, alternatively, something else that looks like a business document. If the employee opens it, the malware contained in the attachment is executed.
Verizon has been observing hackers’ attack methods in practice for years as part of DBIR. Social engineering techniques, which include phishing and other forms of identity fraud, have become immensely widespread, according to the report (see chart).


According to Verizon DBIR, social engineering continues to be on the rise.
You’re in … now what?
The DBIR team explicitly points out that hackers get into a system fairly quickly (within days, sometimes faster). However, it often takes months for the IT security department even to notice the attack and assess its consequences.
Once hackers have a foot in the door, they use disguised FUD malware to operate covertly. "FUD" stands for "fully undetectable" – malware that is not recognizable as such. Unnoticed by virus scanners, it captures employee keystrokes, analyzes file contents, and removes or exports data.
Those who don’t use phishing often look for vulnerabilities in publicly available web interfaces and gain access using SQL injection or other methods.
And hackers still benefit from poorly chosen passwords: attackers often determine the passwords used and simply impersonate employees when logging into the system.
Conclusion: Attackers come and go without causing any alarm.
In order to effectively defend against this new generation of hackers, it is advisable to first identify all sensitive data and then develop defense mechanisms on this basis – an approach also known as "inside-out security".
Inside-out security in three steps
Step 1: Inventory of IT infrastructure and data
If you want to develop such a strategy, you should ideally start by taking stock of your IT infrastructure.
Virtually all of the relevant security standards provide control mechanisms for asset management or asset classification. The goal is to clarify what content is stored in a system in the first place (and where) – it is difficult to protect resources that are not known to exist. In addition to common hardware (routers, servers, laptops, file servers, etc.), digital system components are also classified: critical software, applications, operating systems and, of course, data/information.
For example, NIST’s CIS framework is intended to serve as a voluntary guideline for protecting the IT of power plants, transportation companies and other key service providers. Like all frameworks, the CIS framework provides a general structure onto which various more specific standards can be mapped.
The first part of the framework covers identification; its subcategories include resource inventory (see ID.AM-1 to ID.AM-6) and the classification of content (see ID.RA-1 and ID.RA-2).


Identification and classification of IT assets according to NIST’s CIS framework
PCI DSS 3.2 provides for control mechanisms to identify storage hardware and, very specifically, sensitive cardholder data and information (see requirement 2.4).


PCI DSS: Inventory of IT assets
Step 2: Reduce risk through access control
In addition to locating sensitive content, the security standards and best practices provide further control mechanisms. These include reviewing access rights – a step that usually falls under the category of "risk analysis". This makes it possible to determine whether and to what extent unauthorized users can access sensitive data. The next step is to change permissions so that exactly this cannot happen.
The CIS framework additionally specifies a protection function with subcategories for access control (see PR.AC-1 to PR.AC-4).
In particular, it calls for implementing the principle of least privilege (PR.AC-4): employees are given only the permissions they need to do their job. This is also referred to as "Role Based Access Control" (RBAC).
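To make the idea concrete, here is a minimal sketch of role-based access control in Python. The roles, permissions and users are purely illustrative and are not taken from any particular standard or product:

```python
# Minimal RBAC sketch: roles, permissions and users are illustrative only.
from dataclasses import dataclass, field

ROLE_PERMISSIONS = {
    "hr_clerk":   {"read:payroll"},
    "hr_manager": {"read:payroll", "write:payroll"},
    "developer":  {"read:source", "write:source"},
}

@dataclass
class User:
    name: str
    roles: set = field(default_factory=set)

def is_allowed(user: User, permission: str) -> bool:
    """A user gets only the permissions granted by their roles (least privilege)."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user.roles)

alice = User("alice", roles={"developer"})
print(is_allowed(alice, "read:payroll"))   # False – outside her job function
print(is_allowed(alice, "write:source"))   # True
```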
Similar ways of controlling access to data are also included in other standards.
The most important thing is that security measures – continuous data analysis, risk assessments and the adjustment of access rights – become an integral part of everyday workflows. This creates an ongoing process of risk assessment and risk evaluation.
Step 3: Data minimization
In addition to identifying sensitive data and restricting access authorizations, the standards also provide for a further mechanism: deleting or archiving personal data that is no longer needed.
This restriction on the retention of data is included in PCI DSS under requirement 3 (specifically requirement 3.1), which calls for keeping cardholder data storage to a minimum. Limiting retention periods is also required under the new EU General Data Protection Regulation (GDPR); it is mentioned in the article on data protection by design and by default (Article 25).
Minimizing data collection and retention as a way to reduce risk is also part of another best practice: privacy by design.
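As a rough illustration, the following sketch moves files that have not been modified within an assumed seven-year retention period from a hypothetical share to an archive location. The paths and the retention period are placeholders; real retention rules depend on the applicable regulation:

```python
# Retention sketch with a hypothetical share path and an assumed 7-year period.
import shutil, time
from pathlib import Path

RETENTION_SECONDS = 7 * 365 * 24 * 3600        # assumed retention period
SHARE   = Path("/srv/share")                   # hypothetical file share
ARCHIVE = Path("/srv/archive")                 # hypothetical archive target

def archive_expired(share: Path, archive: Path) -> None:
    cutoff = time.time() - RETENTION_SECONDS
    for f in share.rglob("*"):
        # Move files whose last modification lies beyond the retention period.
        if f.is_file() and f.stat().st_mtime < cutoff:
            target = archive / f.relative_to(share)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(f), str(target))

archive_expired(SHARE, ARCHIVE)
```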
The hard part: data classification on a grand scale
The first step – identifying and classifying sensitive data – is easier said than done.
Until now, unstructured data has usually been classified by matching relevant parts of the file system against known patterns – an approach that is as simple as it is time-consuming. This is done with the help of regular expressions and other criteria, and a record is kept of which files match the patterns.
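A classic pattern-matching scan of this kind might look roughly like the following Python sketch. The regular expressions and the share path are illustrative examples only, not a complete or production-ready rule set:

```python
# Pattern-based classification sketch: patterns and paths are illustrative.
import re
from pathlib import Path

PATTERNS = {
    "credit_card": re.compile(r"\b(?:\d[ -]?){15,16}\b"),
    "iban":        re.compile(r"\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b"),
}

def classify_file(path: Path) -> set:
    """Return the names of all patterns found in a single file."""
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        return set()
    return {name for name, rx in PATTERNS.items() if rx.search(text)}

# Full scan: every file on the share is read and matched against every pattern.
results = {f: classify_file(f) for f in Path("/srv/share").rglob("*") if f.is_file()}
```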
Of course, this process has to be repeated constantly, because new files are created and existing ones are updated all the time. However, such a comparison starts again from the first file in the list and continues until the last file on the server is reached.
That is, information from previous scans is not used. So if a file system holds ten million unchanged files and only one new file has been added since the last run, the next run will still examine all ten million files plus that one new file.
It makes more sense to check each file’s modification time and then search only the contents of files that have changed since the last run – the same strategy used for incremental backups, which compare modification times and other metadata associated with the files. Because of the CPU and disk accesses involved, however, this is still a resource-intensive procedure: the metadata of every single file has to be read just to find out whether it has changed since the last scan. For file systems in the two- or three-digit terabyte range, which are not uncommon in companies, the method is therefore not very suitable.
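A modification-time-based variant could look like the following sketch, which reuses the hypothetical classify_file() function from the example above and stores the timestamp of the last run in an assumed state file. Note that the metadata of every file is still read on every run:

```python
# Modification-time scan sketch; state file location is an assumption.
import json, time
from pathlib import Path

STATE_FILE = Path("/var/lib/scanner/last_scan.json")   # assumed location

def incremental_scan(share: Path) -> dict:
    last_scan = json.loads(STATE_FILE.read_text()).get("ts", 0) if STATE_FILE.exists() else 0
    results = {}
    # The metadata of *every* file is still read, even if the contents are not.
    for f in share.rglob("*"):
        if f.is_file() and f.stat().st_mtime > last_scan:
            results[f] = classify_file(f)    # only changed files are actually read
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps({"ts": time.time()}))
    return results
```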
The most useful approach is true incremental scanning, where there is no need to check the modification date of each file to determine whether it has been modified.
Instead, this optimized method is based on a list of modified file objects provided by the operating system. To put it another way: if every file change can be tracked – and the operating system kernel has access to this information – then it is possible to build a list containing only the file objects that actually need to be checked. This approach is far better than the usual (lengthy) traversal of the whole file system.
To do this, one must be able to access the central metadata of the file system. At a minimum, one needs the timestamps of file changes; ideally, other data such as user and group access identifiers is also available.
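One way to picture such a change feed is the following sketch, which uses the third-party watchdog library on top of the kernel’s file-change notifications (for example inotify on Linux). It is meant only to illustrate the principle of scanning nothing but reported changes; it is not how the Varonis product is implemented:

```python
# Event-driven scanning sketch using the third-party "watchdog" library.
import time
from pathlib import Path
from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler

class ChangeCollector(FileSystemEventHandler):
    """Collect the paths of created or modified files reported by the kernel."""
    def __init__(self):
        self.changed = set()
    def on_created(self, event):
        if not event.is_directory:
            self.changed.add(Path(event.src_path))
    def on_modified(self, event):
        if not event.is_directory:
            self.changed.add(Path(event.src_path))

collector = ChangeCollector()
observer = Observer()
observer.schedule(collector, "/srv/share", recursive=True)   # hypothetical share
observer.start()
try:
    while True:
        time.sleep(60)
        # Only the files reported as changed are classified – no full traversal.
        for f in list(collector.changed):
            classify_file(f)                # from the earlier sketch
            collector.changed.discard(f)
except KeyboardInterrupt:
    observer.stop()
observer.join()
```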
Varonis solution and conclusions
Is there a way to truly scan the data incrementally using a data-centric risk analysis program?
This is possible for example with systems like the Varonis IDU Classification Framework. Here’s how it works:
The Classification Framework is based on a core Varonis technology called the Varonis Metadata Framework. It provides access to the operating system’s internal metadata and tracks all events related to files and directories, such as create, update, copy, move and delete. The Varonis Metadata Framework is not just another application running on top of the operating system; rather, it integrates with the operating system at a low level and therefore causes no noticeable computational overhead.
Based on this metadata, the Classification Framework can then perform a quick incremental scan: it examines only a small fraction of the file objects – namely those that have been changed or newly created – and checks these files directly, without having to traverse the whole system.
Then, the Varonis IDU Classification Framework quickly classifies the file contents, using Varonis’ classification engine or third-party classification metadata such as RSA.
The IDU Classification Framework is designed for use with DatAdvantage, a product that uses file metadata to determine who is actually responsible for sensitive data.
Ultimately, these data owners – those who are responsible for the content and know best who should have access to it – can then assign appropriate access permissions and thereby improve the security of the data.



Andy Green
Andy blogs about data privacy and security regulations. He also loves writing about malware threats and what they mean for IT security.