Finding Confidential Data on Linux
Each of us is personally responsible for locating and removing any confidential data from any computer directory that we can write to.
In some cases, automated tools can be used to help find such data. (See below.)
Unfortunately, the automated tools currently available for Linux are limited in the types of files they can scan and in the types of personal information they search for.
In general, files must be checked manually to verify that they contain no confidential data.
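As a rough aid to manual checking, the sketch below lists text files under a directory that contain a string shaped like a dash-formatted SSN (NNN-NN-NNNN). It is only an illustration: the function name `find_ssn_like` is hypothetical, it assumes GNU grep as shipped with Linux, and it misses SSNs written without dashes, credit card numbers, and anything inside binary or compressed files. It is not a substitute for the tools described on this page or for manual review.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool on this page):
# print the names of text files under directory $1 that contain
# a string shaped like NNN-NN-NNNN.
find_ssn_like() {
    # -r  recurse into sub-folders
    # -I  skip binary files
    # -l  print only the names of matching files
    # -E  use extended regular expressions
    # The pattern requires a non-digit (or a line edge) on both sides,
    # so longer digit runs such as 1234-56-78901 are not reported.
    grep -rIlE '(^|[^0-9])[0-9]{3}-[0-9]{2}-[0-9]{4}([^0-9]|$)' "$1"
}
```

For example, `find_ssn_like /tmp` would list candidate files under /tmp. Every file reported still needs to be opened and inspected, since the pattern also matches harmless strings such as part numbers.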
For more information about Cornell's Data Cleanup and Inventory requirement, please see http://www.it.cornell.edu/datacleanup/
Find_SSNs is a package available from Virginia Tech that searches for U.S. social security numbers (SSNs) and credit card numbers (CCNs):
Find_SSNs options include:
-p The folder to search (including all sub-folders).
-o The folder to write reports to.
-t The output format: may be html or csv.
-a What to search for: -a searches for both SSNs and CCNs, and may be replaced by
-s (search for SSNs only) or
-c (search for CCNs only).
Please note that the output directory must already exist before Find_SSNs is run.
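One way to prepare the report directory is sketched below. It assumes that your home directory is the /home/$USER location used in the examples on this page (reached here through `$HOME`); the `OUT_DIR` variable is introduced only for illustration.

```shell
#!/bin/sh
# Report directory for the -o option; OUT_DIR is an illustrative variable.
# Assumption: $HOME is the same location as /home/$USER.
OUT_DIR="${OUT_DIR:-$HOME/find_ssns}"

# -p creates any missing parent directories and does nothing
# if the directory already exists.
mkdir -p "$OUT_DIR"

# Confirm the directory is present and writable before starting a scan.
if [ -d "$OUT_DIR" ] && [ -w "$OUT_DIR" ]; then
    echo "Report directory ready: $OUT_DIR"
else
    echo "Cannot write to $OUT_DIR" >&2
fi
```

The same path can then be passed to Find_SSNs with the -o option.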
Scientific Linux 5 and later
Find_SSNs can be run in non-GUI mode on any LEPP SL5 or SL6 system, as in the following example:
python /nfs/opt/find_ssns/Find_SSNs.pyw -p /tmp -o /home/$USER/find_ssns -t html -a
Scientific Linux 4 and earlier
Using an alternative installation of python, Find_SSNs can be run in non-GUI mode on any LEPP Linux system older than SL5, as in the following example:
/nfs/opt/python/bin/python /nfs/opt/find_ssns/Find_SSNs.pyw -p /tmp -o /home/$USER/find_ssns -t html -a
Spider was developed by CIT. It is not currently available on any LEPP Linux system.
The program Identity Finder can be run on any LEPP Windows computer to scan any Linux directories which are exported with a \\samba prefix. The Linux home directories can be scanned in this way. (See RunningIdentityFinder9)