Cyan Canyon
A Different Way to GetFiles
March 15, 2009

There comes a time in the life of every man, woman and cylon when they must retrieve a list of the files and/or subdirectories from a folder using .NET. Most of us use the accessible static methods from the Directory class...

string[] files = Directory.GetFiles(@"C:\path");
string[] subdirs = Directory.GetDirectories(@"C:\path");

These are all fine and good for everyone... until you want to filter by file attributes or filter by a regular expression or filter by file size or filter by file date or get files and subdirectories at the same time or iterate through the results instead of just returning the whole lot or do any of these things quickly. Then you need something else.

The Win32 API provides a much more versatile way of getting at the contents of a directory, and it can be accessed pretty easily using P/Invoke. The functions live in kernel32.dll on Windows and coredll.dll on Windows Mobile. I'm going to demonstrate the Windows Mobile specifics because that is where I have used them, and there are already many googleable examples of the desktop Windows version.

There are 3 functions we'll need...

using System.Runtime.InteropServices;

//Windows Mobile: http://msdn.microsoft.com/en-us/library/aa914391.aspx
//Windows: http://msdn.microsoft.com/en-us/library/aa364418(VS.85).aspx
[DllImport("coredll.dll", CharSet = CharSet.Auto)]
private static extern IntPtr FindFirstFile(String lpFileName, ref WIN32_FIND_DATA lpFindFileData);

//Windows Mobile: http://msdn.microsoft.com/en-us/library/aa914380.aspx
//Windows: http://msdn.microsoft.com/en-us/library/aa364428(VS.85).aspx
[DllImport("coredll.dll", CharSet = CharSet.Auto)]
private static extern bool FindNextFile(IntPtr hFindFile, ref WIN32_FIND_DATA lpFindFileData);

//Windows Mobile: http://msdn.microsoft.com/en-us/library/aa911929.aspx
//Windows: http://msdn.microsoft.com/en-us/library/aa364413(VS.85).aspx
[DllImport("coredll.dll", CharSet = CharSet.Auto)]
private static extern bool FindClose(IntPtr hFindFile);

The basics here are that you call FindFirstFile with the path and you get back a handle and data for the first file or directory. You then call FindNextFile with the handle to get each subsequent file or directory until they are all gone or you are done. Then you wrap it up nicely by calling FindClose to clean up.

The FindFirstFile function expects its path parameter to end in '\*' as in 'C:\path\*'. If you leave off the '*', you will get an invalid handle and no data. If you leave off the '\*' altogether as in 'C:\path', you will get data for the 'path' directory and nothing else; no files, no subdirectories.
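The search string can also carry a wildcard in its last component (the desktop documentation for FindFirstFile allows '*' and '?', as in 'C:\path\*.txt'), so a small helper can build it for you. This is only a minimal sketch: the SearchPattern class and its Build method are hypothetical names made up for illustration, not part of any API.

```csharp
using System;

public static class SearchPattern {
    // Hypothetical helper: turns a directory path plus an optional wildcard
    // into a string suitable for FindFirstFile, e.g. C:\path\*
    public static string Build(string directory, string wildcard) {
        // default to "everything" when no wildcard is given
        if (string.IsNullOrEmpty(wildcard)) wildcard = "*";
        // make sure there is exactly one backslash between directory and wildcard
        if (!directory.EndsWith(@"\")) directory += @"\";
        return directory + wildcard;
    }
}
```

Something like SearchPattern.Build(@"C:\path", null) gives you 'C:\path\*', and SearchPattern.Build(@"C:\path\", "*.txt") gives you 'C:\path\*.txt'.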

We'll need a few constants and a couple of data structures...

private const int MAX_PATH = 260; //maximum path length in characters (size of the cFileName buffer)
private const int INVALID_HANDLE_VALUE = -1;

//Windows Mobile: http://msdn.microsoft.com/en-us/library/aa915351.aspx
//Windows: http://msdn.microsoft.com/en-us/library/ms724284(VS.85).aspx
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
private struct FILETIME {
    public uint dwLowDateTime;
    public uint dwHighDateTime;
}

//Windows Mobile: http://msdn.microsoft.com/en-us/library/aa914427.aspx
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
private struct WIN32_FIND_DATA {
    public uint dwFileAttributes;
    public FILETIME ftCreationTime;
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwOID;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
    public string cFileName;
}

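It is important to note that the layout for WIN32_FIND_DATA is slightly different on desktop Windows: it has two reserved 4-byte fields in front of cFileName where Windows Mobile has a single dwOID. If you are using the desktop definition in your Windows Mobile application, you'll be able to tell because the first two characters of all your file names will be missing (the extra 4 bytes swallow two 2-byte characters).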

Here is the Windows version...

private const int MAX_ALTERNATE = 14;

//Windows: http://msdn.microsoft.com/en-us/library/aa365740(VS.85).aspx
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
private struct WIN32_FIND_DATA {
    public uint dwFileAttributes;
    public FILETIME ftCreationTime;
    public FILETIME ftLastAccessTime;
    public FILETIME ftLastWriteTime;
    public uint nFileSizeHigh;
    public uint nFileSizeLow;
    public uint dwReserved0;
    public uint dwReserved1;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
    public string cFileName;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
    public string cAlternateFileName;
}

The file name you get out does not include the path. nFileSizeLow is the file size you want in most cases. If nFileSizeHigh has something in it, then your file is huge (4GB+) and you need to combine it with nFileSizeLow to make an 8-byte integer. Everything else should be pretty self-explanatory or easy to interpret from MSDN.
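Combining the two halves is just a shift and an OR, and the same trick turns a FILETIME into a DateTime. A minimal sketch, assuming you pass the raw uint fields in yourself; the FindDataHelpers class and its method names are hypothetical, only DateTime.FromFileTimeUtc is a real framework call.

```csharp
using System;

public static class FindDataHelpers {
    // Combine the high and low 32-bit halves into one 64-bit file size.
    public static long FileSize(uint sizeHigh, uint sizeLow) {
        return ((long)sizeHigh << 32) | sizeLow;
    }

    // A FILETIME counts 100-nanosecond intervals since January 1, 1601 (UTC),
    // split across two 32-bit fields. Glue them together and let the
    // framework do the conversion.
    public static DateTime ToDateTimeUtc(uint dwHighDateTime, uint dwLowDateTime) {
        long ticks = ((long)dwHighDateTime << 32) | dwLowDateTime;
        return DateTime.FromFileTimeUtc(ticks);
    }
}
```

Inside the loop that would look something like FindDataHelpers.FileSize(filedata.nFileSizeHigh, filedata.nFileSizeLow) and FindDataHelpers.ToDateTimeUtc(filedata.ftLastWriteTime.dwHighDateTime, filedata.ftLastWriteTime.dwLowDateTime).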

Now then, the function to actually get the files is pretty simple, but varies greatly depending on what you want out of it. We don't want to have to remember to add the '\*' to the path every time we call this function so we'll take care of all that here.

//function signature will vary with the body.
public static void GetFiles(string path) {
    IntPtr handle;
    WIN32_FIND_DATA filedata;

    //we want path to have something useful that we can add to the filename
    //in case we want the entire path, so let's drop the final '*' if it exists
    //and add it later when we need it.
    if (path.EndsWith("*")) path = path.Substring(0, path.Length - 1);
    //make sure the path ends with the backslash though.
    if (!path.EndsWith(@"\")) path += @"\";

    filedata = new WIN32_FIND_DATA();

    //get the first file, get the rest of the files, and close.
    handle = FindFirstFile(path + "*", ref filedata);
    if (handle != new IntPtr(INVALID_HANDLE_VALUE)) { //compare as IntPtr so this also works with 64-bit handles
        try {
            do {
                if (filedata.cFileName != "." && filedata.cFileName != "..") {
                    //filtering and storage logic here
                }
            } while (FindNextFile(handle, ref filedata));
        } finally {
            FindClose(handle);
        }
    }
}

All your filtering just fits inside. The most basic filter, checking for a directory, would be something like this...

List<string> files = new List<string>();
uint directoryattribute = (uint)FileAttributes.Directory; //16 or 0x10

...

if ((filedata.dwFileAttributes & directoryattribute) != directoryattribute) {
    files.Add(path + filedata.cFileName);
}

Combine this with a filter based on a regular expression passed in to the function...

Regex regex = new Regex(@"\.txt$"); //anchored so only names ending in .txt match

...

if ((filedata.dwFileAttributes & directoryattribute) != directoryattribute) {
    if (regex == null || regex.Match(filedata.cFileName).Success) {
        files.Add(path + filedata.cFileName);
    }
}
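The same pattern covers the other filters mentioned at the top (attributes and size). Here is a rough sketch of a couple of predicates you could call from inside the loop; the EntryFilters class and its method names are hypothetical, and the attribute values come from the standard System.IO.FileAttributes enum.

```csharp
using System;
using System.IO;

public static class EntryFilters {
    // True when the entry is a plain file: not a directory, not hidden, not system.
    public static bool IsVisibleFile(uint dwFileAttributes) {
        uint excluded = (uint)(FileAttributes.Directory
                             | FileAttributes.Hidden
                             | FileAttributes.System);
        return (dwFileAttributes & excluded) == 0;
    }

    // True when the 64-bit size built from the two halves is below maxBytes.
    public static bool IsSmallerThan(uint sizeHigh, uint sizeLow, long maxBytes) {
        long size = ((long)sizeHigh << 32) | sizeLow;
        return size < maxBytes;
    }
}
```

Used as, for example, if (EntryFilters.IsVisibleFile(filedata.dwFileAttributes) && EntryFilters.IsSmallerThan(filedata.nFileSizeHigh, filedata.nFileSizeLow, 1024 * 1024)) { ... } to keep only visible files under 1MB.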

If you are going for speed on multiple calls, do your best not to create any class instances in your function. Pass in your regex and result container.

Here is the final function I ended up using...

public static void GetFiles(string path, List<string> files, Regex fileregex, List<string> subdirs, Regex subdirregex) {
    IntPtr handle;
    WIN32_FIND_DATA filedata;
    uint dirattribute = (uint)FileAttributes.Directory;

    if (path.EndsWith("*")) path = path.Substring(0, path.Length - 1);
    if (!path.EndsWith(@"\")) path += @"\";

    filedata = new WIN32_FIND_DATA();

    handle = FindFirstFile(path + "*", ref filedata);
    if (handle != new IntPtr(INVALID_HANDLE_VALUE)) {
        try {
            do {
                if (filedata.cFileName != "." && filedata.cFileName != "..") {
                    if ((filedata.dwFileAttributes & dirattribute) == dirattribute) {
                        if (subdirs != null && (subdirregex == null || subdirregex.Match(filedata.cFileName).Success)) {
                            subdirs.Add(path + filedata.cFileName);
                        }
                    } else {
                        if (files != null && (fileregex == null || fileregex.Match(filedata.cFileName).Success)) {
                            files.Add(path + filedata.cFileName);
                        }
                    }
                }
            } while (FindNextFile(handle, ref filedata));
        } finally {
            FindClose(handle);
        }
    }
}

...called from something like this to get all images in a directory tree...

public static List<string> FindImages(string path) {
    List<string> files, subdirs;
    Regex regex;

    files = new List<string>();
    subdirs = new List<string>();
    regex = new Regex(@"\.(png|jpg|jpeg|gif)$", RegexOptions.IgnoreCase); //file names are not case sensitive, so match .JPG too

    subdirs.Add(path);
    for (int i = 0; i < subdirs.Count; i++) {
        //since subdirectories may be added to the list with each iteration,
        //we end up searching the entire tree level by level
        GetFiles(subdirs[i], files, regex, subdirs, null); //no filter on the subdirectories
    }

    return files;
}

Finally, if you don't want to wait for the entire list to be returned before you start processing, you can use the yield statement to create a file iterator... but I'm not going to tell you how to do that. There is a great article over on codeproject.com that outlines this. They use a SafeHandle that is not available on Windows Mobile, but the basic idea is the same.
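For the impatient, here is a rough sketch of what the iterator idea could look like. It is not the codeproject version: it assumes desktop Windows (kernel32.dll and the desktop WIN32_FIND_DATA layout), skips SafeHandle in favor of a plain try/finally, and the LazyFinder class and EnumerateEntries method are names I made up for illustration. On Windows Mobile you would swap in coredll.dll and the Windows Mobile struct.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class LazyFinder {
    private const int MAX_PATH = 260;
    private const int MAX_ALTERNATE = 14;
    private static readonly IntPtr InvalidHandle = new IntPtr(-1);

    [StructLayout(LayoutKind.Sequential)]
    private struct FILETIME {
        public uint dwLowDateTime;
        public uint dwHighDateTime;
    }

    //desktop Windows layout
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Auto)]
    private struct WIN32_FIND_DATA {
        public uint dwFileAttributes;
        public FILETIME ftCreationTime;
        public FILETIME ftLastAccessTime;
        public FILETIME ftLastWriteTime;
        public uint nFileSizeHigh;
        public uint nFileSizeLow;
        public uint dwReserved0;
        public uint dwReserved1;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_PATH)]
        public string cFileName;
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = MAX_ALTERNATE)]
        public string cAlternateFileName;
    }

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern IntPtr FindFirstFile(string lpFileName, ref WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    private static extern bool FindNextFile(IntPtr hFindFile, ref WIN32_FIND_DATA lpFindFileData);

    [DllImport("kernel32.dll")]
    private static extern bool FindClose(IntPtr hFindFile);

    //yields the full path of each file or subdirectory as it is found
    public static IEnumerable<string> EnumerateEntries(string path) {
        if (path.EndsWith("*")) path = path.Substring(0, path.Length - 1);
        if (!path.EndsWith(@"\")) path += @"\";

        WIN32_FIND_DATA filedata = new WIN32_FIND_DATA();
        IntPtr handle = FindFirstFile(path + "*", ref filedata);
        if (handle == InvalidHandle) yield break;

        try {
            do {
                if (filedata.cFileName != "." && filedata.cFileName != "..") {
                    yield return path + filedata.cFileName;
                }
            } while (FindNextFile(handle, ref filedata));
        } finally {
            //runs when the loop finishes or the caller stops iterating early
            FindClose(handle);
        }
    }
}
```

A foreach (string entry in LazyFinder.EnumerateEntries(@"C:\path")) loop disposes the enumerator for you, which is what triggers the finally block and closes the handle even if you break out early.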