Checksum or hash sum on files
Add an option to collect a checksum on files (using MD5 or some other algorithm) when Directory Monitor detects a file change. This would allow for easier and more accurate tracking of files that are moved (from one folder to another), edited, or renamed.
-
We can consider exposing the MD5 since it's readily available but not currently being used or calculated. This will have to come with a warning that logging the MD5 will require considerable resources on very busy machines and when hashing large files (4 GB+).
The hash would be exposed through the grid and macros system so that it can be extracted in various plugins such as the text log, emailer and application executor. A new column would be added to the database schema to store the value (NULL by default), along with an option to enable this per directory.
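To illustrate the resource concern and the "NULL by default" behaviour described above, here is a minimal sketch in Python. It is not Directory Monitor's implementation; `file_md5`, `hash_for_event`, the `hashing_enabled` flag and the 1 MiB chunk size are all hypothetical names and values chosen for the example. Reading in fixed-size chunks keeps memory use flat even for very large files, although the full file still has to be read from disk.

```python
import hashlib
from typing import Optional

CHUNK_SIZE = 1024 * 1024  # 1 MiB per read; an arbitrary illustrative value


def file_md5(path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Return the hex MD5 of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_for_event(path: str, hashing_enabled: bool) -> Optional[str]:
    """Hypothetical per-directory switch: None (stored as NULL) unless enabled."""
    if not hashing_enabled:
        return None
    try:
        return file_md5(path)
    except OSError:
        # The file may be locked or already gone; leave the value empty.
        return None
```

With the option off, `hash_for_event` never opens the file, so directories that do not need hashes pay no extra cost.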
-
W. Patrick Gale commented
Thank you for the feedback. I understand your performance concerns. In my case I would only be using this on a network share and select drives with low traffic. I am trying to manage files with a more automated approach from version control systems like SVN. I would be using the file hash for my own matching purposes, probably against an ancillary db table that associates metadata with the files. I do not want a complicated file management system like most out there, and I feel DirectoryMonitor is a good starting point that I can probably build on with custom plugins. Thanks for offering this software at a great price. - cheers
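As a rough illustration of the ancillary table idea, the sketch below keys metadata on the file hash using Python's built-in `sqlite3` module. The table name `file_metadata`, its columns, and the helper functions are assumptions made for the example, not part of Directory Monitor or any existing plugin.

```python
import sqlite3
from typing import Optional


def open_metadata_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a small database keyed on the file's MD5."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_metadata ("
        " md5 TEXT PRIMARY KEY,"
        " description TEXT NOT NULL)"
    )
    return conn


def tag_file(conn: sqlite3.Connection, md5: str, description: str) -> None:
    """Attach (or overwrite) metadata for the given hash."""
    conn.execute(
        "INSERT OR REPLACE INTO file_metadata (md5, description) VALUES (?, ?)",
        (md5, description),
    )
    conn.commit()


def lookup(conn: sqlite3.Connection, md5: str) -> Optional[str]:
    """Return the stored description for a hash, or None if unknown."""
    row = conn.execute(
        "SELECT description FROM file_metadata WHERE md5 = ?", (md5,)
    ).fetchone()
    return row[0] if row else None
```

Because the key is the content hash rather than the path, the metadata would follow a file through moves and renames, but two files with identical contents would share one entry.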
-
Full-file MD5 hash calculations were done in the past to filter duplicate or synthetic events, but performance degraded tenfold, and reading the contents (especially of large files) caused unnecessary file locking and increased memory usage.
Plugins can request the MD5 for each event if necessary, but it is not calculated automatically, and it does not make tracking file changes any easier or more accurate. Or do you want the MD5 of the files exposed in the log/grid for your own interrogation and matching purposes?
Checksums would be necessary to detect a "move", but every file would have to be hashed beforehand, since a "delete" event arrives too late to checksum the file before it reaches the destination. The overhead of hashing everything in advance in case something moves is not worth the significant performance cost it incurs on the entire application.
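The reason every file has to be hashed up front can be seen in a small Python sketch. This is an illustration under stated assumptions, not how Directory Monitor works: `MoveDetector` and its methods are hypothetical, hashes are passed in as plain strings, and only the "delete first, create second" event order is handled.

```python
from typing import Dict, Optional


class MoveDetector:
    """Pairs a delete with a later create that has the same content hash."""

    def __init__(self) -> None:
        self._hash_by_path: Dict[str, str] = {}     # known files -> hash
        self._deleted_by_hash: Dict[str, str] = {}  # hash -> last deleted path

    def record(self, path: str, content_hash: str) -> None:
        """Remember the hash of an existing or changed file (the costly part)."""
        self._hash_by_path[path] = content_hash

    def on_deleted(self, path: str) -> None:
        """A deleted file can no longer be read, so only a stored hash helps."""
        content_hash = self._hash_by_path.pop(path, None)
        if content_hash is not None:
            self._deleted_by_hash[content_hash] = path

    def on_created(self, path: str, content_hash: str) -> Optional[str]:
        """Return the original path if this create matches an earlier delete."""
        self.record(path, content_hash)
        return self._deleted_by_hash.pop(content_hash, None)
```

If `record` was never called for a file before it was deleted, `on_deleted` has nothing to store and the move goes unnoticed, which is why the hashing cost would be paid for every file rather than only for the ones that move.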
Modifications and renames are already detected incredibly quickly and accurately without the need for checksums. If you are having reliability problems of any kind, please let us know; they can more than likely be solved without having to perform costly checksums.