4 Sept 2009

Natural Sort Compare with Linq OrderBy

Having "fun" with IComparer<T> when trying to do OrderBy in Linq with a custom sort?

Me too. I wanted to sort some rows from a DB table by a string 'filename' field. The standard Linq (and SQL) OrderBy(item => item.sortProperty) sorts the strings like so:

image1.jpg
image200.jpg
image30.jpg
image4.jpg

But because I am a human being, I don't want that. No, instead I wanted them sorted like:

image1.jpg
image4.jpg
image30.jpg
image200.jpg

What I want is "Natural Sort". C# doesn't offer a built-in solution, but the Linq OrderBy method does have an override that allows you to use your own IComparer<T> function, a la
myLinqQuery.OrderBy(item => item.sortProperty, new MyComparer<string>())


(Edit: Incidentally, I now have a way of doing this dynamically, if you don't know what sort field will be used until run-time. See here.)

Building your own "Natural Sort" IComparer is not for the faint of heart, or the lazy, so I just nicked some code from Justin Jones and tweaked it a bit:

    public class NaturalSortComparer<T> : IComparer<string>, IDisposable
    {
        private bool isAscending;

        public NaturalSortComparer(bool inAscendingOrder = true)
        {
            this.isAscending = inAscendingOrder;
        }

        #region IComparer<string> Members

        public int Compare(string x, string y)
        {
            throw new NotImplementedException();
        }

        #endregion

        #region IComparer<string> Members

        int IComparer<string>.Compare(string x, string y)
        {
            if (x == y)
                return 0;

            string[] x1, y1;

            if (!table.TryGetValue(x, out x1))
            {
                x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
                table.Add(x, x1);
            }

            if (!table.TryGetValue(y, out y1))
            {
                y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
                table.Add(y, y1);
            }

            int returnVal;

            for (int i = 0; i < x1.Length && i < y1.Length; i++)
            {
                if (x1[i] != y1[i])
                {
                    returnVal = PartCompare(x1[i], y1[i]);
                    return isAscending ? returnVal : -returnVal;
                }
            }

            if (y1.Length > x1.Length)
            {
                returnVal = 1;
            }
            else if (x1.Length > y1.Length)
            { 
                returnVal = -1; 
            }
            else
            {
                returnVal = 0;
            }

            return isAscending ? returnVal : -returnVal;
        }

        private static int PartCompare(string left, string right)
        {
            int x, y;
            if (!int.TryParse(left, out x))
                return left.CompareTo(right);

            if (!int.TryParse(right, out y))
                return left.CompareTo(right);

            return x.CompareTo(y);
        }

        #endregion

        private Dictionary<string, string[]> table = new Dictionary<string, string[]>();

        public void Dispose()
        {
            table.Clear();
            table = null;
        }
    }


The first time I tried to use this, it failed with the rather useless error: "Unsupported overload used for query operator 'OrderBy'."

Turns out it was because I had tried to use my custom OrderBy on the Linq query before it had got the data records from the server, and hence it thought I was trying to run the natural sort in SQL. So I fixed the prob by getting the results first with a quick call to AsEnumerable(), a la:

List<Photo> photos = DataManager.MainContext.Photos
     .Where(item => item.PhotoFilename != null)
     .AsEnumerable()
     .OrderBy(item => item.PhotoFilename, new NaturalSortComparer<string>())
     .ToList();


Works a treat!

27 comments:

  1. Anonymous10:11 am

    I found numerous solutions that suggested to solve my problem, but only this one works! Thanks a lot!

    ReplyDelete
  2. Anonymous6:26 pm

    For those of you attempting to dynamically sort a LINQ IEnumerable query resultset with a string-column name at runtime....behold!

    public class CustomComparer : IComparer
    {
    private string _sortExpression;
    private bool _nullsAreLess;

    public CustomComparer(String sort,
    bool nullsLess)
    {
    _sortExpression = sort;
    _nullsAreLess = nullsLess;
    }

    public int Compare(T x, T y)
    {
    PropertyInfo pi = x.GetType().GetProperty(_sortExpression);
    if (pi == null)
    throw new NotSupportedException(String.Format("Property {0} of Type {1} does not exist and cannot be used in orderby.", _sortExpression, x.GetType().ToString()));

    IComparable a = (IComparable)pi.GetValue(x, null);
    IComparable b = (IComparable)pi.GetValue(y, null);

    if (a == null & b == null)
    {
    return 0;
    }
    else if (a == null & b != null)
    {
    return (_nullsAreLess) ? -1 : 1;
    }
    else if (a != null & b == null)
    {
    return (_nullsAreLess) ? 1 : -1;
    }
    else if (a != null & b != null)
    {
    return a.CompareTo(b);
    }
    return a.CompareTo(b);
    }
    }

    And then you can just do this:
    String _SortPropertyName = "role";

    var query = from a in mdc.aspnet_Accesses select a;

    var sortedResult = query .AsEnumerable().OrderBy(a => a, new CustomComparer(_SortPropertyName, true));

    This is much cleaner than trying to declare a lambda expression, resulting in object boxed/expression.convert'd tree nodes.

    It handles ValueTypes and Nullables, and you don't get those pesky 'cannot orderby System.Object' errors.

    ReplyDelete
  3. Anonymous8:55 pm

    Thanks for the post..Saved my day!

    ReplyDelete
  4. Anonymous11:30 am

    Saved me a day of looking. Great example!!!

    ReplyDelete
  5. Anonymous10:28 am

    it puts 8.1 before 8, it should be 8 first

    ReplyDelete
  6. Anonymous9:44 am

    Great code!

    ReplyDelete
  7. Anonymous6:11 pm

    gr8 code

    ReplyDelete
  8. Anonymous12:12 pm

    Thanks a lot! Using it now! :)

    ReplyDelete
  9. Thanks for sharing bro!

    ReplyDelete
  10. Thanks for sharing man, it was pretty straight and effective.

    ReplyDelete
  11. I found a bug in the code, which is stopping this comparer to have the effect it should have, i.e it is not sorting the collection, just working similar to as it was working without the comparer.

    The bug is here:

    for (int i = 0; i < x1.Length && i < y1.Length; i++)
    {
    if (x1[i] != y1[i])
    returnVal = PartCompare(x1[i], y1[i]);
    return isAscending ? returnVal : -returnVal;
    }

    There are no braces after if statement, which is making only the 1st statement after if to be dependent over if, the return statement will return no matter whether the condition in if statement is true or false. Because of which this loop will only run once. I was having only numbers in my string when this code broke. So I debugged it to find that placing the braces after if will solve the problem. You can try it. Also initialize the returnVal=0, else you will see an error.

    Final Class:

    public class NaturalSortComparer : IComparer, IDisposable
    {
    private bool isAscending;
    private Dictionary table = new Dictionary();


    public NaturalSortComparer(bool inAscendingOrder = true)
    {
    this.isAscending = inAscendingOrder;
    }
    #region IComparer Members


    public int Compare(string x, string y)
    {
    throw new NotImplementedException();
    }

    #endregion

    #region IComparer Members

    int IComparer.Compare(string x, string y)
    {
    if (x == y)
    return 0;

    string[] x1, y1;

    if (!table.TryGetValue(x, out x1))
    {
    x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
    table.Add(x, x1);
    }

    if (!table.TryGetValue(y, out y1))
    {
    y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");
    table.Add(y, y1);
    }

    int returnVal=0;

    for (int i = 0; i < x1.Length && i < y1.Length; i++)
    {
    if (x1[i] != y1[i])
    {
    returnVal = PartCompare(x1[i], y1[i]);
    return isAscending ? returnVal : -returnVal;
    }
    }

    if (y1.Length > x1.Length)
    {
    returnVal = 1;
    }
    else if (x1.Length > y1.Length)
    {
    returnVal = -1;
    }
    else
    {
    returnVal = 0;
    }

    return isAscending ? returnVal : -returnVal;
    }

    private static int PartCompare(string left, string right)
    {
    int x, y;
    if (!int.TryParse(left, out x))
    return left.CompareTo(right);

    if (!int.TryParse(right, out y))
    return left.CompareTo(right);

    return x.CompareTo(y);
    }

    #endregion

    public void Dispose()
    {
    table.Clear();
    table = null;
    }
    }

    ReplyDelete
    Replies
    1. Doh, you're right - I made a little mod to the code last week and overlooked this. Sorry!

      Delete
  12. Bookmarked. Thank you! Great code.

    ReplyDelete
  13. Anonymous8:22 pm

    This code is fantastic!!! Thank you for sharing!! I did however find an issue where it is not able to sort text in the form of

    D-016.00
    D-016.0
    D-016.000

    I assume this would be because when it does an int parse it would truncated all leading zeros. I have tried many combination to solve this but all of them have so failed. Does anyone have any idea how to solve this? Thanks :)

    ReplyDelete
  14. Anonymous1:00 pm

    Great Code, works a treat. Something like this helped me for the negative number scenario. All my items were separated by a whitespace, so tweaked the code with in
    int IComparer.Compare(string x, string y) as follows:


    if (!table.TryGetValue(x, out x1))
    {
    x1 = Regex.Split(x, " ");
    table.Add(x, x1);
    }

    if (!table.TryGetValue(y, out y1))
    {
    y1 = Regex.Split(y, " ");
    table.Add(y, y1);
    }

    ReplyDelete
  15. Anonymous2:34 pm

    Great, it works good!
    But when i have NULL Values in the Column, then the Sort not working.

    ReplyDelete
  16. Thanks!! Its work fine and its exactly that i was looking for!!

    ReplyDelete
  17. Anonymous10:12 am

    Tnx :)

    ReplyDelete
  18. Whats the <T>, it isn't used in the code

    ReplyDelete
    Replies
    1. See http://msdn.microsoft.com/en-us/library/aa479858.aspx

      Delete
    2. I know what generics are, however i cannot see any reference to the type T in the class. So why is it needed?

      Delete
  19. Anonymous12:07 am

    Thanks mate! Works as a charm!

    ReplyDelete
  20. Thanks.
    However as T is not used, you could also start the class with:
    public class NaturalSortComparer : IComparer, IDisposable

    Also, when there's a name without a number it puts that name at the bottom instead of the beginning. I wanted to see:
    image
    image1
    image 2

    instead of
    image1
    image2
    image

    so I reverted the return values in this part:

    if (y1.Length > x1.Length)
    {
    returnVal = -1;
    }
    else if (x1.Length > y1.Length)
    {
    returnVal = 1;
    }

    Thanks again though :)

    ReplyDelete
  21. Thanks for your post. It helped me.

    ReplyDelete
  22. Anonymous7:23 am

    Thanks for the code. I'm using it in a WinCE Project and it works fine. Exactly what i was looking for!

    ReplyDelete
  23. First of all I have to admit I have not profiled it (which I should). However my guess is you can gain some performance by creating/compile your Regex-expression once, and then use it multiple times.

    Hence first you create/compile the regex:
    private static Regex rxNumbers = new Regex("([0-9]+)", RegexOptions.Compiled);

    And then change:
    x1 = Regex.Split(x.Replace(" ", ""), "([0-9]+)");
    y1 = Regex.Split(y.Replace(" ", ""), "([0-9]+)");

    Into:
    x1 = rxNumbers.Split(x.Replace(" ", ""));
    y1 = rxNumbers.Split(y.Replace(" ", ""));

    Likewise, in stead of the two "Replace(" ", "")" you could or "RegexOptions.Compiled" with "RegexOptions.IgnorePatternWhitespace" to have the RegEx-expression remove whitespaces (again I have not profiled which is faster).

    ReplyDelete

Comments are very welcome but are moderated to prevent spam.

If I helped you out today, you can buy me a beer below. Cheers!