Comparing Strings: BEHIND THE SCENES!
Hi! Welcome to this section. We will discuss HOW TO COMPARE STRINGS, which lets you determine which of two strings comes first in lexicographical order.
You can find more information on lexicographical order at this link: https://en.wikipedia.org/wiki/Lexicographical_order
Here is what Python's documentation says about comparing sequence objects (so far, the only sequences we've studied are strings).
You can find the article at this link: https://docs.python.org/3/tutorial/datastructures.html#comparing-sequences-and-other-types
As you can see in this example, and according to the Python docs:
- The corresponding items (the characters at the same position in each string) are compared, starting with the first pair.
- If they are equal, the next pair of items at the same position is evaluated, and so on until Python finds a pair of items that are not equal. If the strings are identical, they are considered equal, and a comparison with > or < returns False.
In this case, when it finds "I" == "A", this is False, and when they are compared with "I" > "A", this returns True, since "I" comes after "A" in lexicographical order.
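Here is a small sketch of those rules in Python. The two strings are hypothetical, chosen only because they start with "I" and "A"; they are not necessarily the ones from the original example.

```python
# Hypothetical strings, picked so that the first characters are "I" and "A".
first = "Italy"
second = "Argentina"

# The first pair of characters already differs, so it decides the result.
first_chars_equal = first[0] == second[0]   # "I" == "A"
first_is_greater = first > second           # decided by "I" > "A"

# Identical strings are equal, so both > and < return False.
same_is_greater = first > first
same_is_less = first < first

print(first_chars_equal, first_is_greater, same_is_greater, same_is_less)
# False True False False
```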
Lexicographical order and the ord() function
But what is lexicographical order? How can the computer determine which letter comes first in the alphabet, or whether one character comes before another?
Let’s find out!
Let’s see the ord() function in action!
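As a quick sketch, this is roughly what you would see when calling ord() on single characters. The variable names are just for illustration; chr() is the built-in that goes in the opposite direction.

```python
# ord() takes a one-character string and returns its Unicode code point.
code_j = ord("J")
code_k = ord("K")

# chr() does the reverse: it turns a code point back into a character.
char_74 = chr(74)

print(code_j, code_k, char_74)
# 74 75 J
```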
When we try to compare 'JKA' > 'JKL':
- The first two items are compared, but what actually happens is that their Unicode code point numbers are compared. In this case, ord('J') returns 74 because uppercase J has the Unicode code point 74. Both characters have the same code, so the equality comparison returns True.
- Then, 'K' and 'K' are compared, and the equality comparison returns True as well.
- Finally, 'A' and 'L' are compared. The Unicode code point of A is 65 and that of L is 76. When Python checks whether they are equal, it returns False. So, as the Python docs say: “if they differ, this determines the outcome of the comparison”.
- In this case, 65 > 76 is False (we are using > in the initial string comparison).
The comparison returns False.
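The steps above can be sketched as a small function. This is only an illustration of the idea, not how Python implements it internally, and is_greater is a hypothetical helper name. The last line of the function covers a case not discussed above: per the Python docs, when one string is a prefix of the other, the shorter one is the smaller.

```python
def is_greater(left: str, right: str) -> bool:
    """Hypothetical helper that mimics `left > right` for strings."""
    # Walk through the characters at the same position in both strings.
    for left_char, right_char in zip(left, right):
        # Compare Unicode code points; the first difference decides.
        if ord(left_char) != ord(right_char):
            return ord(left_char) > ord(right_char)
    # Every compared pair was equal: the longer string (if any) is greater.
    return len(left) > len(right)


result = is_greater("JKA", "JKL")
print(result)                      # False, because 65 > 76 is False
print(result == ("JKA" > "JKL"))   # True, it matches the built-in operator
```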
You can check the list of Unicode characters at this link: https://en.wikipedia.org/wiki/List_of_Unicode_characters
IMPORTANT
- The Unicode codes for uppercase and lowercase letters are NOT THE SAME! If you compare an uppercase letter with a lowercase letter, the uppercase letter will come first in lexicographical order, because uppercase letters have lower code points (for example, 'A' is 65 and 'a' is 97).
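To see the effect, here is a short sketch you can try in the Python shell. The word list is made up for illustration.

```python
upper_code = ord("A")
lower_code = ord("a")
print(upper_code, lower_code)   # 65 97

# The uppercase letter has the smaller code point, so it comes first.
upper_first = "A" < "a"
print(upper_first)              # True

# Side effect: a capitalized word sorts before all-lowercase words.
words = sorted(["banana", "Cherry", "apple"])
print(words)                    # ['Cherry', 'apple', 'banana']
```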
Hope it helps!
If you have any questions, please post them in the forums or right below this post. Community TAs and your classmates will always be there to help you!
Estefania.