We received a support ticket from a customer:
I'm unable to log in to the system. I'm working in department 123. Please add all users working in that department to the system. Thanks.
The system consists of a couple of microservices, one of which is responsible for managing identities, roles and permissions. Users were imported from the customer's corporate Active Directory. The customer knew that we cherry-picked the AD users with a special query.
So the customer was suggesting that the user belonged to a department that was not part of the LDAP query. We added that department and resolved the ticket.
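The actual import query is not shown here, so the following is only a minimal sketch of what "adding a department" to such a query could look like. It assumes the import filters on the Active Directory `department` attribute; the function names and the department numbers other than 123 are hypothetical.

```python
def escape_filter_value(value: str) -> str:
    """Escape the characters that RFC 4515 reserves inside LDAP filter values."""
    replacements = {"\\": r"\5c", "*": r"\2a", "(": r"\28", ")": r"\29", "\0": r"\00"}
    return "".join(replacements.get(ch, ch) for ch in value)


def build_import_filter(departments):
    """Build a filter matching user objects in any of the given departments."""
    clauses = "".join(
        f"(department={escape_filter_value(dept)})" for dept in departments
    )
    return f"(&(objectClass=user)(|{clauses}))"


# Hypothetical departments that were already part of the import.
imported_departments = ["100", "200"]

filter_before = build_import_filter(imported_departments)
# Department 123 comes from the support ticket.
filter_after = build_import_filter(imported_departments + ["123"])

print(filter_before)
print(filter_after)
```

A change like this only affects which users get imported. It says nothing about whether an imported user can authenticate.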
One week later
Hello, I'm still not able to log in. And there are two more users that cannot log in. The rest is fine.
Now, we checked the log files of our identity management service. It was pretty obvious: The password was wrong.
We asked the customer to check the password and closed the ticket.
Hi, I have been typing the same password over and over again the whole day and I am absolutely sure the password is right!
We felt a little unsure at that point. Did we have a bug in the identity management? The service is relatively mature and is used in a lot of systems.
Do you have any strange characters in your password?
Obviously, we could not ask the customer to tell us the password. So we created a user in our test LDAP with a password like
§$%!&/()&/)?=#*. Worked perfectly - in our environment. Was there some strange character encoding problem in the processing chain of browser -> identity management -> LDAP?
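To make the encoding suspicion concrete, here is a small sketch (not something from the actual system) showing why such a password could break if two links in the chain disagreed on the character encoding. Only the `§` is outside ASCII, and it is the character whose bytes differ between UTF-8 and Latin-1.

```python
password = "§$%!&/()&/)?=#*"

# The same text, serialized with two common encodings.
as_utf8 = password.encode("utf-8")
as_latin1 = password.encode("latin-1")

# Characters outside ASCII are the ones whose bytes depend on the encoding.
non_ascii = [ch for ch in password if ord(ch) > 127]

# What a receiver sees if it reads UTF-8 bytes as Latin-1.
misread = as_utf8.decode("latin-1")

print(non_ascii)
print(as_utf8 == as_latin1)
print(misread)
```

If one component sent UTF-8 and another read Latin-1, the compared password would no longer match the stored one, even though the user typed it correctly.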
Could you please log in directly at the IM instead of the application's web site?
Yep. Still not able to log in.
More users started to use the system and weren't able to log in. We started to get scared.
Could you please change your password to something without any special characters?
Tried that - doesn't work either!
Weeks passed by
We talked to the product team behind the identity management service. They built a special version with enhanced logging. We deployed it.
Please try to log in and tell us the user and time.
[10:42:01.132] [ERROR] User <john.doe> tried to login with incorrect password.
Are you REALLY sure that you've used the correct password?
By then, months had passed since the initial support ticket. We had spent days analysing the problem, and the customer wanted to go live with the system. But no luck.
Until that email, sent from their system administrator:
By the way, I just noticed that you are using the wrong LDAP URL.
But most of their employees were able to log in!?
We knew that parts of the company had been carved out into a separate legal unit.
Everything made sense: There was no magical password. We had been completely on the wrong track. They had simply cloned the VM that ran the Active Directory: one copy remained with the carved-out company, and the clone was used by our customer. Guess which AD we were connecting to? Unfortunately, the first one. They had split the company and separated the locations, but they had not (yet) separated the network. And they had not deleted the employees from the first AD, the one that remained with the carved-out part.
So basically all users could log in, except those who had changed their passwords since the carve-out. And of course the number of affected users grew over time, because the security policy forced them to change their passwords.
We learned that, when looking for a bug, you should not fixate on one possible cause (in this case, a problem with a specific password) but zoom out a bit and look at the big picture instead.
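In hindsight, the pattern can be written down as a simple rule. This is a minimal sketch with made-up data: the carve-out date, the user names other than john.doe, and all password-change dates are hypothetical. It assumes the stale directory still holds each user's password as it was on the day the VM was cloned.

```python
from datetime import date


def users_expected_to_fail(last_password_change, carve_out):
    """Return users whose current password postdates the directory clone.

    The stale directory only knows the password from before the carve-out,
    so anyone who changed it afterwards is rejected with "incorrect password".
    """
    return sorted(
        user
        for user, changed in last_password_change.items()
        if changed > carve_out
    )


# Hypothetical carve-out date and password-change history.
carve_out = date(2020, 1, 1)
last_password_change = {
    "john.doe": date(2020, 2, 15),
    "jane.roe": date(2019, 11, 3),
    "max.mustermann": date(2020, 3, 1),
}

failing = users_expected_to_fail(last_password_change, carve_out)
print(failing)
```

A password expiry policy moves more users past the carve-out date every week, which matches the steadily growing number of failed logins.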