Authentication

Authentication & key exchange protocols, user authentication, biometrics, and CAPTCHA

Paul Krzyzanowski

October 23, 2020

Authentication is the process of binding an identity to a user (or service). There is a distinction between authentication and identification. Identification is simply the process of asking you to identify yourself (for example, ask for a login name). Authentication is the process of proving that the identification is correct. Authorization is the process of determining whether the user is permitted to do something.

Combined Authentication & Key Exchange

The biggest problem with symmetric cryptography is key distribution. For Alice and Bob to communicate, they must share a secret key that no adversaries can get. However, Alice cannot send the key to Bob since it would be visible to adversaries. She cannot encrypt it because Alice and Bob do not share a key yet.

Key exchange using a trusted third party

For two parties to communicate using symmetric ciphers they need to share the same key. The ways of doing this are:

  1. Share the key via some trusted mechanism outside of the network, such as reading it over the phone or sending a flash drive via FedEx.

  2. Send the key using a public key algorithm (if we can trust the key!)

  3. Use a trusted third party.

We will first examine the use of a trusted third party. A trusted third party is a trusted system that has everyone’s key. Hence, only Alice and the trusted party (whom we will call Trent) have Alice’s secret key. Only Bob and Trent have Bob’s secret key.

The simplest way of using a trusted third party is to ask it to come up with a session key and send it to the parties that wish to communicate. For example, Alice sends a message to Trent requesting a session key to communicate with Bob. This message is encrypted with Alice’s secret key so that Trent knows the message could have only come from Alice.

Trent generates a random session key and encrypts it with Alice’s secret key. He also encrypts the same key with Bob’s secret key. Alice gets both keys and passes the one encrypted for Bob to Bob. Now Alice and Bob have a session key that was encrypted with each of their secret keys and they can communicate by encrypting messages with that session key.
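A minimal sketch of this exchange in Python, using the Fernet cipher from the third-party cryptography package to stand in for the symmetric cipher; the function and variable names (trent_issue_session_key, alice_key, bob_key) are invented for illustration:

```python
# Sketch of session-key distribution via a trusted third party (Trent).
# Assumes the third-party "cryptography" package: pip install cryptography
from cryptography.fernet import Fernet

# Long-term secrets: only Trent and Alice know alice_key; only Trent and Bob know bob_key.
alice_key = Fernet.generate_key()
bob_key = Fernet.generate_key()

def trent_issue_session_key():
    """Trent generates a random session key and encrypts one copy for each party."""
    session_key = Fernet.generate_key()
    return Fernet(alice_key).encrypt(session_key), Fernet(bob_key).encrypt(session_key)

for_alice, for_bob = trent_issue_session_key()   # Alice receives both and forwards for_bob to Bob

alice_session = Fernet(alice_key).decrypt(for_alice)
bob_session = Fernet(bob_key).decrypt(for_bob)
assert alice_session == bob_session              # both sides now share the same session key

# All further traffic is encrypted with the shared session key.
print(Fernet(bob_session).decrypt(Fernet(alice_session).encrypt(b"hello, Bob")))
```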

This simple scheme is vulnerable to replay attacks. An eavesdropper, Eve, can record messages from Alice to Bob and replay them at a later time. Eve might not be able to decode the messages but she can confuse Bob by sending him seemingly valid encrypted messages.

The second problem is that Alice passes Bob an encrypted session key but Bob has no idea who is requesting to communicate with him. While Trent authenticated Alice (simply by being able to decrypt her request) and authorized her to talk with Bob (by generating the session key), that information has not been conveyed to Bob.

Needham-Schroeder: nonces

The Needham-Schroeder protocol improves the basic key exchange protocol by adding nonces to messages. A nonce is simply a random string – a random bunch of bits. Alice sends a request to Trent, asking to talk to Bob. This time, it doesn’t have to even be encrypted. As part of the request she sends a nonce.

Trent responds with a message that contains:

  • Alice’s ID
  • Bob’s ID
  • the nonce
  • the session key
  • a ticket: a message encrypted for Bob containing Alice’s ID and the same session key

This entire message from Trent is encrypted with Alice’s secret key. Alice can validate that the message is a response to her message because:

  • It is encrypted for her: nobody but Alice and Trent has Alice’s secret key.
  • It contains the same nonce as in her request, so it is not a replay of some earlier message, which would have had a different randomly-generated nonce.

Alice sends the ticket (the message encrypted with Bob’s key) to Bob. He can decrypt it and knows:

  • The message must have been generated by Trent since only Trent and Bob know Bob’s key and thus could construct a meaningful message encrypted with Bob’s key.
  • He will be communicating with Alice because Trent placed Alice’s ID in that ticket.
  • The session key since Trent placed that in the ticket as well. Alice has this too.

Bob can now communicate with Alice but he will first authenticate Alice to be sure that he’s really communicating with her. He’ll believe it’s Alice if she can prove that she has the session key. To do this, Bob creates another nonce, encrypts it with the session key, and sends it to Alice. Alice decrypts the message, subtracts one from the nonce, encrypts the result, and sends it back to Bob. She just demonstrated that she could decrypt a message using the session key and return back a known modification of the message. Needham-Schroeder is a combined authentication and key exchange protocol.
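The message flow can be sketched as follows. Fernet again stands in for the symmetric cipher, and the JSON field packing and helper names are invented for illustration; a real implementation encodes the fields far more carefully:

```python
# Sketch of the Needham-Schroeder message flow (field packing is simplified).
import os, json
from cryptography.fernet import Fernet

alice_key, bob_key = Fernet.generate_key(), Fernet.generate_key()   # both known to Trent

# 1. Alice -> Trent: "Alice wants to talk to Bob", plus a fresh nonce (sent in the clear).
nonce_a = os.urandom(16).hex()
request = {"from": "alice", "to": "bob", "nonce": nonce_a}

# 2. Trent -> Alice: everything encrypted with Alice's key, including a ticket for Bob.
session_key = Fernet.generate_key()
ticket = Fernet(bob_key).encrypt(json.dumps(
    {"peer": "alice", "session_key": session_key.decode()}).encode())
reply = Fernet(alice_key).encrypt(json.dumps({
    "user": "alice", "peer": "bob", "nonce": request["nonce"],
    "session_key": session_key.decode(), "ticket": ticket.decode()}).encode())

# 3. Alice decrypts the reply and checks that the nonce matches her request (no replay).
msg = json.loads(Fernet(alice_key).decrypt(reply))
assert msg["nonce"] == nonce_a

# 4. Alice -> Bob: the ticket. Bob decrypts it and learns the peer ID and session key.
bob_view = json.loads(Fernet(bob_key).decrypt(msg["ticket"].encode()))
sk = Fernet(bob_view["session_key"].encode())

# 5. Bob challenges Alice with a new nonce; Alice returns nonce - 1 under the session key.
nonce_b = int.from_bytes(os.urandom(4), "big")
challenge = sk.encrypt(str(nonce_b).encode())
response = sk.encrypt(str(int(sk.decrypt(challenge).decode()) - 1).encode())   # Alice's side
assert int(sk.decrypt(response).decode()) == nonce_b - 1                       # Bob verifies
```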

Denning-Sacco modification: timestamps to avoid key replay

One flaw in the Needham-Schroeder algorithm is when Alice sends the ticket to Bob. The ticket is encrypted with Bob’s secret key and contains Alice’s ID as well as the session key. If an attacker happened to record an old conversation session and was able to decrypt the session key, she can replay the transmission of an old ticket to Bob. Bob won’t know that he received that same session key in the past. He will proceed to validate “Alice” by asking her to prove that she indeed knows the session key. In this case, Eve, our eavesdropper, does know it; that’s why she sent the ticket to Bob. Bob completes the authentication and thinks he is talking with Alice when in reality he is talking to Eve.

A fix for this was proposed by Denning & Sacco: add a timestamp to the ticket. When Trent creates the ticket that Alice will give to Bob, it is a message encrypted for Bob and contains Alice’s ID, the session key, and a timestamp.

When Bob receives a ticket, he checks the timestamp. If it is older than some recent time (e.g., a few seconds), Bob will simply discard the ticket, assuming that he is getting a replay attack.
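Bob’s freshness check then amounts to something like this (the five-second acceptance window is an arbitrary illustrative value):

```python
import time

MAX_TICKET_AGE = 5.0    # seconds; an illustrative policy value

def ticket_is_fresh(ticket_timestamp: float) -> bool:
    """Discard tickets whose timestamp is too old (likely a replayed ticket)."""
    return (time.time() - ticket_timestamp) <= MAX_TICKET_AGE
```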

Otway-Rees protocol: session IDs instead of timestamps

A problem with timestamps is that their use relies on all entities having synchronized clocks. If Bob’s clock is significantly off from Trent’s, he may falsely accept or falsely reject a ticket that Alice presents to him. Time synchronization becomes an attack vector for this protocol. If an attacker can change Bob’s concept of time, she may be able to convince Bob to accept an older ticket. To do this, she can create fake NTP (network time protocol) responses to force Bob’s clock to synchronize to a different value or, if Bob is paranoid and uses a GPS receiver to synchronize time, create fake GPS signals.

A way to avoid the replay of the ticket without using timestamps is to add a session ID to each message. The rest of the Otway-Rees protocol differs a bit from Needham-Schroeder but is conceptually very similar.

  1. Alice sends a message to Bob that contains:
    • A session ID
    • Both of their IDs
    • A message encrypted with Alice’s secret key. This encrypted message contains Alice and Bob’s IDs as well as the session ID.
  2. Bob sends Trent a request to communicate with Alice, containing:
    • Alice’s message
    • A message encrypted with Bob’s secret key that also contains the session ID.
  3. Trent now knows that Alice wants to talk to Bob since the session ID is inside her encrypted message and that Bob agrees to talk to Alice since that same session ID is inside his encrypted message.

  4. Trent creates a random session key encrypted for Bob and the same key encrypted for Alice and sends both of those to Bob, along with the session ID.

The protocol also incorporates nonces to ensure that there is no replay attack on Trent’s response even if an attacker sends a message to Bob with a new session ID and old encrypted session keys (that were cracked by the attacker).

Kerberos

Kerberos is a trusted third party authentication, authorization, and key exchange protocol using symmetric cryptography and based closely on the Needham-Schroeder protocol with the Denning-Sacco modification (the use of timestamps).

When Alice wants to talk with Bob (they can be users or services), she first needs to ask Kerberos. If access is authorized, Kerberos will send her two messages. One is encrypted with Alice’s secret key and contains the session key for her communication with Bob. The other message is encrypted with Bob’s secret key. Alice cannot read or decode this second message. It is a ticket. It contains the same session key that Alice received but is encrypted for Bob. Alice will send that to Bob. When Bob decrypts it, he knows that the message must have been generated by an entity that knows Bob’s secret key: Kerberos. Now that Alice and Bob both have the session key, they can communicate securely by encrypting all traffic with that session key.

To avoid replay attacks, Kerberos places a timestamp in Alice’s response and in the ticket. For Alice to authenticate herself to Bob, she needs to prove that she was able to extract the session key from the encrypted message Kerberos sent her. She proves this by generating a new timestamp, encrypting it with the session key, and sending it to Bob. Bob now needs to prove to Alice that he can decode messages encrypted with the session key. He takes Alice’s timestamp, adds one (just to permute the value), and sends it back to Alice, encrypted with their session key.

Since your secret key is needed to decrypt every service request you make of Kerberos, you’ll end up typing your password each time you want to access a service. Storing the key in a file to cache it is not a good idea. Kerberos handles this by splitting itself into two components that run the same protocol: the authentication service (AS) and the ticket granting service (TGS). The authentication service handles the initial user request and provides a session key to access the TGS. This session key can be cached for the user’s login session and allows the user to send requests to the TGS without re-entering a password. The TGS is the part of Kerberos that handles requests for services. It also returns two messages to the user: a different session key for the desired service and a ticket that must be provided to that service.
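The two-stage structure might be sketched as follows. This is a heavily simplified illustration: Fernet stands in for Kerberos’s ciphers, all field and function names are invented, and real Kerberos messages carry many more fields (realms, lifetimes, addresses, and a timestamp authenticator from the client):

```python
# Heavily simplified sketch of the Kerberos AS/TGS split (illustrative names only).
import json, time
from cryptography.fernet import Fernet

user_key = Fernet.generate_key()      # derived from the user's password
tgs_key = Fernet.generate_key()       # known only to the AS and the TGS
service_key = Fernet.generate_key()   # known only to the TGS and the service ("bob")

def seal(key, obj): return Fernet(key).encrypt(json.dumps(obj).encode())
def unseal(key, blob): return json.loads(Fernet(key).decrypt(blob))

def authentication_service(user):
    """AS: run once per login. Returns a TGS session key (for the user) and a TGT."""
    k = Fernet.generate_key().decode()
    tgt = seal(tgs_key, {"user": user, "key": k, "time": time.time()})
    return seal(user_key, {"key": k, "time": time.time()}), tgt

def ticket_granting_service(tgt, service):
    """TGS: run once per service. Returns a service session key and a service ticket."""
    info = unseal(tgs_key, tgt)       # only the AS could have constructed a valid TGT
    k = Fernet.generate_key().decode()
    ticket = seal(service_key, {"user": info["user"], "key": k, "time": time.time()})
    return seal(info["key"].encode(), {"key": k, "time": time.time()}), ticket

# Log in once, then request access to a service without retyping the password.
for_user, tgt = authentication_service("alice")
tgs_session = unseal(user_key, for_user)["key"]               # cached for the login session
for_alice, svc_ticket = ticket_granting_service(tgt, "bob")
svc_session = unseal(tgs_session.encode(), for_alice)["key"]  # key Alice will share with "bob"
```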

Key exchange using public key cryptography

With public key cryptography, there generally isn’t a need for key exchange. As long as both sides can get each other’s public keys from a trusted source, they can encrypt messages using those keys. However, we rarely use public key cryptography for large messages. It can, however, be used to transmit a session key. This use of public key cryptography to transmit a session key that will be used to apply symmetric cryptography to messages is called hybrid cryptography. For Alice to send a key to Bob:

  1. Alice generates a random session key.
  2. She encrypts it with Bob’s public key & sends it to Bob.
  3. Bob decrypts the message using his private key and now has the session key.

Bob is the only one who has Bob’s private key to be able to decrypt that message and extract the session key. A problem with this is that anybody can do this. Charles can generate a random session key, encrypt it with Bob’s public key, and send it to Bob. For Bob to be convinced that it came from Alice, she can encrypt it with her private key (this is effectively signing the message).

  1. Alice generates a random session key.
  2. She signs it by encrypting the key with her private key.
  3. She encrypts the result with Bob’s public key & sends it to Bob.
  4. Bob decrypts the message using his private key.
  5. Bob decrypts the resulting message with Alice’s public key and gets the session key.

If anybody other than Alice created the message, the value Bob gets by decrypting it with Alice’s public key will not be a valid key. We can enhance the protocol by using a standalone signature (an encrypted hash) so Bob can distinguish a valid key from a bogus one.
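A sketch of the signed variant using RSA from the Python cryptography package; the key size and padding choices are common defaults rather than a prescription, and a real system would normally rely on an established protocol (e.g., TLS) rather than a hand-rolled exchange like this:

```python
# Alice signs a random session key with her private key and encrypts it with Bob's public key.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

alice_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)
alice_public, bob_public = alice_private.public_key(), bob_private.public_key()

oaep = padding.OAEP(mgf=padding.MGF1(algorithm=hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)
pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH)

# Alice: generate a session key, sign it, and encrypt it for Bob.
session_key = os.urandom(32)
signature = alice_private.sign(session_key, pss, hashes.SHA256())
ciphertext = bob_public.encrypt(session_key, oaep)

# Bob: decrypt with his private key, then check Alice's signature over the recovered key.
recovered = bob_private.decrypt(ciphertext, oaep)
alice_public.verify(signature, recovered, pss, hashes.SHA256())  # raises InvalidSignature if forged
assert recovered == session_key
```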

User Authentication

Authentication factors

The three factors of authentication are:

  1. something you have (such as a key or a card),
  2. something you know (such as a password or PIN),
  3. and something you are (biometrics).

Combining these into a multi-factor authentication scheme can increase security against the chance that any one of the factors is compromised. Multi-factor authentication must use two or more of these factors. Using two passwords, for example, is not sufficient and does not qualify as multi-factor.

Password Authentication Protocol

The classic authentication method is the use of reusable passwords. This is known as the password authentication protocol, or PAP. The system asks you to identify yourself (login name) and then enter a password. If the password matches that which is associated with the login name on the system then you’re authenticated.

Password guessing defenses

To avoid having an adversary carry out a password guessing attack, we need to make it not feasible to try a large number of passwords. A common approach is to rate-limit guesses. When the system detects an incorrect password, it will wait several seconds before allowing the user to try again. Linux, for example, waits about three seconds. After five bad guesses, it terminates and restarts the login process.

Another approach is to completely disallow password guessing after a certain number of failed attempts by locking the account. This is common for some web-based services, such as banks. However, the system has now been made vulnerable to a denial-of-service attack. An attacker may not be able to take your money but may inconvenience you by disallowing you to access it as well.

Hashed passwords

One problem with the password authentication protocol is that if someone gets hold of the password file on the system, then they have all the passwords. The common way to thwart this is to store hashes of passwords instead of the passwords themselves. This takes advantage of the one-way property of the hash: anyone who sees the hash still has no way of computing your password.

To authenticate a user, the system simply checks if hash(password) = stored_hashed_password. If someone got hold of the password file, they’re still stuck since they won’t be able to reconstruct the original password from the hash. They’ll have to resort to an exhaustive search (also known as a brute-force search) to search for a password that hashes to the value in the file. The hashed file should still be protected from read access by normal users to keep them from performing an exhaustive search.

A dictionary attack is an optimization of the search that tests common passwords, including dictionary words, known common passwords, and common letter-number substitutions rather than every possible combination of characters. Moreover, an intruder does not need to perform such search on each hashed password to find the password. Instead, the results of a dictionary search can be stored in a file and later searched to find a corresponding hash in a password file. These are called precomputed hashes. To guard against this, a password is concatenated with a bunch of extra random characters, called salt. These characters make the password substantially longer and would make a table of precomputed hashes insanely huge and hence not practical. Such a table would need to go far beyond a dictionary list and create hashes of all possible - and long - passwords. The salt is not a secret – it is stored in plaintext in the password file in order to validate a user’s password. Its only function is to make using precomputed hashes impractical and ensure that even identical passwords do not generate the same hashed results. An intruder would have to select one specific hashed password and do a brute-force or dictionary attack on just that password, adding salt to each guess prior to hashing it.
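A minimal sketch of salted password hashing, using PBKDF2 from Python’s standard library as the (deliberately slow) hash; production systems often prefer bcrypt, scrypt, or Argon2, and the iteration count below is just an illustrative choice:

```python
# Store hash(salt + password) plus the salt; never store the password itself.
import hashlib, hmac, os

def hash_password(password: str, salt: bytes = None):
    """Return (salt, digest). The salt is stored in plaintext alongside the digest."""
    salt = salt if salt is not None else os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, stored_digest: bytes) -> bool:
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, stored_digest)   # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, stored)
assert not verify_password("password", salt, stored)
```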

Password recovery options

Passwords are not a particularly secure means of authentication. English text has low entropy (approximately 1.2–1.5 bits per character), and passwords are often easy to guess. Password files leaked from some high-profile sites show just how bad a lot of people are at picking passwords. Over 90% of all user passwords sampled are on a list of the top 1,000 passwords. The most common password is password. People also tend to reuse passwords. If an attacker can get passwords from one place, there is a good chance that many of those passwords will work with other services.

Despite many people picking bad passwords, people often forget them, especially when they are trying to be good and use different passwords for different accounts. There are several common ways of handling forgotten passwords, none of them great:

Email them:
This used to be a common solution and is somewhat dying off. It requires that the server is able to get the password (it is not stored as a hash). It exposes the risk that anyone who might see your email will see your password.
Reset them:
This is more common but requires authenticating the requestor to avoid a denial of service attack. The common thing to do is to send a password reset link to an email address that was entered when the account was created. We again have the problem that if someone has access to your mail, they will have access to the password reset link and will be able to create a new password for your account. In both these cases, we have the problem that users may no longer have the same email address. Think of the people who switched from Comcast to get Verizon FiOS and switched their comcast.net addresses to verizon.net (note: avoid using email addresses tied to services or locations that you might change).
Provide hints:
This is common for system logins (e.g. macOS and Windows). However, a good hint may weaken the password or may not help the user.
Ask questions:
It is common for sites to ask questions (“what was your favorite pet’s name?”, “what street did you live on when you were eight years old?”). The answers to many of these questions can often be found through some searching or via social engineering. A more clever thing is to have unpredictable answers (“what was your favorite pet’s name?” “Osnu7$Qbv999”) but that requires storing answers somewhere.
Rely on users to write them down:
This is fine as long as the threat model is electronic-only and you don’t worry about someone physically searching for your passwords.

One-time Passwords

The other problem with reusable passwords is that if a network is insecure, an eavesdropper may sniff the password from the network. A potential intruder may also simply observe the user typing a password. To thwart this, we can turn to one-time passwords. If someone sees you type a password or gets it from the network stream, it won’t matter because that password will be useless for future logins.

There are three forms of one-time passwords:

  1. Sequence-based. Each password is a function of the previous password. S/Key is an example of this.

  2. Challenge-based. A password is a function of a challenge provided by the server. CHAP is an example of this.

  3. Time-based. Each password is a function of the time. TOTP and RSA’s SecurID are examples of this.

Sequence-based: S/Key

S/Key authentication allows the use of one-time passwords by generating a list via one-way functions. The list is created such that password n is generated as f(password[n–1]), where f is a one-way function. The list of passwords is used backwards. Given a password password[p], an observer cannot compute the next valid password because f is a one-way function: computing the inverse, f⁻¹(password[p]), to obtain the next valid password, password[p–1], is infeasible.
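A toy version of the scheme follows, using SHA-256 in place of the hash that real S/Key uses; all names are illustrative:

```python
# Build a hash chain; hand out passwords from the end of the list backwards.
import hashlib

def skey_chain(seed: bytes, n: int):
    """password[0] = f(seed); password[i] = f(password[i-1]) for a one-way function f."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(n - 1):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain

chain = skey_chain(b"secret seed", 100)
server_state = chain[-1]              # the server initially stores only the final value

def server_verify(candidate: bytes) -> bool:
    """Accept the login if hashing the presented password yields the stored value."""
    global server_state
    if hashlib.sha256(candidate).digest() == server_state:
        server_state = candidate      # the next login must present the preceding value
        return True
    return False

assert server_verify(chain[-2])       # user presents password[98]; an observer cannot derive [97]
```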

Challenge-based: CHAP

The Challenge-Handshake Authentication Protocol (CHAP) is an authentication protocol that allows a server to authenticate a user without sending a password over the network.

Both the client and server share a secret (essentially a password). A server creates a random bunch of bits (called a nonce) and sends it to the client (user) that wants to authenticate. This is the challenge.

The client identifies itself and sends a response that is the hash of the shared secret combined with the challenge. The server has the same data and can generate its own hash of the same challenge and secret. If the hash matches the one received from the client, the server is convinced that the client knows the shared secret and is therefore legitimate.

An intruder that sees this hash cannot extract the original data. An intruder that sees the challenge cannot create a suitable hashed response without knowing the secret. Note that this technique requires passwords to be accessible at the server and the security rests on the password file remaining secure.
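A sketch of the exchange using HMAC as the keyed hash; real CHAP (RFC 1994) hashes an identifier, the secret, and the challenge with MD5, so treat this only as an illustration of the idea:

```python
import hashlib, hmac, os

shared_secret = b"correct horse battery staple"     # known to both client and server

challenge = os.urandom(16)                          # server: a fresh random nonce

# Client: keyed hash of the challenge; the secret itself never crosses the network.
response = hmac.new(shared_secret, challenge, hashlib.sha256).digest()

# Server: recompute the hash from its own copy of the secret and compare.
expected = hmac.new(shared_secret, challenge, hashlib.sha256).digest()
assert hmac.compare_digest(response, expected)
```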

Time-based: TOTP and SecurID®

With the Time-based One Time Password (TOTP) protocol, both sides share a secret key. To authenticate, a user runs the TOTP function to create a one-time password. The TOTP function is a hash:

password := hash(secret_key, time) mod 10^password_length

The resultant hash is taken modulo some number that determines the length of the password. A time window of 30 seconds is usually used to provide a reasonably coarse granularity of time that doesn’t put too much stress on the user or requirements for tight clock synchronization. The service, which also knows the secret key and the time, can generate the same hash and hence validate the value presented by the user.
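A sketch that mirrors the formula above; note that standards-based TOTP (RFC 6238) uses HMAC-SHA-1 with dynamic truncation rather than a simple modulo over the whole hash, so this illustrates the concept rather than an interoperable implementation:

```python
import hashlib, hmac, time

def totp(secret: bytes, digits: int = 6, step: int = 30, at=None) -> str:
    """Compute the one-time password for the 30-second window containing `at`."""
    counter = int((time.time() if at is None else at) // step)
    mac = hmac.new(secret, counter.to_bytes(8, "big"), hashlib.sha256).digest()
    return str(int.from_bytes(mac, "big") % 10**digits).zfill(digits)

secret = b"shared secret provisioned at enrollment"
print(totp(secret))                                      # user and server derive the same value
assert totp(secret, at=1000) == totp(secret, at=1010)    # same 30-second window
assert totp(secret, at=1000) != totp(secret, at=1040)    # different window (almost certainly)
```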

TOTP is often used as a second factor (proof that you have some device with the secret configured in it) in addition to a password. The protocol is widely supported by companies such as Amazon, Dropbox, Wordpress, Microsoft, and Google.

A closely related system is RSA’s SecurID, a two-factor authentication system that generates one-time passwords in response to a user login prompt. It relies on a user password (a Personal ID Number, or PIN) and a token device (an authenticator card or fob). The token generates a new number every 30 seconds. The number is a function of a seed that is unique for each card and the time of day.

To authenticate to a server, you send a concatenation of your PIN and the number from the token in lieu of a password. A legitimate remote system will have your PIN, the token seed, and the time of day and will be able to compute the same value to validate your password. An intruder would not know your PIN or the token’s seed and will never see it on the network.

Public key authentication

Public key authentication relies on the use of nonces, similar to the way they were used to authenticate users in the Needham-Schroeder protocol. A nonce is generated on the fly and presented to the other party as a challenge for them to prove that they are capable of encrypting something with a specific key that they possess. The use of a nonce is central to public key authentication.

If Alice wants to authenticate Bob, she needs to have Bob prove that he possesses his private key (private keys are never shared). To do this, Alice generates a nonce (a random bunch of bits) and sends it to Bob, asking him to encrypt it with his private key. If she can decrypt Bob’s response using Bob’s public key and sees the same nonce, she will be convinced that she is talking to Bob because nobody else will have Bob’s private key. Mutual authentication requires that each party authenticate itself to the other: Bob will also have to generate a nonce and ask Alice to encrypt it with her private key.
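In practice, “encrypting a nonce with a private key” is implemented as a digital signature over the nonce. A sketch with RSA signatures from the Python cryptography package (the padding and key size are illustrative defaults):

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

bob_private = rsa.generate_private_key(public_exponent=65537, key_size=2048)

nonce = os.urandom(16)          # Alice: a fresh random challenge sent to Bob

# Bob: prove possession of his private key by signing the nonce.
proof = bob_private.sign(
    nonce,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())

# Alice: verify with Bob's public key; verify() raises InvalidSignature if it isn't Bob.
bob_private.public_key().verify(
    proof, nonce,
    padding.PSS(mgf=padding.MGF1(hashes.SHA256()), salt_length=padding.PSS.MAX_LENGTH),
    hashes.SHA256())
```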

Man-in-the-middle attacks

Authentication protocols can be vulnerable to man-in-the-middle (MITM) attacks. In this attack, Alice thinks she is talking to Bob but is really talking to Mike (the man in the middle, an adversary). Mike, in turn, talks to Bob. Any message that Alice sends gets forwarded by Mike to Bob. Mike forwards any response from Bob back to Alice. This way, Mike allows Alice and Bob to carry out their authentication protocol. Once Bob is convinced he is talking with Alice, Mike can drop Alice and communicate with Bob directly, posing as Alice … or stay around and read their messages, possibly changing them as he sees fit.

The protocols that are immune to this are those where Alice and Bob establish an encrypted channel using trusted keys. For example, with Kerberos, both Alice and Bob get a session key that is encrypted only for each of them. Mike cannot find it even if he intercepts their communications.

With public key cryptography, Mike can take over after Bob is convinced he is talking with Alice. To avoid a man-in-the-middle attack Alice will have to send Bob a session key. If she uses public key cryptography to do the key exchange, as long as the message from Alice is signed, Mike will not be able to decrypt the session key or forge a new one.

Biometric authentication

Biometric authentication is the process of identifying a person based on their physical or behavioral characteristics as opposed to their ability to remember a password or their possession of some device. It is the third of the three factors of authentication: something you know, something you have, and something you are.

It is also fundamentally different from the other two factors because it does not deal with data that lends itself to exact comparisons. For instance, sensing the same fingerprint several times is not likely to give you identical results each time. The orientation may differ, the pressure and angle of the finger may cause some parts of the fingerprint to appear in one sample but not the other, and dirt, oil, and humidity may alter the image. Biometric authentication relies on statistical pattern recognition: we establish thresholds to determine whether two patterns are close enough to accept as being the same.

A false accept occurs when a pair of different biometric samples (e.g., fingerprints from two different people) is accepted as a match; the false accept rate (FAR) measures how often this happens. A false reject occurs when two samples from the same person are rejected as a match; the false reject rate (FRR) measures how often that happens. Based on the properties of the biometric data, the sensor, the feature extraction algorithms, and the comparison algorithms, each biometric device has a characteristic ROC (Receiver Operating Characteristic) curve. The name derives from early work on RADAR and maps the false accept rate versus the false reject rate for a given biometric authentication device. For password authentication, the “curve” would be a single point at the origin: no false accepts and no false rejects. For biometric authentication, which is based on thresholds that determine whether a match is “close enough”, we have a curve.

At one end of the curve, we can have an incredibly low false accept rate (FAR). This is good as it means that we will not have false matches: the enemy stays out. However, it also means that the false reject rate (FRR) will be very high. If you think of a fingerprint biometric, the stringent comparison needed to yield a low FAR means that the algorithm will not be forgiving of a speck of dirt, light pressure, or a finger held at a different angle. We get high security at the expense of inconveniencing legitimate users, who may have to present their finger over and over again for sensing, hoping that it will eventually be accepted.

At the other end of the curve, we have a very low false reject rate (FRR). This is good since it provides convenience to legitimate users. Their biometric data will likely be accepted as legitimate and they will not have to deal with the frustration of re-sensing their biometric, hoping that their finger is clean, not too greasy, not too dry, and pressed at the right angle with the correct pressure. The trade-off is that it’s more likely that another person’s biometric data will be considered close enough as well and accepted as legitimate.

There are numerous biological components that can be measured. They include fingerprints, irises, blood vessels on the retina, hand geometry, facial geometry, facial thermographs, and many others. Data such as signatures and voice can also be used, but these often vary significantly with one’s state of mind (your voice changes if you’re tired, ill, or angry). They are behavioral systems rather than physiological systems, such as your iris patterns, length of your fingers, or fingerprints, and tend to have lower recognition rates. Other behavioral biometrics include keystroke dynamics, mouse use characteristics, gait analysis, and even cognitive tests.

Regardless of which biometric is used, the important thing to do in order to make it useful for authentication is to identify the elements that make it different. Most of us have swirls on our fingers. What makes fingerprints differ from finger to finger are the variations in those swirls: ridge endings, bifurcations, enclosures, and other elements beyond that of a gently sloping curve. These features are called minutiae. The presence of minutiae, their relative distances from each other, and their relative positions allow us to express the unique aspects of a fingerprint as a relatively compact stream of bits rather than a bitmap.

Two important elements of biometrics are robustness and distinctiveness. Robustness means that the biometric data will not change much over time. Your fingerprints will look mostly the same next year and the year after. Your fingers might grow fatter (or thinner) over the years and at some point in the future, you might need to re-register your hand geometry data.

Distinctiveness relates to the differences in the biometric pattern among the population. Distinctiveness is also affected by the precision of a sensor. A finger length sensor will not measure your finger length to the nanometer, so there will be quantized values in the measured data. Moreover, the measurements will need to account for normal hand swelling and shrinking based on temperature and humidity, making the data even less precise. Accounting for these factors, approximately one in a hundred people may have hand measurements similar to yours. A fingerprint sensor may typically detect 40–60 distinct features that can be used for comparing with other sensed fingerprints. An iris scan, on the other hand, will often capture over 250 distinct features, making it far more distinctive and more likely to identify a unique individual.

Some sensed data is difficult to normalize. Here, normalization refers to the ability to align different sensed data to some common orientation. For instance, identical fingers might be presented at different angles to the sensors. The comparison algorithm will have to account for possible rotation when comparing the two patterns. The inability to normalize data makes it difficult to perform efficient searches. There is no good way to search for a specific fingerprint short of performing a comparison against each stored pattern. Data such as iris scans lends itself to normalization, making it easier to find potentially matching patterns without going through an exhaustive search.

In general, the difficulty of normalization and the fact that no two measurements are ever likely to be the same makes biometric data not a good choice for identification. It is extremely difficult, for example, to construct a system that will store hundreds of thousands of fingerprints and allow the user to identify and authenticate themselves by presenting their finger. Such a system will require an exhaustive search through the stored data and each comparison will itself be time consuming as it will not be a simple bit-by-bit match test. Secondly, fingerprint data is not distinct enough for a population of that size. A more realistic system will use biometrics for verification and have users identify themselves through some other means (e.g., type their login name) and then present their biometric data. In this case, the software will only have to compare the pattern associated with that user.

The biometric authentication process comprises several steps:

  1. Enrollment. Before any authentication can be performed, the system needs to have stored biometric data of the user that it can use for comparison. The user will have to present the data to the sensor, distinctive features need to be extracted, and the resulting pattern stored. The system may also validate if the sensed data is of sufficiently high quality or ask the user to repeat the process several times to ensure consistency in the data.

  2. Sensing. The biological component needs to be measured by presenting it to a sensor, a dedicated piece of hardware that can capture the data (e.g., a camera for iris recognition, a capacitive fingerprint sensor). The sensor captures the raw data (e.g., an image).

  3. Feature extraction. This is a signal processing phase where the interesting and distinctive components are extracted from the raw sensed data to create a biometric pattern that can be used for matching. This process involves removing signal noise, discarding sensed data that is not distinctive or not useful for comparisons, and determining whether the resulting values are of sufficiently good quality that it makes sense to use them for comparison. A barely-sensed fingerprint, for instance, may not present enough minutia to be considered useful.

  4. Pattern matching. The extracted sample is now compared to the stored sample that was obtained during the enrollment phase. Features that match closely will have small distances. Given variations in measurements, it is unlikely that the distance will be zero, which would indicate a perfect match.

  5. Decision. The “distance” between the sensed and stored samples is now evaluated to decide if the match is close enough. Where this decision threshold is set determines whether the system favors more false rejects or more false accepts. A toy sketch of the matching and decision steps appears below.
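The sketch below treats the extracted features as numeric vectors and compares their distance against a tunable threshold; the feature values and the threshold are invented for illustration, and real matchers use far more sophisticated features and scoring:

```python
import math

THRESHOLD = 2.5   # tuning this trades false accepts (too lax) against false rejects (too strict)

def distance(enrolled, sensed) -> float:
    """Euclidean distance between two extracted feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(enrolled, sensed)))

def accept(enrolled, sensed) -> bool:
    return distance(enrolled, sensed) <= THRESHOLD

enrolled  = [12.1, 3.4, 7.9, 0.6]    # pattern stored at enrollment
same_user = [12.3, 3.1, 8.0, 0.7]    # same user, slightly different measurement
impostor  = [9.0, 5.5, 4.2, 1.9]     # a different user

assert accept(enrolled, same_user)
assert not accept(enrolled, impostor)
```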

Security implications

There are several security issues that relate to biometric authentication.

Sensing
Unlike passwords or encryption keys, biometric systems require sensors to gather the data. The sensor, its connectors, the software that processes sensed data, and the entire software stack around it (operating system, firmware, libraries) must all be trusted and tamper-proof.
Secure communication and storage
The communication path after the data is captured and sensed must also be secure so that attackers will have no ability to replace a stored biometric pattern with one of their own.
Liveness
Much biometric data can be forged. Gummy fingerprints can copy real fingerprints, pictures of faces or eyes can fool cameras into believing they are looking at a real person, and recordings can be used for voice-based authentication systems.
Thresholds
Since biometric data relies on “close-enough” matches, you can never be sure of a certain match. You will need to determine what threshold is good enough and hope that you do not annoy legitimate users too much or make it too easy for the enemy to get authenticated.
Lack of compartmentalization
You have a finite set of biological characteristics to present. Fingerprints and iris scans are the most popular biometric sources. Unlike passwords, where you can have distinct passwords for each service, you cannot have this with biometric data.
Theft of biometric data
If someone steals your password, you can create a new one. If someone steals your fingerprint, you have nine fingerprints left and then none. If someone gets a picture of your iris, you have one more left. Once biometric data is compromised, it remains compromised.

Detecting humans

CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) is not a technique to authenticate users but rather a technique to identify whether a system is interacting with a human being or with automated software. The idea behind it is that humans can recognize highly distorted characters far better than character recognition software can.

CAPTCHA presents an image containing a string of distorted text and asks the user to identify the text. As optical character recognition (OCR) technology improved, this text had to be ever more distorted and often reached a point where legitimate users struggled to decode it. CAPTCHAs were designed to thwart scripts that would, for example, sign up for thousands of logins on a service or buy all tickets to an event. CAPTCHAs do this by having a human solve the CAPTCHA before proceeding.

This was not always successful. CAPTCHAs were susceptible to a form of a man-in-the-middle attack where the distorted image is presented to low-cost (or free) humans whose job is to decipher CAPTCHAs. These are called CAPTCHA farms. Ever-improving OCR technology also made text-based CAPTCHAs susceptible to attack. By 2014, Google found that they could use AI techniques to crack CAPTCHAs with 99.8% accuracy.

An alternative to text-based CAPTCHAs is CAPTCHAs that involve image recognition, such as “select all images that have mountains in them” or “select all squares in an image that have street signs”. A recent variation of CAPTCHA is Google’s No CAPTCHA reCAPTCHA. This simply asks users to check a box stating “I’m not a robot”. The JavaScript behind the scenes, however, does several things:

  • It contacts the Google server to perform an “advanced risk analysis”. What this does is not defined but the act of contacting the server causes the web browser to send Google cookies to the server. If you’re logged in to a Google account, your identity is now known to the server via a cookie and the server can look at your past history to determine if you are a threat.

  • By contacting the Google server, the server can also check where the request came from and compare it with its list of IP addresses known to host bots.

  • The JavaScript code monitors the user’s engagement with the CAPTCHA, measuring mouse movements, scroll bar movement, acceleration, and the precise location of clicks. If everything is too perfect then the software will assume it is not dealing with a human being.

The very latest variation of this system is the invisible reCAPTCHA. The user doesn’t even see the checkbox: it is positioned tens of thousands of pixels above the origin, so the JavaScript code runs but the reCAPTCHA frame is out of view. If the server-based risk analysis does not get sufficient information from the Google cookies then it relocates the reCAPTCHA frame back down to a visible part of the screen.

Finally, if the risk analysis part of the system fails, the software presents a CAPTCHA (recognize text on an image) or, for mobile users, a quiz to find matching images.

Last modified November 25, 2020.