Who does Zalo PC backup encryption protect?

I used to think data export had a fairly direct job. The user proves ownership of an account, the service packages the data, and the result arrives in a readable format. Zalo PC takes a longer route. Its file ends in .zl.zip, but it is not an ordinary ZIP. The content is an encrypted TAR archive.

Encrypting data is not a bad idea. The curious part is that the user is exporting their own data on an already signed-in machine, yet reading it still requires finding the key that the application is holding. It feels like putting a safe inside the house, hanging the key beside the door, and making the homeowner the person most inconvenienced by it.

Every observation in this article was made with my own data and a Zalo session I was authorized to use. This is an analysis of a technical design, not a guide to accessing another person’s account or data.

A data export that has not quite been handed to the user

Meta handles this in a more legible way. Facebook’s official documentation lets users choose HTML or JSON. The HTML package arrives as a ZIP that can be extracted and opened through an index file, while JSON is intended for machine processing or transfer to another service. Password and additional verification checks happen before the download, so access control is enforced before the data leaves the service.

The point is not that Meta is good and Zalo is bad. The systems have different histories and threat models. For a feature called data export, however, HTML, JSON, or an ordinary ZIP makes one thing clear. Once authorization succeeds, the data belongs in the user’s hands. An application-specific ciphertext sends a stranger message. You may take your data home, but you may not necessarily read it with a tool of your choosing.

The ZIP extension is only the sign on the door

The .zl.zip file does not carry a normal ZIP header. Once correctly decrypted, the plaintext appears as TAR. The data flow can be summarized like this.

[Diagram: TAR plaintext is encrypted into .zl.zip ciphertext, then decrypted with a uin-derived key and IV back into TAR plaintext. Caption: A compact view of a `.zl.zip` file after looking past the filename.]
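The header claim is easy to check without any knowledge of the encryption. The sketch below is my own illustration, not part of Zalo: `looks_like_zip` is a hypothetical helper that looks for the standard ZIP local-file signature at the start of the data and falls back to the standard library's `zipfile.is_zipfile`, which searches for the end-of-central-directory record.

```python
import io
import zipfile

# Signature that opens the first local file header in an ordinary ZIP.
ZIP_LOCAL_FILE_SIGNATURE = b"PK\x03\x04"


def looks_like_zip(data: bytes) -> bool:
    """Return True if the bytes carry the usual markers of a ZIP archive."""
    if data.startswith(ZIP_LOCAL_FILE_SIGNATURE):
        return True
    # Covers edge cases such as an empty archive or data prepended to a ZIP.
    return zipfile.is_zipfile(io.BytesIO(data))
```

A file that fails both checks is ZIP in name only, which matches what the `.zl.zip` export showed in my case.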

The observed key and IV derivation is short.

key = SHA256(UTF8(uin))
iv  = UTF8("zie" + uin[0:13])

plaintext = AES_256_CBC_DECRYPT(ciphertext, key, iv)

SHA-256 produces 32 bytes, exactly the size of an AES-256 key. The string zie plus the first 13 characters of uin forms a 16-byte IV, provided those characters are single-byte in UTF-8, which holds for the ASCII values I observed. The formula cannot remain a durable secret because the application must contain or perform it. Any secrecy therefore has to come from uin itself.
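The derivation can be written with nothing but the standard library. This is a minimal sketch of the formula above; the function name `derive_key_iv` and the length check are my additions, and any uin passed to it in an example is a made-up placeholder, not a real identifier.

```python
import hashlib


def derive_key_iv(uin: str) -> tuple[bytes, bytes]:
    """Derive the AES-256 key and CBC IV from uin, following the observed formula."""
    key = hashlib.sha256(uin.encode("utf-8")).digest()  # always 32 bytes
    iv = ("zie" + uin[:13]).encode("utf-8")             # 3 + 13 bytes for ASCII input
    if len(iv) != 16:
        # A short or non-ASCII uin would not give a valid CBC IV.
        raise ValueError("uin must supply 13 single-byte characters")
    return key, iv
```

The decryption step itself needs an AES-CBC implementation from a third-party package, since the Python standard library does not ship one, so it is left out here.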

This is where the encryption develops a touch of dry comedy. uin is not a user-selected password, it does not pass through a salted KDF such as PBKDF2, scrypt, or Argon2, and the signed-in application still needs to know it. A key derived from an identifier can make a detached file less immediately readable, but it should not be confused with a high-entropy secret.
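For contrast, this is roughly what a salted, stretched derivation looks like. It is not something Zalo does and not a proposal for its format. It is a sketch using `hashlib.pbkdf2_hmac` from the standard library, with a hypothetical user-chosen passphrase, a random salt that would have to be stored beside the ciphertext, and an iteration count that is my own assumption.

```python
import hashlib
import os


def derive_key_from_passphrase(passphrase: str, salt: bytes,
                               iterations: int = 600_000) -> bytes:
    """Stretch a user-chosen passphrase into a 32-byte key with PBKDF2-HMAC-SHA256."""
    return hashlib.pbkdf2_hmac(
        "sha256", passphrase.encode("utf-8"), salt, iterations, dklen=32
    )


def new_salt() -> bytes:
    """Generate a fresh random salt for each backup."""
    return os.urandom(16)
```

The salt makes every backup's key different even for the same passphrase, and the iteration count makes each guess expensive. A single SHA-256 over an identifier offers neither property.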

There is no need to defeat AES when key material remains in RAM

The interesting part is not breaking AES. AES is doing its job. The issue is the key lifecycle in a desktop application. While Zalo is running, a legitimate process must retain enough state to work with the current account. Another process in the same user context can use read-only Windows APIs such as VirtualQueryEx and ReadProcessMemory to inspect suitable memory regions.

Potential uin values can be found as hexadecimal strings in ASCII and UTF-16LE. For each candidate, derive the key and IV, then decrypt only the first 512 bytes of the file, which is one TAR block (32 AES blocks). That block is the TAR header, and it has a checksum field at a fixed position, making it a convenient known-plaintext check. A wrong candidate is extraordinarily unlikely to produce both a valid checksum and a readable first entry name.

for region in readable_memory_regions:
    candidates = find_uin_like_strings(region)

    for uin in candidates:
        key = SHA256(UTF8(uin))
        iv = UTF8("zie" + uin[:13])
        header = decrypt_first_block(backup, key, iv)

        if looks_like_tar_header(header):
            accept(uin)

No one is breaking AES here. The key is found where the application is forced to use it, and TAR confirms that the right key was picked up.
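The confirmation step is the only part that depends on TAR rather than on Zalo, so it can be sketched on its own. The function below is my interpretation of `looks_like_tar_header` from the pseudocode. It assumes the standard ustar layout, where the checksum occupies bytes 148 to 155 of the header, is stored as octal text, and is computed with that field counted as eight ASCII spaces.

```python
TAR_BLOCK_SIZE = 512
CHECKSUM_START, CHECKSUM_END = 148, 156


def looks_like_tar_header(block: bytes) -> bool:
    """Return True if a 512-byte block has a valid TAR checksum and a readable name."""
    if len(block) != TAR_BLOCK_SIZE:
        return False
    stored = block[CHECKSUM_START:CHECKSUM_END].strip(b"\x00 ")
    try:
        expected = int(stored, 8)  # checksum is stored as octal digits
    except ValueError:
        return False
    # Sum every header byte, treating the checksum field itself as spaces.
    actual = (sum(block[:CHECKSUM_START])
              + 8 * 0x20
              + sum(block[CHECKSUM_END:]))
    if actual != expected:
        return False
    # The entry name is the NUL-terminated field in the first 100 bytes.
    name = block[:100].split(b"\x00", 1)[0]
    try:
        text = name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return bool(text) and text.isprintable()
```

Garbage produced by a wrong key almost never has parseable octal digits in the checksum field, let alone a matching sum, so most wrong candidates fail at the first `int` call.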

This does not prove that encryption is useless in every situation. If someone obtains only the backup file without the machine, session, or related account data, ciphertext still creates an extra barrier. If an attacker can already execute code under the same Windows account and read process memory, however, the story has moved to a different layer. Backup encryption cannot rescue an endpoint that has already been compromised.

If the goal is stopping hackers, the boundary is being drawn rather late

The most charitable hypothesis is that Zalo encrypts the backup to protect it if the file is stolen. Within that narrow threat model, the layer helps. A person holding only the file cannot open it as an ordinary ZIP.

The word “hacker” is usually used much more broadly. If the machine already runs malware, an attacker may read files, observe the signed-in session, steal tokens, capture the screen, inspect processes, or simply wait for the application to decrypt the data. Encrypting an export with key material recoverable from the live session does not make a compromised computer safe again.

In short, this layer may protect against someone who finds one file. It does not protect against someone already inside the room where the file and its key coexist.

If the goal is integrity, CBC has been assigned the wrong job

AES-256-CBC provides confidentiality, but CBC alone provides neither integrity nor authenticity. In the observed format, the ciphertext is not accompanied by a MAC or authentication tag. There is no HMAC and no AEAD construction such as AES-GCM to prove that the data has not been changed.

- The key comes from one direct SHA-256 operation, without a salt or a key-stretching mechanism.
- The IV is deterministically derived from uin, so backups for one account may reuse both key and IV.
- CBC is malleable. Ciphertext can be modified to influence plaintext, even though TAR structure makes practical exploitation harder.
- A TAR checksum detects structural errors and accidental corruption. It is not a MAC, does not authenticate the creator, and does not resist intentional forgery.
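The malleability and IV-reuse points follow directly from how CBC works. Writing $E_K$ and $D_K$ for the AES block operations under key $K$, and taking $C_0$ to be the IV:

$$
C_i = E_K(P_i \oplus C_{i-1}), \qquad P_i = D_K(C_i) \oplus C_{i-1}, \qquad C_0 = \mathrm{IV}
$$

If someone replaces the ciphertext block $C_{i-1}$ with $C_{i-1} \oplus \Delta$, the following block decrypts to

$$
P_i' = D_K(C_i) \oplus (C_{i-1} \oplus \Delta) = P_i \oplus \Delta
$$

so chosen bits of $P_i$ flip without any knowledge of $K$. The price is that $P_{i-1}$ decrypts to garbage, which is why TAR structure gets in the way of a clean edit. The first block is the exception in the observed format, because its $C_0$ is the derived IV rather than data stored in the file. The same equations also show the cost of a fixed key and IV: two backups whose plaintexts share their first $n$ blocks will share their first $n$ ciphertext blocks.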

If the actual requirement is to verify a file before import, clearer mechanisms include a signed manifest, a MAC over the archive, hashes for individual components, a versioned schema, and strict validation rules. Encryption may accompany those controls, but it cannot impersonate them.
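As one illustration of "a MAC over the archive", here is a minimal encrypt-then-MAC sketch using the standard library's `hmac` module. The names `seal` and `open_sealed` are hypothetical, the tag is simply appended to the ciphertext, and `mac_key` is assumed to be a real secret that is independent of the encryption key. None of this describes the observed Zalo format, which carries no tag at all.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA256 tag in bytes


def seal(ciphertext: bytes, mac_key: bytes) -> bytes:
    """Append an HMAC-SHA256 tag computed over the ciphertext."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag


def open_sealed(blob: bytes, mac_key: bytes) -> bytes:
    """Verify the tag and return the ciphertext, or raise if it was altered."""
    if len(blob) < TAG_LEN:
        raise ValueError("blob is too short to contain a tag")
    ciphertext, tag = blob[:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return ciphertext
```

The check runs before any decryption or parsing, so a modified file is rejected without touching the TAR parser. A MAC is only as strong as its key, though. If that key were also derived from uin, anyone who can recover uin could produce a valid tag.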

If the goal is preventing edited data from being reimported, the case is still weak

An obscure format can discourage casual editing. It does not create a trust boundary. When a client application contains enough logic to read a file, that logic can be observed and reproduced. The data can then be decrypted, changed, packaged, and encrypted again.

Preventing modified imports requires the application to verify something the user cannot reproduce. A digital signature made with a service-side private key is one example. Schema, ID, relationship, timestamp, and server-state validation can also reject fabricated records. Wrapping an archive with a key derived from local data is closer to applying a “do not open” sticker while leaving the tape beside it.

So what is Zalo actually trying to achieve?

I do not have Zalo’s internal design documents, so I cannot claim to know its intent. The only firm statements are the observed behaviors. The export does not open as ZIP, the decrypted data is TAR, the key is derived from uin, and an active Zalo session retains enough traces to recover that value.

Several explanations remain plausible.

- Zalo may want to protect a backup after it has been separated from the machine that created it.
- It may want to discourage users from editing data with ordinary tools.
- It may treat the format as a private implementation detail that can change without supporting third-party tooling.
- The encryption may be a legacy decision that remains because existing systems now depend on it.

All four are hypotheses, not conclusions about Zalo’s motive. What can be concluded is that the current design makes legitimate users work harder to read their own data while offering limited value against a compromised endpoint. That is a visible cost attached to a benefit that has not been made equally clear.

After decryption, the archive is still untrusted input

Successful decryption does not make unconditional extraction safe. TAR can contain paths, symbolic links, hard links, and several other entry types. Rejecting names that begin with /, contain a Windows drive, or include .. is still insufficient if a link can point outside the destination directory.

A safe extraction flow must understand entry types, validate link targets, resolve paths, and verify containment before writing. If validation and extraction happen in separate passes, the source must remain stable or be checked by hash to avoid a TOCTOU gap. File-count, total-size, directory-depth, and processing-time quotas are also needed to limit archive bombs and resource exhaustion.
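A pre-extraction check along these lines can be sketched with Python's `tarfile` module. This is a deliberately strict variant under my own assumptions: instead of validating link targets, it rejects every entry that is not a regular file or directory, and the quota values are arbitrary placeholders. The helper names `is_within` and `check_members` are hypothetical.

```python
import os
import tarfile

MAX_MEMBERS = 10_000        # placeholder quota, tune per use case
MAX_TOTAL_BYTES = 1 << 30   # placeholder quota: 1 GiB of declared content


def is_within(base: str, target: str) -> bool:
    """Return True if target resolves to a path inside base."""
    base = os.path.realpath(base)
    target = os.path.realpath(target)
    try:
        return os.path.commonpath([base, target]) == base
    except ValueError:
        # Raised for paths on different Windows drives, which is also outside.
        return False


def check_members(tar: tarfile.TarFile, dest: str) -> list[tarfile.TarInfo]:
    """Validate every entry before anything is written; raise on the first problem."""
    safe = []
    total = 0
    for count, member in enumerate(tar, start=1):
        if count > MAX_MEMBERS:
            raise ValueError("too many entries")
        if not (member.isfile() or member.isdir()):
            # Symlinks, hard links, devices, and FIFOs are refused outright.
            raise ValueError(f"unsupported entry type: {member.name!r}")
        if not is_within(dest, os.path.join(dest, member.name)):
            raise ValueError(f"entry escapes destination: {member.name!r}")
        total += member.size
        if total > MAX_TOTAL_BYTES:
            raise ValueError("declared size exceeds quota")
        safe.append(member)
    return safe
```

Refusing links entirely is simpler than resolving them and is enough when the archive is only expected to hold plain files. The sketch does not cover directory depth, processing time, or the TOCTOU gap between this pass and the later extraction, all of which still need handling.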

This part is easily hidden by the phrase “AES-256.” The algorithm sounds reassuring, but the decrypted archive is still user-selected input and the parser is where that input reaches the file system. Cryptography does not patch path traversal, symlink handling, or resource limits.

A threat model shorter than most claims about encryption

| Situation | What the encryption helps with | What remains exposed |
| --- | --- | --- |
| Only the backup file is stolen | The file does not open as ZIP and requires the correct key input | Security depends on the secrecy and real entropy of `uin` |
| The signed-in machine is compromised | Almost nothing against a lost endpoint | A same-user process may inspect RAM, files, and session state |
| The backup is modified | CBC hides content but does not prove integrity | No MAC or authentication tag |
| A malicious backup is extracted | Encryption does not make the TAR parser safer | Path traversal, links, TOCTOU, and resource exhaustion |

What remained after this investigation

The reverse engineering itself did not require magic. The extension says ZIP, but the header disagrees. Valid plaintext has TAR structure. The key and IV follow from uin. Data leading to uin appears in the signed-in renderer. The TAR header becomes the confirmation test. Each step is a small inference that joins the next one.

The original design decision remains harder to explain. The user has authenticated and deliberately requested an export, yet the result asks the application to act as gatekeeper once again. If the goal is protecting a detached copy, the layer offers some value. If the goal is resisting a compromised machine, forgery, or edited reimports, it is using the wrong control or missing more important ones.

Perhaps this is the kind of security that keeps everyone busy. The application encrypts, the user searches for a way to read their own data, and the already-compromised machine waits politely outside the debate.

References

- NhanAZ - 30.06.2026