Detecting similar data is crucial for optimizing file storage and for transmission over HTTP and Content Delivery Networks (CDNs). Traditional MinHash methods face significant efficiency challenges because they rely on K-shingle structures, which incur high computational and storage costs. These methods also pose privacy risks in cloud environments, where sensitive information can be inferred from MinHash signatures. To address both efficiency and security concerns, we propose Horse-MinHash, which integrates a fast, content-defined feature extraction scheme with ...
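For context, the following is a minimal Python sketch of the traditional K-shingle MinHash baseline that the abstract contrasts against. It is not Horse-MinHash, whose design is not described here. It assumes byte-level shingles, BLAKE2b as the base hash, and universal hash functions of the form (a·x + b) mod p standing in for random permutations. The function names `shingles`, `minhash_signature`, and `estimate_jaccard` are hypothetical and chosen for illustration.

```python
import hashlib
import random


def shingles(data: bytes, k: int) -> set[bytes]:
    """Return the set of all contiguous k-byte shingles (K-shingles) of data."""
    return {data[i:i + k] for i in range(len(data) - k + 1)}


def minhash_signature(data: bytes, k: int = 4, num_perm: int = 64, seed: int = 1) -> list[int]:
    """Classic MinHash: for each hash function, keep the minimum value over all shingles.

    Assumption: random permutations are approximated by universal hashing
    h(x) = (a*x + b) mod p over 64-bit BLAKE2b digests of each shingle.
    """
    rng = random.Random(seed)
    prime = (1 << 61) - 1  # Mersenne prime modulus for the universal hash family
    params = [(rng.randrange(1, prime), rng.randrange(0, prime)) for _ in range(num_perm)]
    # Hash every shingle once to a 64-bit integer.
    base = [int.from_bytes(hashlib.blake2b(s, digest_size=8).digest(), "big")
            for s in shingles(data, k)]
    if not base:
        raise ValueError("input shorter than the shingle size k")
    # Cost: num_perm * len(base) hash evaluations; storage: num_perm values per signature.
    return [min((a * x + b) % prime for x in base) for a, b in params]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of matching signature slots estimates the Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)
```

Every signature requires `num_perm` hash evaluations for each shingle in the input and keeps `num_perm` values per object, which illustrates why the computational and storage costs described above grow with both the data size and the signature length.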