Best Practices for Using Anonymized/Pseudonymized Data with Kaunt
To ensure optimal performance when using Kaunt, it's important to understand how anonymized and pseudonymized data impacts our AI models. While Kaunt is designed to handle a variety of data setups, working with raw data provides the best results. In this guide, we cover some best practices when using anonymized or pseudonymized data.
Use Raw Data for Best Results
Kaunt's AI models perform best when trained on raw, unprocessed data. This allows our system to detect nuanced patterns and relationships in your invoices, enabling more accurate predictions and coding suggestions. We recommend using raw data whenever possible to maintain the highest level of accuracy.
Recommendation: Use raw, unprocessed data for the most accurate results. Altering data can reduce the model's ability to learn from your historical postings and invoices.
Limitations with Pseudonymized Data
While we understand the need for data anonymization or pseudonymization for privacy reasons, it's important to note that pseudonymized data can limit Kaunt's capabilities. When key fields, such as vendor names or product descriptions, are altered, Kaunt's AI may not fully understand the context and relationships between the data points, reducing accuracy.
- Support for pseudonymized data: We cannot guarantee the same level of accuracy or support when the data has been pseudonymized, because our models often rely on recognizable patterns to function optimally.
Tip: If pseudonymization is necessary, keep the mapping consistent for critical fields, such as serial numbers, so that the same original value always maps to the same pseudonym. This minimizes disruption to the AI model's understanding.
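One way to keep a mapping consistent is to derive each pseudonym from the original value with a keyed hash, so no lookup table has to be stored. The following is a minimal sketch under that assumption, not a Kaunt API: the function name `pseudonymize`, the `ID_` prefix, and the key are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret; in practice, generate your own and keep it private.
SECRET_KEY = b"replace-with-your-own-secret"

def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable pseudonym: the same input always yields the same token."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "ID_" + digest[:12]  # shortened for readability (illustrative choice)

serials = ["SN-10045", "SN-20917", "SN-10045"]
tokens = [pseudonymize(s) for s in serials]

# The first and third serial numbers are identical, so their tokens match.
print(tokens[0] == tokens[2])
```

Because the token depends only on the value and the key, the same serial number receives the same pseudonym in every export, as long as the key is not changed.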
Word-Level Mapping
When anonymizing data, map identifiers at the word level: replace each individual identifier with its own pseudonym and leave the surrounding text intact. Mapping entire sentences to a single value significantly degrades prediction quality. In practice, you can safely pseudonymize serial numbers, product codes, or customer identifiers, but do not modify numeric values such as amounts, totals, or VAT values.
- What you can map: Serial numbers, product codes, customer IDs
- What should remain unchanged: Invoice amounts, VAT codes, totals, dates
Numeric fields are critical for Kaunt's algorithms to process invoices correctly. Altering these values can lead to significant errors in the predictions and coding recommendations.
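The word-level approach described above can be sketched as follows. This is an illustration only: the rule for recognizing an identifier (a token that mixes letters and digits), the field names, and the `ID0001`-style pseudonyms are assumptions, and your own data will likely need a different rule.

```python
import re

# Assumed rule: a token is an identifier if it contains both letters and digits,
# e.g. "SN-10045". Plain words and plain numbers are left untouched.
IDENTIFIER = re.compile(r"^(?=.*[A-Za-z])(?=.*\d)[A-Za-z0-9-]+$")

def pseudonymize_text(text: str, mapping: dict) -> str:
    """Replace identifier tokens one word at a time, reusing earlier pseudonyms."""
    out = []
    for word in text.split():
        if IDENTIFIER.match(word):
            if word not in mapping:
                mapping[word] = f"ID{len(mapping) + 1:04d}"
            out.append(mapping[word])
        else:
            out.append(word)
    return " ".join(out)

mapping = {}
# Hypothetical invoice line; only the description is pseudonymized.
invoice_line = {
    "description": "Toner cartridge SN-10045 qty 2",
    "amount": "249.50",
    "vat": "62.38",
    "date": "2024-03-01",
}
safe_line = dict(
    invoice_line,
    description=pseudonymize_text(invoice_line["description"], mapping),
)
second = pseudonymize_text("Return of SN-10045", mapping)

print(safe_line["description"])  # Toner cartridge ID0001 qty 2
print(second)                    # Return of ID0001
```

Only the serial number changes. The rest of the description, the amount, the VAT value, and the date pass through unchanged, and the same serial number maps to the same pseudonym on both lines.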
Processing Dimensions and Dimension Values
Kaunt can also process dimensions and dimension values, which represent additional metadata associated with the invoice (e.g., G/L accounts, cost centers, departments). These dimensions and values can be pseudonymized if necessary, but it's important to preserve their structure and consistency to ensure that the AI can correctly map and predict them.
- Dimension and dimension values mapping: You can pseudonymize dimensions such as G/L accounts or departments, but avoid altering their relationships or structure in the data.
- Preserve valid combinations: Ensure that the combinations of dimension values (e.g., department with cost center) remain intact to avoid disrupting how the AI processes the data.
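To illustrate the two points above, the sketch below pseudonymizes each dimension with its own one-to-one mapping and then checks that the set of combinations is unchanged. The sample departments, cost centers, and the `build_mapping` helper are hypothetical and not part of Kaunt.

```python
def build_mapping(values, prefix):
    """Assign each distinct value one pseudonym, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = f"{prefix}{len(mapping) + 1:03d}"
    return mapping

# Hypothetical historical postings: (department, cost center)
postings = [
    ("Sales", "CC100"),
    ("Sales", "CC200"),
    ("IT", "CC300"),
    ("Sales", "CC100"),
]

dept_map = build_mapping((dept for dept, _ in postings), "DEPT")
cc_map = build_mapping((cc for _, cc in postings), "CC")

pseudo = [(dept_map[dept], cc_map[cc]) for dept, cc in postings]

# Each mapping is one-to-one, so no two original values collapse into one.
one_to_one = len(set(dept_map.values())) == len(dept_map)
# The number of distinct combinations is the same before and after mapping.
combos_preserved = len(set(pseudo)) == len(set(postings))

print(pseudo[0], one_to_one, combos_preserved)  # ('DEPT001', 'CC001') True True
```

Mapping each dimension separately, rather than hashing the combined row, keeps the link between a department and its cost centers visible in the pseudonymized data.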
By following these best practices, you'll help ensure that Kaunt's AI models can deliver accurate and reliable results, while also maintaining compliance with data privacy regulations. If you have specific requirements or questions about using anonymized data, please reach out to our support team for guidance.