Handbook on Data Protection and Privacy for Developers of ...

108
Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: July 2021 Practical Guidelines for Responsible Development of AI

Transcript of Handbook on Data Protection and Privacy for Developers of ...

Handbook onData Protection and Privacyfor Developers ofArtificial Intelligence (AI) in India:

July 2021

Practical Guidelines forResponsible Development of AI

Published by:Deutsche Gesellschaft fürInternationale Zusammenarbeit (GIZ) GmbH

Registered officesBonn and Eschborn

FAIR Forward: Artificial Intelligence for AllA2/18, Safdarjung Enclave, New Delhi -110029, IndiaT: +91 11 4949 5353F: + 91 11 4949 5391

E: [email protected]: www.giz.de

Responsible: Mr. Gaurav SharmaAdvisor – Artificial IntelligenceFAIR Forward: Artificial Intelligence for AllGIZ [email protected]

Authors:KOAN Advisory (Varun Ramdas, Priyesh Mishra, Aditi Chaturvedi) Digital India Foundation (Nipun Jain)

Content Review:Gaurav Sharma, GIZ India

Editor:Divya Joy

Design and Layout: Caps & Shells Creatives Pvt Ltd.

On behalf ofGerman Federal Ministry for Economic Cooperation and Development (BMZ)

GIZ India is responsible for the content of this publication.

Disclaimer: The data in the publication has been collected, analysed and compiled with due care; and has been prepared in good faith based on information available at the date of publication without any independent verification. However, Deutsche Gesellschaft fürIn-ternationale Zusammenarbeit (GIZ) GmbH does not guarantee the accuracy, reliability, completeness or currency of the information in this publication. GIZ shall not be held liable for any loss, damage, cost or expense incurred or arising by reason of any person using or relying on information in this publication.

New Delhi, IndiaJuly 2021

Trust is the key to widespread adoption and success of Arti�cial Intelligence (AI). It is for this reason that AI products must adhere to the highest standards of ethics and prevailing regulations. �is data protection handbook shall act as a guiding document and help in the development of arti�cial intelligence technologies that are legally and socially acceptable.

�is handbook is for developers of AI who are already familiar and know Machine Learning (ML) processes such as early-stage start-ups. �e purpose of the handbook is to give developers a clear understanding about the essential ethical frameworks in AI and important issues related to protection of personal data and privacy. With the help of this handbook, developers will be able to follow the guidelines on ethics and data protection laws from the very inception of product/application design to its development. Keeping these parameters in mind from the beginning can save developers and start-ups from issues that may arise later on and can create complications that may even force abandoning and redesigning product/application. �e handbook encourages developers to take an interdisciplinary view andthink beyond the con�nes of algorithmic accuracy and focus on the social impact of the product. �e handbook does not provide technical solutions but gives developers ethical and legal objectives to pursue. Developers are free to think of innovative ways in which they can include these guidelines in their design.

What to expect from the handbookand prerequisites?

03 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Koan Advisory is a New Delhi-based public policy consultancy firm. It has a multi-disciplinary team of lawyers, economists, social scientists and communications professionals who work with clients in the public and private sector to help shape policy discourse in India. Koan Advisory's subject area expertise includes intellectual property and innovation, competition and market structures, trade and commerce, and governance of new and emerging technology.

Data Security Council of India (DSCI) is a not-for-profit, industry body on data protection in India, setup by NASSCOM®, committed towards making cyberspace safe, secure and trusted by establishing best practices, standards and initiatives in cyber security and privacy. DSCI works together with the Government and their agencies, law enforcement agencies, industry sectors including IT-BPM, BFSI, CII, Telecom, industry associations, data protection authorities and think tanks for public advocacy, thought leadership, capacity building and outreach initiatives. DSCI through its Centres of Excellence at National and State levels, works towards developing an ecosystem of Cyber Security technology development, product entrepreneurship, market adoption and promoting India as a hub for Cyber Security.

Digital India Foundation is a policy think-tank promoting Digital Inclusion, Cyber Security, Mobile Manufacturing, Domestic Consumption, Software Products and Smart Cities. Since its inception in 2015, it has worked with pioneers in industry, academia and government organizations on issues ranging from promoting a safer and inclusive product and policy environment for Indian ecosystem.

The German Development Cooperation initiative “FAIR Forward – Artificial Intelligence for All” strives for a more open, inclusive and sustainable approach to AI on an international level. To achieve this, we are working together with five partner countries: Ghana, Rwanda, South Africa, Uganda and India. Together, we pursue three main goals: a) Strengthen Local Technical Know-How on AI, b) Remove Entry Barriers to AI, c) Develop Policy Frameworks Ready for AI.

Organisations involved

04 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Artificial Intelligence (AI) has emerged as a foundational technology, powering NextGen solutions across a wide

cross-section of verticals and use cases. The innovation ecosystem building AI based products and solutions for

BFSI, healthcare, retail, cyber security to name a few, spans the Big Tech, start-ups, and services firms. With the

globalAI market size estimated to grow from USD 37.5 bn in 2019 to USD 97.9 bn by 2023, the growing demand

for talent pool and their skills upgrade is a priority for India’s technology industry.

Both governments and businesses are favoring AI not only for its innovation potential but also for revenue impact

and cost savings. As a result,it becomes increasingly relevant to mitigate the risks of AI solutions on users and

society at large. In the context of AI, cases of algorithmic bias or compromised safety come to the forefront with

the risk of undoing the potential benefits of leveraging this technology for good and creating wider

socio-economic impact. Awareness and capacity building of the developer community is essential for creating a

strong foundation for responsible data-centric innovation. This is an important cog in the overall schema for the

proliferation of AI technology in India and balancing its innovation potential with ethics and privacy.

It is imperative that risks of abuse of AI technology are not only addressed through inclusive policymaking but

also practitioner-centric initiatives to integrate best practices of ethics and privacy into the AI solutions

proactively. DSCI has always championed the practitioner-led approach as a means of making a tangible impact

in the policy discourse. Educating the developers and solution architects innovating on AI on all the right

practices to meet both user and regulator expectations will create a longer-term positive impact.

To facilitate this, the Data Security Council of India (DSCI) is delighted to collaborate with the German

Development Cooperation (GIZ), Digital India Foundation, and Koan Advisory Group to come out with this

handbook, to assist developers in building AI solutions that take into account ethical and privacy considerations.

The handbook draws from globally recognized ethical principles and the current and evolving regulatory

landscape for privacy in India. We are confident the handbook will help developers follow the guidelines on

ethics and data protection laws from the very inception, of product/application design to development. The

handbook encourages developers to take an interdisciplinary view, think beyond the confines of algorithmic

accuracy and give a thought to the social impact of the product and adequate attention to accountability and

transparency, right from the design stage. Several experts from industry and academia contributed with their rich

experiences and we thank them for making this handbook of immense value add to the AI ecosystem and

motivate the developer talent pool on a journey of discovering responsible AI innovation.

Rama Vedashree

CEO

Data Security Council of India

Foreword

05 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

The world is advancing towards an Artificial Intelligence (AI)-powered industrial revolution. However, the rapid

development and deployment of this emerging technology is accompanied by significant uncertainties that may

erode public trust in AI. For instance, tough questions about the liabilities of AI systems and the risks posed by

algorithmic biases, need answers.

Though such complex challenges are hard to solve from a rulemaking perspective alone, they can be overcome

if appropriate ethical considerations shape two key aspects of AI development: the use of data – the building

block of AI – and the deployment of algorithms that are trained on such data.

Software developers are the architects of AI systems and they must be the first to understand the ethical

considerations and emerging legal frameworks that govern them. An awareness among the developer community

is arguably more important than any other facet of AI adoption at scale. For example, greater sensitivity to AI

ethics and laws can help developers identify sensitive data and build safeguards to manage risks linked to it.

This handbook, co-developed by the Digital India Foundation and the Koan Advisory Group, with technical and

financial support from the German Development Cooperation (GIZ) and the Data Security Council of India (DSCI),

is meant to be an easy reference guide that helps developers put into practice ethical and legal requirements of

AI systems.

The handbook discusses key concepts like bias, privacy, data security, transparency and accountability. The best

practices, checklists and citations mentioned here are meant to facilitate a rigorous yet, pragmatic and

actionable understanding of the subject at hand.

The ethical principles mentioned in the handbook are in line with the principles proposed by the Organisation for

Economic Co-operation and Development and India’s NITI Aayog. The legal-regulatory requirements correspond to

the provisions under India’s Draft Data Protection Bill, 2019 and Information Technology Act, 2000.

A fearless embrace of AI requires supportive baseline conditions. This means that society at large must be

convinced of the symbiotic nature of technology, through participative and robust design. We have kept these

inclusive first-principles in mind to create this instructional guide. We thank all the developers and domain

experts who helped us shape this handbook and hope that its contents are widely discussed among relevant

communities.

Arvind Gupta

Head & Co-Founder

Digital India Foundation

Foreword

Vivan Sharan

Partner

Koan Advisory

06 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

07 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

The Handbook on Data Protection and Privacy for Developers of Artificial Intelligence in India: Practical Guidelines for

the Responsible Development of AI is a comprehensive document which details the rules and best practices on

the ethical use of AI. Multiple organizations collaborated with each other to develop a handbook that will work

as a guide for developers to develop such software tools that adheres to the ethical and moral principles of

privacy, dignity, fairness and equity in the use of AI by different entities.

It gives me immense pleasure to express my heartfelt gratitude to all the organisations and individuals who made

the writing of this handbook a learning experience. This document is the result of an intense deliberation of a

small but diverse group of stakeholders. These include (in alphabetic order); Amith Parameshwara, Leader -

Artificial Intelligence at Kimberly-Clark; Amit Shukla, Founder, EasyGov; Anand Naik, CEO & Co-Founder at

Sequretek; Atul Khatavkar, CEO HB Infotech; Bhavesh Manglani, Co-Founder at Delhivery; Professor Biplav

Srivastava, Computer Science and Engineering, AI Institute, University of South Carolina (USC); Dr Kalika Bali,

Principal Researcher at Microsoft Research India (MSR); Girish Nair, Founder, Curiosity Gym; Dr Neeta Verma,

Director General of National Informatics Centre; Saurabh Chandra, Co-Founder, Ati Motors; Prof Siddharth

Shekhar Singh, Associate Dean - Digital Transformation, e-Learning, and Marketing at Indian School of

Business; Shuchita Gupta, Co-Founder, Care4Parents; UmakantSoni, Co-founder & CEO AI and Robotics Technology

Park (ARTPARK); Professor Venkatesh Balasubramaniam, Department of Engineering Design, IIT Madras; Vibhore

Kumar, CEO & Co-founder at Unscrambl Inc.

We also extend our thanks to Philipp Olbrich, Gaurav Sharma & Siddharth Nandan from FAIR Forward: AI for All

project, GIZ; Anand Krishnan from Data Security Council of India; Aditi Chaturvedi, Priyesh Mishra, Varun Ramdas

and Vivan Sharan from Koan Advisory as well as Arvind Gupta and Nipun Jain from Digital India Foundation.

Without the guidance of these individuals and constant mentorship from Professor Biplav Srivastava, the

handbook wouldn’t have been in the form it is today.

Thank you!

Acknowledgement

Contents

10 How to read this Handbook?

10 Meaning - Brief explanation of principle and why it is important

11 Section I : Ethics in AI

12 Introduction

13 Stages at which developers can intervene

15 Ethical Principles

15 I. Transparency 15 Meaning 16 Checklist for Developers at Different Points of Intervention17 Good Practices 18 At a Glance 18 Instances of challenges developers may face 19 Further Questions

20 II. Accountability 20 Meaning 21 Checklist for Developers at Different Points of Intervention22 Good Practices 23 At a Glance 23 Instances of challenges developers may face 24 Further Questions

25 III. Mitigating Bias25 Meaning26 Some Reasons for Bias in AI 27 Checklist for Developers at Different Points of Intervention28 Good Practices 29 At a Glance 29 Instances of challenges developers may face 30 Further Questions

31 IV. Fairness 31 Meaning 32 Checklist for Developers at Different Points of Intervention33 Good Practices 34 At a Glance 35 Instances of challenges developers may face 35 Further Questions

36 V. Security36 Meaning 37 Checklist for Developers at Different Points of Intervention38 Good Practices 39 At a Glance 39 Instances of challenges developers may face 40 Further Questions

41 VI. Privacy41 Meaning 42 Checklist for Developers at Different Points of Intervention43 Good Practices 44 At a Glance 44 Instances of challenges developers may face 45 Further Questions 45 Ethics in AI literature and policy documents

46 Section II: Data Protection

47 Introduction

48 Data Protection under Information Technology Act, 2000

49 Sections under IT Act 49 I. Liability of corporate entities 49 II. Reasonable security practices and procedures and sensitive personal data or information Rules, 2011 49 i. Sensitive personal information 49 ii. Privacy policy50 iii. Collection of Information50 iv. Disclosure of Information50 v. Reasonable Security Practices and Procedures

51 Data Protection under Personal Data Protection Bill, 2019

51 Glossary 51 Personal Data 51 Data Principal 51 Data Fiduciary 51 Data Processor 51 Processing 51 Significant data fiduciary 51 Sensitive Personal Data

73 Storage Limitation 73 At a glance 73 What is the storage limitation principle?74 For how long can personal data be stored?74 When should you review the necessity of retention?75 When can personal data be stored for a longer period of time?

76 Data Quality76 At a glance76 What is the principle of data quality?77 When is data accurate/inaccurate?77 Whose responsibility is it to maintain data quality?

78 Accountability and Transparency 78 At a glance78 Introduction79 Privacy by Design 80 How would Privacy by Design be implemented by law? 81 What are accountability measures prescribed by law? 82 What are the transparency requirements prescribed by law?

84 Rights of Data Principals85 Right to confirmation and access86 Right to Correction and Erasure87 Right to Portability88 Right to be Forgotten

90 Compliance Map

91 Annexure A Technical Measures for Data Security

93 Annexure B List of Abbreviations

94 Annexure C Master Checklists

100 References

52 Principles of Personal Data Protection

54 Personal Data 54 At a glance54 Introduction 55 What is personal data?56 What is meant by direct or indirect identification?57 What should you do if you are unable to determine whether data is personal or not? 58 Are there other categories of personal data?

59 Data Fiduciaries and Data Processors59 At a glance59 Introduction60 Who is a Data Fiduciary?60 Who is a Data Processor?61 How to determine if you are a data fiduciary or a data processor?

62 Notice and Consent 62 At a glance 62 What is the principle of notice and consent? 63 How do you implement the principle? 63 What is the standard of valid consent? 64 Are there exceptions to the principle of notice and consent?

65 Purpose Limitation 65 At a glance 65 What is the meaning of purpose limitation?66 How do you specify your purpose? 66 Can the data be used for purposes other than the ones specified? 68 Data Minimisation68 At a glance 68 What is the principle of data minimisation? 69 How do you determine what is relevant and necessary?70 Is data minimisation principle antithetical to AI?71 Are there legal provisions protecting big-data applications? 71 What data minimisation techniques are available for AI systems?

10 How to read this Handbook?

10 Meaning - Brief explanation of principle and why it is important

11 Section I : Ethics in AI

12 Introduction

13 Stages at which developers can intervene

15 Ethical Principles

15 I. Transparency 15 Meaning 16 Checklist for Developers at Different Points of Intervention17 Good Practices 18 At a Glance 18 Instances of challenges developers may face 19 Further Questions

20 II. Accountability 20 Meaning 21 Checklist for Developers at Different Points of Intervention22 Good Practices 23 At a Glance 23 Instances of challenges developers may face 24 Further Questions

25 III. Mitigating Bias25 Meaning26 Some Reasons for Bias in AI 27 Checklist for Developers at Different Points of Intervention28 Good Practices 29 At a Glance 29 Instances of challenges developers may face 30 Further Questions

31 IV. Fairness 31 Meaning 32 Checklist for Developers at Different Points of Intervention33 Good Practices 34 At a Glance 35 Instances of challenges developers may face 35 Further Questions

Section II explains the data protection framework in India. �e section is divided into two broad chapters:

1. Data Protection under Information Technology Act, 2000

2. Data Protection under Personal Data Protection Bill, 2019

�e second chapter i.e., Data Protection under Personal Data Protection Bill, 2019 is divided into smaller chapters based on the principle of data protection it refers to. �ese are:

• Personal Data• Data Fiduciary and Processor• Notice and Consent• Purpose Limitation• Data Minimisation• Storage Limitation• Data Quality• Accountability and Transparency• Rights of Data Principal

Each chapter is structured in an FAQ format, with an ‘At a glance’ section summarising key points. Actionable items under each chapter are captured under ‘Call to Action’ boxes.

Each section also has a quick reference 'Checklist for Developers' that briefly summarises compliance requirements in a tickbox format".

A Compliance Map at the end of Section II captures the workflow and summarises actionable compliance requirements in form of a Master checklist.

The Handbook is divided into two sections - Ethics (Section I) and the Law (Section II). Section I covers Ethics in AI and Section II investigates the legal requirements for data protection.

How To Read This Handbook?

Based on existing AI literature and policy documents [Page 45], we discuss six ethical principles in Section I of this document.

1. Transparency

2. Accountability

3. Bias

4. Fairness

5. Security

6. Privacy

Under each section, the principles are explained in the following manner:

• Meaning• Checklist• Good Practices• At a Glance• Challenges

• Meaning: Brief explanation of the principle and why it is important.

• Checklist: A quick reference checklist for developers at each stage of intervention: Pre-processing; Processing and Post-processing [explained on Page 13]

• Good Practices: A list of good practices and frameworks sourced from existing literature

• At a glance: Summary of the principle and quick tips

• Challenges: Challenges that developers face. This is based on discussions with developers and examples borrowed from existing literature.

Section-I Section-II

10 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Section I

Ethics in A I

Technology alone is not enough. It’s technology married with liberal arts, married with the humanities, that yields the results that make our hearts sing.

Steve Jobs

In our way of working, we attach a great deal of importance to humility and honesty; With respect for human values, we promise to serve with integrity.

Azim Premji

11 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Development in AI raises signi�cant questions on law and society. �e law answerssome of these questions but cannot anticipate every technological breakthrough, given the speed of technological progress. Consequently, compliance with the law is mandatory but inadequate to meet all the concerns of technology and society. National governments, industry leaders and academia seek to �ll this gap by promoting principles or frameworks for ethical AI.

�e literal meaning of ethics is self-evident but there is want of a common understanding of ‘ethics’. According to a report by the Alan Turing Institute on Understanding Arti�cial Ethics and Safety, “AI ethics is a set of values, principles, and techniques that employ widely accepted standards of right and wrong to guide moral conduct in the development and use of AI technologies.” Establishing an ethical framework for AI starts with explaining the risks and opportunities regarding the design and use of technology. �e objective of ethics in AI is

not to limit its scope and underuse technology, but to create an ecosystem where opportunities are maximised, and risk minimised. For developers and start-ups that deploy AI, an ethical approach provides a dual advantage. It enables them to harness social value by using new opportunities that society prefers and appreciates and helps them navigatepotential risks.

An ethical framework for AI development is a governance structure that we can break down into design goals at each stage of development. In a nutshell, an ethical framework should consider:

• �e potential impact of the project on stakeholders and communities.

• Anticipating discriminatory e�ects of AI project on individuals and social groups to assess if the project is fair and non-discriminatory

• Identifying and minimising biases in the dataset that may in�uence the model’s outputs. Developers should

try to perform this check at every stage of design.

• Enhancing the level of public trust in the project through guarantees for transparency, accuracy, explainability, reliability, safety, and security on a best-e�orts basis.

• Methods to explain the AI project to stakeholders and communities including the decision-making process and redressal mechanisms in place, to the extent possible.

Introduction

The objective of ethics in AI is not to limit its scope and underuse technology, but to create an ecosystem where opportunities are maximised, and risk minimised.

12 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Stages at whichDevelopers can Intervene

Artificial Intelligence enabled systems contain different points of intervention to mitigate ethical concerns such as bias, incorrect/excessive training data, incorrect training algorithm, inadequate redressal. Based on our discussions with established start-ups, practitioners and developers, we have classified AI development into three stages for intervening in the decision-making process, from an ethical point of view. These arepre-processing, in-processing, and post-processing.

“Pre-processing techniques seek to approach the problem by removing underlying discrimina-tion from data prior to any modelling.” At the pre-processing stage, developers may take several steps to mitigate ethical concerns. For example, they may balance the datasets and enable data traceability. It is important to trace the lineage of data collected for training models and ensure that it is collected and stored following data protection regulations. At this stage, developers can also anticipate security and privacy risks with the model and prepare methods to mitigate these harms.

Pre-processing

�is is the second stage of the development process. At this point, developers can mitigate ethical concerns while the model is being designed. “In-processing techniques mitigate discrimina-tion during training through modi�cations to traditional learning algorithms. At this stage, developers can intervene to encourage or restrict reliance on certain attributes during training, trace and identify entry points for intervention, and secure the model from security risks. It is also important to embed measures to allow diagnostics and visibility of the training process.

In-processing

�is refers to a stage where the AI model has already learnt from the data. Complex AI models are constructed as a black box. Lack of visibility into AI functioning is detrimental to transparency and accountability and prevents any course correction. For post processing assessment, it is important to build in checks and balances. Post-processing techniques include interpreting the knowledge and checking for potential con�icts with previously induced knowledge. Post-process-ing audit and review mechanisms are critical to inculcate explain-ability and hence public trust in AI models.

Post-processing

13 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Figure 1: Stages of Intervention

Involves removal of underlying discrimination from the data prior to modeling

Focuses on making the dataset more balanced and providing traceability to data

Necessitates tracing the lineage of data collected for training models and ensuring

that they were collected and stored following data protection regulations

This is the second stage at which developerscan mitigate ehical concerns while the model

is being designed

Involves mitigating discrimination during training by modifying traditional leaning algorithms

Allows the developers to introduceinterventions and constraints during the

learning process of algorithms

At this stage the model has already learnt from the data and is treated like a black box

Includes interpreting the knowledge andchecking for potentional conflics with

previously induced knowledge

Entails auditing and reviewing the model to understand what preditictions are at high

risk and need to be rejected

PRE-PROCESSING

IN-PROCESSING

POST-PROCESSING

14 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Ethical Principles

Complex algorithmic systems are commonly called ‘black boxes’ because you can see the inputs and outputs, but not the process behind the results. Transparency is a design choice that can help see into this black box, increase accountability, and enhance public trust in AI decision-making. Transparency provides insight into a system, and is the �rst step towards making AI explainable, redressable, appealable and accountable.

�ere are multiple levels of transparency. For a user/bene�ciary, transparency means an easily understandable explanation of the decision-making process followed by the AI. For developers, it means source code visibility. Transparent design informs users that AI systems are making automated decisions and helps users understand the basis for these decisions. �is empowers users with the capacity to dispute or appeal it.

For a developer, transparency refers to the ability to read, interpret and understand underlying design supporting an AI-enabled system. From a practical perspective, transparency means a developer should be able to diagnose errors in decision making, �nd intervention points and correct automated decisions.

Scenario:

A loan applicant comes to know that her bank has denied her loan application to start a new business. �e bank tells her that their AI enabled system considered her application and rejected her loan. She has no knowledge of AI and responds that she has never defaulted on bills and this is the �rst time she is applying for a loan. �e bank employee tells her that he does not understand the decision-making either.

How will transparency help?

Transparent design allows loan applicants to contest automated decisions. �e AI enabled system may have based the decision on incorrect training data or by considering wrong factors. For example, the AI could have considered attributes like age, gender, or marital status and made the decision in the above case. Transparency should allow the bank and the applicant to revisit the attributes considered by the AI system. From the developer’s perspective, transparency is important to review the weightage given by the system to each attribute and course correct, as necessary.

I. Transparency

Meaning

15 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Pre-Processing

Are you aware of the source of data used for training?

Have you recorded the attributes to be used for training and the weightage for each?

Is there scope for diagnosing errors at a later stage of processing?

In-processing

Have you checked for any blind spots in the AI-enabled system and decision-making process?

Can the AI decision-making process be explained in simple terms that a layperson can understand?

Post-processing

Is it possible to obtain an explanation/ reasoning of AI decisions? For instance, can a banker obtain a record for acceptance/denial of a loan application.

Is there a mechanism for users and bene�ciaries to raise a ticket for AI decisions?

Is there scope for oversight and human review of AI decisions?

Checklist for Developers at Different Points of Intervention

16 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• Transparency can be achieved through hygiene governance. Developers can build (i) operational and development protocols to ensure design review and explainability; (ii) clear and meaningful articulation of terms of use; (iii) transparency impact assessments and periodic review.

• �e Defense Advanced Research Projects Agency created a model for Explainable AI (XAI) that makes automated decisions explainable. It seeks to provide an explanation for AI outcomes. Local Interpretable Model Agnostic Explanations (LIME) is one such approach.

• LIME is a model-agnostic solution that provides explanations anyone can understand. LIME is an algorithm, which can explain the predictions of a classi�er. It can help determine if one can trust (a) a predication su�ciently to take action based on it

and (b) trust the model enough for it to behave in reasonable ways. Some desirable characteristics of explanation methods are – interpretability i.e. provide qualitative understanding between input variable and response.

• Transparency by Design is a practical guide to help promote the bene�ts of transparency while mitigating challenges posed by AI enabled systems. Researchers have developed a model that could aid start-ups design transparent systems, based on a set of nine principles that incorporate technical, contextual, informational, and stakeholder-sensitive considerations. It also supplies a step-by-step roadmap on integrating these principles in the pre-processing stages,

• Another solution to improve transparency is a decision tree. All data is displayed in the form of a tree. A developer can use this as a diagnostic tool by

starting at the top of each level and selecting a branch based on the value of a particular feature. �e outcome at the last leg or the foundation is the outcome. �is simple model can provide a high degree of transparency for the developer, in instances where the model is based on a manageable amount of data.

• Transparent models such as Rule Based Learners, General Additive Models, Bayesian Models, K-Nearest Neighbours can supplement the decision tree model based on the complexity of variables, levels involved and how the AI interacts with the variables.

• Post-hoc explainability is another model. In this model, the nature and properties of outputs yielded by the black box can be reverse engineered to explain outcomes. However, this need not be accurate all the time.

Good Practices

17 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Devices with non-visual interfaces1: Developers may need to think of ways in which transparency requirements can be presented to users when they develop products such as smart speakers (as opposed to services like dealing with loan applications), where it may not be feasible to explain these issues through visual means.

Instances of challenges developers may face

At a Glance

To ensure transparency, developers may consider developing the following protocols.

An easy-to-explain model of the decision-making process(like a decision tree).

List of attributes considered and weightage for each.

Map of intervention points.

Redressal mechanism for users and beneficiaries.

18 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

1 For example Alexa

Is transparency antithetical to Intellectual Property Rights (IPR)?

Transparency has to be balanced against intellectual property protection. Algorithmic systems involve signi�cant investment by private �rms and often they are protected as trade secrets or copyright. Hence, transparency does not refer to full disclosure of all aspects of the algorithm or the source code supporting it.

Why is transparency important?

Some algorithmic systems are so complex that they are often called ‘black boxes’ because you can see the inputs and outputs, but not their exact functioning/processing. Although it is challenging to understand how automated systems work, the proposed checklist for transparency can enhance trust in systems and improve overall e�ciency of AI. Transparency can ensure that AI systems do not use data in a way that leads to biased outcomes. It also ensures better compliance with the laws and enforces legal rights. �e ability to explain the processes involved can aid developers identify the vulnerabilities involved.

The US Center for Tropical Agriculture data-driven agronomy workintentionally designs models for ease of interpretation. �is helps them gain insights on the relationship between crop management practices and crop yield.If they had modelled their system using “black box” algorithms, they would be able to predict the yield but would have no visibility on the speci�c inputs that most strongly determined the yield.

Flowchart showing that the Explainer model takes the data and the prediction values as the input and creates a explanation based on shapely values. Source - PyCon Sweden : Interpreting ML Models Using SHAP by Ravi Singh

Figure 2: Explainer Model

Explainer Model

DATA

Prediction

Explanation

Further Questions

Black Box Model

19 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

II. Accountability

Meaning

AI is non-human, and it is impossible to hold AI accountable for its decisions within the current legal framework. However, developers can identify intervention points at the stage of algorithm design, AI system training and data collection. Accountability in AI decision-making means it is possible to trace AI outcomes/decisions to an individual or an organization or a step in the AI design process.

Intuitively, the designer or the

start-up could be held responsible, but the challenge is that decision-making by AI systems undergo signi�cant changes as they scale-up. Another reason why it is di�cult to attribute responsibility to AI actions is that there may be multiple actors undertaking the various activities. For example, while one actor may design the system, other may train the system using data collected by someone else, and another would make it available for users/bene�ciaries. �is is

known as the “many hands problem” which is associated with layered handling and decision making at di�erent stages of complex systems.

�e purpose of accountable design is to both provide users/bene�ciaries with a redressal mechanism and to solve problems internally as they come up. As there is no legal clarity on holding AI accountable, accountability as an ethical principle is the obligation to enable redressal and address design �aws.

Scenario:

A loan applicant comes to know that her bank has denied her loan application to start a new business. �e bank tells her that their AI enabled system considered her application and rejected her loan. She wants to appeal the decision, but the banker is unable to guide her to the appropriate authority.

How will accountability help?

Accountable design and process would help the user seek an explanation on why she was denied a loan and seek a review of the decision. Accountability would also help the banker maintain its customer relations by guiding customers to the appropriate redressal point. In an ideal scenario, the bank �les for a decision review and the AI provides the reasoning or factors considered for an outcome. �e bank may then analyse this and reverse or maintain the decision to deny a loan.

20 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Pre-Processing

Does the start-up have policies or protocols or contracts on liability limitations and indemnity? Are these accessible and clear to you?

Does your organisation have a data protection policy? Are data protection guidelines being followed in the collection and storage of data, if the data is procured from a third-party?

Are there adequate mechanisms in place to �ag concerns with data protection practices?

In-processing

Can the AI-enabled system be compartmentalised based on who develops the speci�c�eld?

Is it possible to apportion responsibility within your set-up if multiple developers have worked on a project?

Is it possible to maintain records on design processes and decision-making points?

Post-processing

Can decisions be scrutinised and traced back to attributes used and developers who worked on the project?

Have you identi�ed tasks that are too sensitive to be delegated to AI systems?

Are there protocols/procedures in place for monitoring, maintenance and review of AI enabled systems?

Checklist for Developers at Different Points of Intervention

21 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Algorithmic Impact Assessment by AI Now Institute

The AI Now Institute suggests a practical accountability framework that involves public input and participation in assessing the impact of an AI enabled system on people. �e Institute developed this framework based on existing impact assessment frameworks for human rights, environmental protection, data protection, and privacy. Although the mechanism is proposed for public agencies mainly, developers could also learn from this framework. Key elements of the assessment include:

• An internal assessment of existing systems and systems under development. An evaluation of potential impact on fairness, justice, bias, or other concerns of the user/bene�ciary community.

• Meaningful review processes by an external/third-party researcher to track the progress of the system and its impact.

• A public notice with an easy-to-understand explanation of the decision-making process followed by the AI, and information on the internal assessment and third-party review process mentioned above.

• Public consultation with affected users/beneficiaries/stakeholders to address concerns and clear doubts.

• A mechanism for the public to raise concerns on the above process or any other substantive concerns.

• Establish an auditing mechanism to identify unwanted consequences, such as unfair bias, a solidarity mechanism to deal with severe risks in AI intensive sectors.

• A redressal mechanism to remedy or compensate for wrongs and grievances caused by AI. A widely accessible and reliable redressal mechanism is necessary to build public trust in AI. A clear and comprehensive allocation of accountability to humans and/or organisations is

paramount in such a scenario.

• A good example is the ‘Exception Handling Measures’ provided by the Unique Identi�cation Authority of India. It lists Application Programming Interface (API) errors and guidelines for handling errors within the application at di�erent stages of authentication and service delivery.

• Accountability is the idea of tracing responsibility to human beings instead of

saying that technology is responsible. For this, it is important to identify decision making processes that are too sensitive to be delegated to AI systems. Some accountability concepts are:

1. “human in the loop” - ensuring that there is human intervention in cases where fully automated decisionsare made

2. “human over the loop” - human involvement in adjusting parameters.

Good Practices

NITI Aayog Accountability Framework

The NITI Aayog suggests that a framework for identifying failure points in AI could include the following. A legal framework for accountable AI could borrow from this framework. �is framework indicates the extent of liability that could be built into such systems in future.

• A negligence test for damages caused by AI enabled systems, instead of strict liability. �is can be achieved through self-regulation by conducting damage impact assessment at each stage of development.

• Safe harbours may be formulated to insulate or limit liability if proper steps are taken by the individual to design, test, monitor, and improve the system.

• A policy on accountability may include actual harm requirements so that lawsuits cannot be �led based on speculative damages.

22 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�ere are certain AI technologies where humans may not have adequate control or entry points for intervention. Autonomous vehicles are one such example. Often the technology may be designed to respond much quicker than humans. In such a scenario, it is di�cult to intervene before an accident. One can only diagnose an error after.

Another concern is the problem of many hands or distributed responsibility. When multiple developers are involved in the development of a technology and in cases where the AI learns from external output, it is di�cult to pinpoint one problem area. In 2016, Microsoft launched a chatterbox, Tay that produced misogynistic sentences and racist slurs after a point. Designers of the chat box were held responsible, but there was no accountability on the part of the users who may have introduced the bot to racist and misogynistic words.

Instances of challenges developers may face

Enhancing accountability requires a strategy that draws a clear chain of responsibility depending on the harm or failure caused by the AI system. �is provides a path to identify where or who in the process is responsible for speci�c actions by AI systems and subsequently provide a redressal route. To ensure accountability, developers should consider developing the following protocols.

A diagnostic tool to verify that data was collected legally.

A map of the system and individual responsible for each function.

Redressal mechanism for beneficiaries of the system.

At a Glance

23 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

What is the di�erence between transparency and accountability?

Transparency and accountability are two closely related concepts. �e di�erence is that transparency is simply visibility on the functioning of an AI enabled system and accountability is the ability to traceeach function to an individual. Transparency is important to review systems internally while responsibility is assigned mostly for external action.

Why is accountability important?

If AI systems function without any accountability, there is no obligation to ensure that decision-making adheres to legal and/or ethical principles. �is could lead to adverse outcomes such as service denial. In projects such as Aadhar2, a biometric failure could lead to denial of access to government ration, cooking gas, bank accounts or other government support. �e problem is further exacerbated if these people do not have a point of contact to address concerns. �e need here is to compartmentalise AI systems so that errors can be traced back to design and e�ectively redressed.

Further Questions

24 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

2 Aadhar is a 12-digit unique identity number issued by the Unique Identification Authority of India (UIDAI) for identity verification and transfer of government benefits.

Scenario:

A loan applicant comes to know that her bank has denied a loan application to start a new business. A male friend of hers with the same credit rating and similar economic background is granted the loan. �e bank tells her that their AI enabled system considered her application and rejected her loan. On further investigation, it is revealed that in its 100 years of existence, the percentage of loans granted to unmarried women was less than 1 per cent.�is training data was fed into the AI enabled system.

How do you mitigate bias in this scenario?

�e �rst step here is to trace the source of bias. In most instances, bias in AI is an ampli�cation of biased training data or biased design. Once the source has been identi�ed, multiple practices exist to mitigate this bias. It is important to be cognizant of structural biases that exist in society to be able to easily identify the reason behind biased outcomes. At an organisational level, the start-up could adopt equal opportunity policies, recruitment to ensure diversity and representation, and awareness initiatives to educate developers/designers on social realities.

III. Mitigating Bias

Meaning

In the context of automated decision-making by AI-enabled systems, we can understandbias to mean outcomes that are structurally less favourable to individuals from a certain group without a rationale justifying this distinction. Without checks and balances, bias ampli�es within an AI-enabled system and leads to discriminatory outcomes on a group, even though a developer or designer may not have intended this. �ere is no legal framework to mitigate these harms.So, it is important for developers to voluntarily trace,

treat and mitigate these harms ethically to prevent unintended outcomes.

AI systems reproduce socially embedded or historical discrimination. �is could be a design issue or an issue with the training data used or even a subsequent learning of the AI-enabled system once it starts learning from the public. As AI systems scale up, the social impact of these biases also grows.

Bias in AI may show up due to several reasons such as unrepresentative training data,

unconscious assumptions of programmers or biases in the community norms. It could also be a result of the technical architecture, data on which the system is built, or the context in which the system is deployed. Bias can stem from using attributes such as gender, race, or caste as training data without checks, and in some cases, without tracking how the AI system is processing data. Bias signi�cantly diminishes the accuracy of an AI system.

25 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Some Reasons for Bias in AI

Reasons for bias Explanation

Insu�cient data collection Data collected may be insu�cient to represent the social realities of the space that the AI targets. Due to this, AI may not be able to attain its desired output.

Data may not be su�ciently diverse to capture all facets of the group an AI-enabled system seeks to work for. In such cases, the data might end up training the AI to discriminate against under-represented groups.

For instance, an AI to detect cancer and trained on data available in North European countries may overwhelmingly represent white skin types that have low melanin content as opposed to dark skin tones with higher melanin, leading to incorrect results in a country like India.

Even if protected attributes like gender or race are removed, data could have bias due to historical reasons.

For example, a hiring algorithm by Amazon favoured applicants based on words like “executed” or “captured” that were mostly used by men in their resumes. Learning from this, the algorithm started preferring men over women and even dismissed resumes with the word ‘woman/women’ in them. Amazon eventually stopped using the algorithm.

Poor predictions may also be the result of low-quality, outdated, incomplete or incorrect data at di�erent stages of data processing.

Insu�cient diversity in data

Biases in historical data

Use of poor-quality data

26 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Pre-Processing

Are you able to identify the source/sources of bias at the stage of data collection?

Did you check for diversity in data collection before it was used as training data to mitigate bias?

Did you analyse the data for historical biases?

In-processing

Have you assessed the possibility of AI correlating protected attributes and bias arising as a result?

Do you have an overall strategy (technical and operational) to trace and address bias?

Do you have technical tools to identify potential sources of bias and introduce de-biasing techniques? Please see Appendix for a list of technical tools that developers may consider

Have you identi�ed instances where human intervention would be preferable over automated decision making?

Post-processing

Have you identi�ed cases where human intervention will be preferred over automated decision making?

Do you have internal and/or third-party audits to improve data collection processes?

Checklist for Developers at Different Points of Intervention

27 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• Developers may use methods like relabelling, reweighing, data transformation to eliminate correlation between protected attributes or appropriate technical methods to ensure that the data samples are fair and representative enough to give accurate results.

• Ensure that training samples are diverse to avoid racial, gender, ethnic, and age discrimination or redact protected attributes when collecting data.

• Developers can ensure multiple and diversi�ed human annotations per sample and that those

annotators come from diverse backgrounds.

• Developers could measure the outcomes for di�erent ethnic/racial and/or gender groups to see if there is unfair treatment.

• Developers could actively collect more training data from historically under-represented groups that may be at risk of bias and apply de-biasing techniques to penalize errors.

• Regular audits of the production models for accuracy and fairness, and regularly update the models using newly available data.

• Apply bias mitigation techniques such as adversarial debiasing, Fairness constraints, prejudice remover regularizer in process.

Good Practices

28 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Multilingual models for training data are signi�cant challenges that developers face and is also a source of bias percolation. In most cases, English is the language of preference and the one that is easily referenced. �is signi�cantly impairs the ability to trace bias in inputs from other languages.

In India, data collection is mostly done in urban areas and from segments of the society or speci�c regions. �ese do not include other dialects from rural parts. �e data overrepresents dominant communities. As a result, although the product is meant for every Hindi user, the product will not be a good �t as it lacks the due diligence of representing data adequately.

In case of facial recognition, insu�cient number of images or false correlations drawn by the AI enabled system could lead to bias in classifying images. �ere are numerous instances of facial recognition being faulty in settings involving minorities and instances where the software was applied to subjects that were not adequately represented in training data.

The COMPAS software and the Allegheny Family Screening Tool in the United States were found to be biased against African-American and biracial families. �e biases of the model stem from a public da taset that re�ects broader societal inequities. By relying on publicly available data, those who had no alternative to publicly funded social services, predominantly low-income families, were over-represented in the data.

Instances of challenges developers may face

Bias becomes an evident concern only once the AI system starts producing outputs. However, measures to mitigate bias can be adopted at multiple stages of processing. �e developer’s prerogative is to anticipate potentially biased outcomes before they arise. �e source of bias could be the data used during training/processing, the attributes used by the AI system or the functions for which the AI is designed for. �e developer should be aware of potential biases in each of these aspects. Some measures that developers can practice in the interest of mitigating bias include:

Regular audit of data and labels to ensure diversity in data and attributes used.

Technical methods to ensure fairness such as reweighting, relabelling and data transformation to eliminate any unanticipated correlation.

Awareness exercises on historical and contextual biases and discussion on technical protocols for bias mitigation.

Outcome measurement and comparison of results across identities to ensure equally accurate outcomes.

At a Glance

29 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

What kind of algorithms are more likely to amplify biases?

Any algorithm can develop a bias. So far, the most common example/instances of bias that we can identify are in (i) Online recruitment tools: �e case of Amazon’s hiring tool is mentioned above; (ii) Word associations: Princeton University research found that an algorithm was associating the words ‘woman’ and ‘girl’ with arts and humanities and men with science and math; (iii) Bias in online advertisements: Research by the Federal Trade Commission in the United States found that online search queries for African-American names were most likely to produce arrest records, while for white males, the search results were different; (iv) Facial recognition bias: One such case has already been discussed in the chapter. (v) Criminal justice algorithms: The example of COMPAS software describes such biases.

What is the impact of biased algorithms?

Bias directly results in unequal allocation of opportunities, resources, or information. In a world that is adopting AI in many important public functions like ration distribution, and determining eligibility for welfare schemes, and other crucial functions like granting loans, bias towards a community that is already marginalised can have disastrous consequences. It could also have a very direct impact on individuals and their well-being.

Further Questions

30 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Scenario:

A loan applicant comes to know that her bank has denied a loan application to start a new business. A male friend of hers with the same credit rating and similar economic background is granted the loan. �e bank tells her that their AI enabled system considered her application and rejected her loan. On further investigation, it is revealed that the AI enabled system denies loans to all unmarried women. �e start-up that supplied the AI technology is sure that this is the optimal outcome, and the bank should accept the decision suggested by the AI.

How do you address fairness in this scenario?

�e technology provider in this case is sure that the outcome produced by the AI is accurate. Despite the scope for bias in training data, the case may be that denying this loan ensures that there is no loan default. However, it is logically incorrect to say that an outcome is accurate when it is ignoring a reality that unmarried women have not been granted loans before by the bank. Here, there is a need to reconcile the accuracy rate of the AI enabled system with social realities. �e thumb rule is that an AI-enabled system should not produce disproportionately accurate or inaccurate results for one group, in comparison to the other. False positives and false negatives3 must not vary across social, ethnic, gender, caste, class groups.

IV. Fairness

Meaning

Fairness is one of the most relevant and the most di�cult topics in AI ethics. It is a normative concept and has numerous de�nitions. Arvind Narayanan at the Association for Computing Machinery Fairness, Accountability and Transparency (ACM FAccT) Conference in 2018 presented some of these de�nitions for fairness. �e work understands the social context of fairness in a technical manner and seeks to balance accuracy with fair outcomes. Broadly, these de�nitions say that developer must actively include checks and balances during algorithmic design to ensure that there is no individual or group discrimination in outcomes of the AI process.

For ensuring fairness, developers need to follow ethical values and assesswhether AI systems impact people adversely. For example, AI may unfairly allocate work opportunities where women candidates are rejected or AI facial recognition systems may not provide the same quality of service to users with a particular skin tone. �ese are biased outcomes and the role of fairness as an ethical imperative is to ensure that these outcomes are not overlooked for the sake of accuracy. Fairness addresses the ethical dilemma of choosing between what is a fair outcome and what is an optimal outcome.

Fairness helps developers understand what values cannot be compromised in algorithms. Academics refer to Fairness along with Accountability and Transparency as the FAT framework. Industry has also engaged with fairness frameworks. In 2018, Accenture published a fairness toolkit to assist companies work towards fairer outcomes when deploying AI systems. An established framework/toolkit provides developers an advantage as it is a ready-reference for gauging fairness. �ere are also case-studies by companies such as Microsoft and Spotify that demonstrate implementational challenges with fairness toolkits.

31 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

3 False positive is an error when something does not exist, but the result shows it does and a false negative is when something exists, but the result does not show it. For example, in Covid testing a false positive is when someone shows positive when they do not have the virus and a false negative is when the test shows up as negative, but the person is actually infected. It is an error quotient.

Pre-Processing

Have you anticipated the possibility of the algorithm treating similar candidates di�erently on the basis of attributes such as race, caste, gender or disability status?

Are there su�cient checks to ensure that the machine does not base its outputs on protected attributes?

In-processing

Are there protocols or sensitisation initiatives to reconcile historical biases and build in weights to equalise potential biases?

Have you conducted due diligence to trace potential fairness harms that may arise directly or indirectly?

Do you have technical tools to diagnose fairness harms and address them?

Post-processing

Does the algorithm provide results that are equally and similarly accurate across demographic groups?

Have you checked outcomes to understand if the AI is producing equal true and false outcomes? Is there any scope for disproportionate outcomes?

Checklist for Developers at Different Points of Intervention

32 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• Identify communities and groups who are vulnerable to fairness-related harms. Look at people who: (i) will use the AI system; and (ii) are directly or indirectly a�ected by the system (by choice or not). Next, the e�ect of AI decision-making should be tested against anti-discrimination laws and safeguards.

• A context specific approach to fairness where possibility of discrimination is identi�ed based on context and historical biases. At the �rst stage, this involves looking at the purpose and context for which AI is deployed. �e developer can then predict if the AI system will produce unfair

outcomes when assessing similar candidates because of their gender, race, caste or disability.

• It is also important to acknowledge that individuals may belong to overlapping groups i.e., di�erent combinations of gender, race, caste or disability. �ey may be especially vulnerable to fairness related harms and di�erent kinds of harms. Groups should be considered both separately and collectively to expand the understanding of harms.

• Refer to fairness existing toolkits such as this.

Good Practices

33 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

At a Glance

To incorporate fairness, developers must familiarise themselves with legal concepts mentioned below.

Transparency - it helps analyse how data processing will impact its users. Refer to the chapter on Accountability and Transparency.

Bias mitigation - it reduces the extent of harm people may experience by interacting with AI services and products. Developers must refer to chapters on data quality, rights of data principals and safeguarding sensitive personal data.

Protection of personal data and familiarity with rights of data principals - guides developers to secure data and avoid processes that are detrimental, discriminatory or mislead-ing. For details, refer to Rights of Data Principals and Personal Data.

Assessing whether you are processing information fairly depends partly on how you obtain it. If anyone is deceived or misled when the personal data is obtained, then this is unlikely to be fair.

To assess whether or not you are processing personal data fairly, you must consider more generally how it a�ects the interests of the people concerned – as a group and individually. If you have obtained and used the information fairly in relation to most of the people it relates to but unfairly in relation to one individual, there will still be a breach of this principle.

Fairness is a complicated subject with multiple definitions. Fairness cannot be separated from the context in which AI is applied so it is di�cult to have overarching principles to determine fairness across a wide range of AI applications. �e starting point of fairness is to ensure non-discriminatory outcomes. In short, the AI system should provide equally and similarly accurate results across demographic groups. To ensure fairness, a combination of the following may be considered

A map of anti-discrimination safeguards in law and functions of thesystem that should be mindful of these measures.

A policy on fairness in AI outcomes and a mandate to have the same range of accuracy across demographic groups.

Fairness checks at the envisioning stage, pre-mortem stage and product-greenlighting stage, or at pre-processing, in-processing and post-processing stages.

34 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Developers may use post-processing algorithms to reduce unfairness. Some challenges here are that post-processing algorithms may need access to sensitive data for calculating di�erent thresholds for di�erent groups, in case of loan grants for example, to predict their loan repayment ability. However, the use of sensitive data may be prohibited under law and developers may not be able to use the same.

Without a framework or a toolkit for reference, it is di�cult to anticipate future discrimination when the AI system evaluates two similar candidates. �is could lead to discriminatory outcomes that undermine fairness.

If a voice recognition system is mainly trained on male voices, the system may not perform accurately when used by women. �e quality of the data used leads to unequal performance of the services for di�erent groups in the population.

Instances of challenges developers may face

What does a fair approach mean?

A fair approach would recognise or account for structural and systemic inequalities, constitutional guarantees of a�rmative action and the responsibility to correct past unfair practices. As stated earlier, the fundamental idea of fairness is to ensure that fairness is not overlooked for objective accuracy. Fairness is an evolving concept, and it is paramount to build in checks and balances for fairness, given the increasing role of AI in society.

What di�erentiates fairness and bias as two distinct ethical values?

A bias is like a bug in a code. It is an unwarranted association made by the AI-enabled system, which may be deliberate in the design or an unintended association. Fairness is ensuring that the outcomes do not di�erentiate between two groups or even two individuals. Bias may be one of the reasons for unfair outcomes, but all unfair outcomes are not the result of a bias. A bias is always unfair. Bias is a design issue while fairness is a design plus outcomes issue, and both need separate attention and separate approaches.

Further Questions

35 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Scenario:

A loan applicant applies for a loan to start her business and the bank approves the loan. In her loan application, she said that the purpose of the loan was to start a fertilizer business. From the next day, she starts receiving calls and email advertisements for purchasing wholesale fertilizers, rubber gloves etc. When she approaches the bank, the bank tells her that they do not share personal information with third parties. �e AI technology provider also says that they do not harvest and sell data. On further diagnosis, it is revealed that someone has breached the AI enabled system and stolen data including sensitive data.

How does security help?

Security means inculcating practices and protocols that ensure that the network remains robust and free from outside interference. When AI is deployed in settings that involve signi�cant public interest, ensuring secure networks also have geo-political angles to it. Security protects the user/bene�ciary and also protects the organisation from �nancial risk. �ere have also been security attacks where the attackers seek money in exchange for returning control of an AI system. Security, as an ethical imperative, mitigates the damage caused by such instances to some extent.

V. Security

Meaning

Safety and security while using AI systems are important to preserve public trust in AI. As technology advances, so do the threats to security. Response to such threats should also grow accordingly. �e need for secure networks is not overstated, and more than 50per cent organizations in India su�ered from a cybersecurity breach in 2020. Many start-ups including bigger organisations like BigBasket and Flipkart, highlight that start-ups should be especially mindful of security as this could be a factor in their valuation, trust factor and general robustness.

Security immunizes the user or bene�ciary of an AI system from digital risks. �is means that they should be protected from unreasonable security risks, digital and physical, during the foreseeable use of the technology. A developer should keep in mind the following fundamentals in AI security: (i) be

aware of attempts to manipulate outcomes with inputs that appear legitimate; (ii) design the AI-enabled system and the development process to anticipate and mitigate security risks; (iii) incorporate general cybersecurity principles in AI design; (iv) limit those who can control or make changes to AI-enabled systems and create a register of log-ins.

Developers can build secure models if they are able to pre-empt security concerns and address them e�ectively. �e best practice here is to conduct a risk management audit to anticipate foreseeable misuse, i.e., risks associated with the use of an AI system in other contexts. �e Organisation for Economic Co-operation and Development (OECD) suggests standards for digital security risk management and risk-based due diligence under the MNE Guidelines and OECD Due

Diligence Guidelines for Responsible Business Conduct. For start-ups, ensuring security means adoption of proper security standards to ensure high safety and reliability while also limiting the risk exposure of developers. �is is a di�cult bargain for an AI technology start-up with limited sta� and limited resources.

�e role of AI in security has both advantages and disadvantages. On one hand, AI is a very useful tool in real time discovery of phishing and spam in password protection and user authentication, and tracing fake news and deepfakes. On the other hand, AI also enables e�cient, targeted attacks on a scale previously unimaginable. �e role of security in AI ethics is to ensure robust networks that prevent unauthorised access that could compromise the system and move AI’s decision-making away from the control of developers.

36 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Pre-Processing

Have you identi�ed scenarios where safety and reliability could be compromised, both for the users and beyond?

Have you classi�ed anticipated threats according to the level of risk and prepared contingency plans to mitigate this risk?

Have you de�ned what a safe and reliable network means (through standards and metrics), and is this de�nition consistent throughout the full lifecycle of the AI-enabled system?

In-processing

Have you created human oversight and control measures to preserve the safety and reliability risks of the AI System, considering the degree of self-learning and autonomous features of the AI System?

Do you have procedures in place to ensure the explainability of the decision-making process during operation?

Have you developed a process to continuously measure and assess safety and reliability risks in accordance with the risk metrics and risk levels de�ned in advance for each speci�c use case?

Post-processing

Have you assessed the AI-enabled system to determine whether it is also safe for, and can be reliably used by users with special needs or disabilities or those at risk of exclusion?

Have you facilitated testability and auditability?

Have you accommodated testing procedures to also cover scenarios that are unlikely to occur but are nonetheless possible?

Is there a mechanism in place for designers, developers, users, stakeholders and third parties to �ag/report vulnerabilities and other issues related to the safety and reliability of the AI enabled System? Is this system subject to third-party review?

Do you have a protocol to document results of risk assessment, risk management and risk control procedures?

Checklist for Developers at Different Points of Intervention

37 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• Ensure that vendors follow security protocols after contracts for services are signed. Regularly check if technology partners are keeping their promises to protect personal information.

• Follow up with technology contractors to ensure they follow security protocols. If a vendor is supposed to delete extraneous information, ask for proof.

• Participate in international standard setting processes such as International Organisation for Standardisation (ISO) and the International

Electrotechnical Commission (IEC) standards for the development and deployment of safe and reliable AI systems.

• ISO/IEC 27001 series provides guidance on protection of privacy, irrespective of the type and size of organisations. �e latest standard ISO 27003adds an extra layer of security to the ISO 27001 standard and helps organisations design, build, implement and continually improve their personal information management systems.

Good Practices

38 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

In April 2018, OpenAI released the OpenAI Charter. Among other things, it states that their objective is to pub-lish research that benefits society. According to this Charter, safety and security is a factor that should determine whether you should publish AI research or not. The decision to release GPT-2 in a staggered manner stemmedfrom this Charter and the concern that the model could be used maliciously, primarily for the generation of scalable, customizable, synthetic media (deepfakes and augmented videos). For example, OpenAI noted that its model could be used to generate misleading news articles, impersonate others online, automate the production of abusive content online, and automate phishing content.

In November 2019, the cybersecurity company FireEye published a blog post revealing they were using GPT-2 to detect social media posts as part of information operations. The company’s researchers fine-tuned the GPT-2 model and taught it to create tweets that resembled the source. While researchers were able to detect malicious activity such as bot tweets, they also realised that the model could also lower the entry barrier to entry for bad actors if rolled out without more security protocols.

AI systems should be robust, secure and safe throughout their entire lifecycle. �is requires developers to look at the conditions of use presently, foreseeable use or deployment in unintended operations to minimise safety risk. Mainly, they should observe the following:

Ensure traceability, including in relation to datasets, processes and decisions made during the AI system lifecycle.

Analyse the AI-enabled system’s outcomes and responses to ensure contextual and state of the art AI.

Based on their roles, the context, and their ability to act, developers should apply a systematic risk management approach to each phase of the AI system lifecycle on a continuous basis.

Be mindful of risks related to AI-enabled systems, including privacy, digital security, safety and bias.

At a Glance

Instances of challenges developers may face

39 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Is following the ISO standard enough to protect the AI-enabled system?

An ISO standard is a minimum technical standard. It is also important to have processes and protocols to systematically review the robustness of the network. �e best standard of protection is to have an external auditor or cybersecurity provider secure the system. It is also important to ensure that security protocols, procedures and auditing is followed from the design stage to the implementation station to ensure maximum security and safety. See this articlefor a detailed explanation on why it is bene�cial for start-ups to go beyond the basic compliance standard.

What is the di�erence between privacy and security?

Privacy is the obligation to not misuse data, and security is the obligation to protect the entire AI-enabled system including data. A security breach could result in data being stolen which then becomes a violation of privacy. Developers need to understand that these are two distinct obligations and address them separately.

Further Questions

40 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Scenario:

A loan applicant applies for a loan to start her business and the bank approves the loan. In her loan application, she said that the purpose of the loan was to start a fertilizer business. From the next day, she starts receiving calls and email advertisements for purchasing wholesale fertilizers, rubber gloves etc. When she approaches the bank, the bank tells her that they do not share personal information with third parties. On further investigation, it is revealed that the AI enabled system that determines whether or not loans should be granted is selling personal information to third parties.

How does privacy help?

As de�ned above, privacy is the right to be left alone. In this scenario, this means that a loan applicant should be left alone and not disturbed with advertisements that she has not subscribed to. Privacy law is evolving at a rapid pace and some of the principles that have emerged are (i) informed consent should be taken from data principals before sharing data; (ii) the data should only be used, processed for the purposes that the subject has given consent for; and (iii) data should be deleted after the period of time for which it will be used. Mechanisms to ensure this is explained in this section and in detail in the legal section (Part II of this Handbook).

VI. Privacy

Meaning

Privacy can be understood simply as the right to be left alone. It is individual autonomy over sharing information about oneself, with others and the public at large. In India, the right to privacy is a fundamental right under Article 21 of the Indian Constitution. The Supreme Court recognised the right to privacy as a part of the fundamental right to life and liberty in 2017, in Justice K.S. Puttusawmy and Ors. v. Union of India. Jurists conceptualise privacy in

numerous ways. Among others, it means (i) bodily inviolability and integrity and intimacy of personal identity including marital privacy; (ii) the right to be left alone4; and (iii) a zero relationship between two or more persons in the sense that there is no interaction or communication between them, if they so choose5.

Privacy interacts with AI when large amounts of customer and vendor data are fed into algorithms to generate insights

without the knowledge of data principals. Privacy is also violated at the stage of data collection, if appropriate consent is not obtained before collecting and aggregating data for model training. However, it is important to note that AI models can learn only with data. �is is the basis for the ethical and social dilemma for developers - balancing privacy against technological breakthroughs and improved services.

41 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

4 Cooley, Thomas M.A Treatise on the Law of Torts,1888,p.29 (2nded.)5 Shils Edward,Privacy.Its Constitution and Vicissitudes, 31 Law & Contempt Problems, 1966, p.281

Pre-Processing

Have you considered deploying privacy speci�c technological measures in the interest of personal data protection? Please see the Appendix for an illustrative list of technological measures you could consider.

Are you able to trace the source for the data being used in the AI system? Is there adequate documentation of informed consent to collect and use personal data for the purpose of developing AI?

Is sensitive data collected? If so, have you adopted higher standards for protection of this kind of data?

Does the training data include data of children or other vulnerable groups? Do you maintain a higher standard of protection in these cases?

Is the amount of personal data in the training data limited to what is relevant and necessary for the purpose?

In-processing

Have you considered all options for the use of personal information(e.g. anonymisation or synthetic data) and chosen the least invasive method?

Have you considered mechanisms/ techniques to prevent re-identi�cation from anonymised data?

Have you been informed of the legal requirements for processing personal information? What measures does your organisation adopt to ensure compliance?

Post-processing

Are there procedures for reviewing data retention and deleting data used by the AI System after it has served its purpose? Are there oversight/review mechanisms in place? Is there scope for external/third-party review?

Beyond the data principal privacy, have you evaluated the potential of data of an identi�ed group being at risk?

Is there a mechanism in place to manage consent provided by data principals? Is this mechanism accessible to principals? Is there a method to periodically review consent?

Checklist for Developers at Different Points of Intervention

42 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• Before processing personal information, anticipate the impact of the process on data protection and the likelihood of a risk to the rights and freedoms of natural persons. In case of new technology such as Arti�cial Intelligence, this is paramount and the nature of the processing, its scope and purpose, and the context in which the technology will be implemented needs close evaluation, even third-party or public scrutiny.

• Take best efforts to adopt a multidisciplinary approach. AI is more than just technology. It is important to put together multi-disciplinary approaches that can consider various perspectives on the consequences of technology on society.

• Limit the amount of personal data in the training data to what is

relevant and necessary for the purpose.

• Ensure and document that the AI system you are developing meets the requirements for privacy by design.

• Document how the data protection requirements are met. Documentation is one of the requirements of the regulations and will be requested by customers or users.

• Ensure good systems for protecting the rights of data principals, such as the right to information, to access and deletion. If consent is the legal basis of processing, the system must also include functionality enabling consent to be given, and to be withdrawn.

• Ensure that vendors follow security protocols after contracts for services are signed. Regularly check if

technology partners are keeping their promises to protect personal information.

• Follow up with technology contractors to ensure they follow security protocols. If a vendor is supposed to delete extraneous information, ask for a proof of deletions.

• Protocols for user-system interaction, and novel methods to segregate data through speci�c minimisation requirements are some techniques that can be adopted without compromising quality. Interdisciplinary associations with legal and other social sciences are also some useful techniques.

Good Practices

43 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

The Norwegian Tax Administration (NTA) developed a predictive tool to assist in selecting individuals for random checks for tax evasion and �ling errors. �ey tested 500 di�erent variables including the taxpayers’ demography, life history, and other details in their tax returns. Finally, only 30 variables were built into the �nal model. �is provides a good example of how it is not always necessary to use all the available data in order to achieve the desired purpose. However, this requires time and e�ort to streamline the product.

At a Glance

Instances of challenges developers may face

Privacy or data protection is critical to any AI business from a legal and an ethical perspective. �e legal mechanism for regulating data protection is in the �nal stages of deliberation in many countries including India. Some data such as �nancial data, health data, genetic data or children’s data is accorded higher protection than other personal data. Developers should be mindful of this. In the interest of privacy, developers should consider the following

Methods for reducing the need for training data and minimising data use.

Methods that uphold data protection without reducing the basic dataset.

Measures to track consent for data that is used and redressal or review mechanisms to provide data autonomy to data principals.

Adhere to data protection regulations and best implementation practices.

44 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Why is it important to understand di�erent classi�cation of data?

Data that is fed into an AI system can reveal much more information than what is required for the purpose for which the AI system was made. Sensitive data that reveals a person’s gender or race may increase the possibility of bias and distort the product's outputs. �is may pose signi�cant risks to the AI system as it will not be considered reliable for use. �erefore, the developers must learn to identify what data they are using so that they can develop appropriate methods to prevent its misuse. �e law classi�es data such as health records, biometric data, caste, religious or political beliefs, bank details, etc. as sensitive personal data.

Why are there sector-speci�c data protection guidelines?

Sector-speci�c data protection guidelines emerge as a response to large scale use of certain data in an unregulated environment which could potentially harm data principals. �e two biggest examples of such guidelines are for �ntech companies and in the health sector. Financial data and health data are accorded a higher degree of protection because lax protection could have severe rami�cations on individuals and because of the scale of data use in emerging technology sectors such as AI/ML.

Further Questions

Ethics in AI Literature and Policy Documents

45 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• NITI Aayog National Strategy for AI • OECD Principles on AI - Human-centered values and fairness• High-Level Expert Group on Artificial Intelligence: Ethical Guidelines for Trustworthy AI• Singapore Model AI Governance Framework (Second Edition)• United Kingdom’s Ethics, Transparency and Accountability Framework for Automated Decision-Making• United Kingdom’s Data Ethics Framework• Explainable AI: The Basics Policy briefing, The Royal Society • The 8 Principles of ResponsibleAI - ITechLaw• Asilomar Principles on Bene�cial AI• Montreal Declaration for responsible development of AI• The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems• Tenets - Partnership on AI• Five Overarching Principles for an AI Code - UK House of Lords• Statement on arti�cial intelligence, robotics and 'autonomous' systems - European Parliament• AI Fairness Toolkit• �e Toronto Declaration: Protecting the right to equality and non-discrimination in machine learning systems• FAT/ML: Principles for Accountable Algorithms and a Social Impact Statement for Algorithms• Algorithmic Accountability Bill of 2019 in USA • Software and Information Industry Association’s Ethical Principles for Arti�cial Intelligence and Data Analytics• Algorithmic Impact Assessment Framework - AINow Institute

Section II

Data Protection

46 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Arti�cial Intelligence is data intensive and due to this reason, data protection laws assume great signi�cance for AI professionals. �e recognition of the right to privacy as a fundamental right by the Supreme Court of India has resulted in greater emphasis on ensuring that data collection, storage, sharing and processing practices respect user privacy. Presently, India is at the cusp of adopting a new data protection framework. �is includes the Personal Data Protection Bill, 2019 (PDP Bill), which is pending before the Parliament, and a proposed non-personal data framework, which is at an

early stage of its lifecycle.�e proposed laws will replace the existing framework consisting of the Information Technology Act, 2000 (IT Act) and the Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011. �is will lead to new compliance requirements, more stringent implementation provisions and greater penalties.

�ere is a direct correlation between data protection framework and the quality of data collected. Under a weak data protection framework, an individual concerned about his privacy will have incentive

to furnish incorrect information about himself. �is will lead to poor data quality, which will then feed into poor data analytics and ultimately impact businesses too. Similarly, a stringent data protection framework can incentivise individuals to furnish more accurate personal information.

�is chapter will highlight the broad principles enshrined in the proposed framework, with the objective of getting AI professionals ahead of the curve. However, before describing the proposed framework, it is important to have a look at our existing data protection framework.

Introduction

Figure 3 : Existing data protection framework

Data Protection

Data Protection underInformation Technology Act

Data Protection underPersonal Data Protection Bill

47 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Until the PDP Bill is passed by the Parliament, India’s data protection framework will continue to be governed by the Information Technology Act, 2000 and the Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011.

Data Protection underInformation Technology Act, 2000

Data Protectionunder IT ACT 2000

Sections under the IT Act

Liabilty of corporate entites(Section 43A)

Punishment for disclosure ofinformation (Section 72A)

Defines Sensitive Personal Information(Rule 3)

Mandates inclusion of a Privacy Policy(Rule 4)

Prescibes manner of Collection ofinformation (Rule 5)

Provides for Disclosure of Information(Rule 6)

Implementation of Reasonable SecurityPractices and Procedures (Rule 9)

Reasonable security practices and procedures and sensitive personal dataor information Rules, 2011

Figure 4 : Provisions relating to personal data protection under the IT Act

48 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�ere are two broad provisions under the IT Act that relate to data protection.

I. Liability of corporate entities

• Onus is on corporations for protecting sensitive personal data: �is is given under Section 43A of the IT Act. �e corporate entities will be held liable for negligence in maintaining and implementing reasonable security practices while processing sensitive personal data, if it leads to wrongful loss (loss of property to which the person losing is legally entitled) or wrongful gain (gain of property to which the person gaining is not entitled) to any person. In such cases, the corporate will have to pay compensatory damages to the aggrieved person.

• Punishment for disclosure of personal information if there is a breach of contract. �is is given under Section 72A of the IT Act. �e clause punishes a person with imprisonment in case he discloses personal information in breach of contract or without consent

of the person to whom it belongs. It states that any person who is providing a service under a contract, discloses personal information obtained during the course of providing such services, without the person’s consent or in breach of terms of the contract, will be punished with either imprisonment for a term not exceeding three yearsor with a �ne not exceeding up to �ve lakh rupees, or both.

II. Reasonable security practices and procedures and sensitive personal data or Information Rules, 2011

Every entity that owns or processes data must implement reasonable security practices.�ese practices are given under rules known as the Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011 [SPDI Rules]. �e Rules provide details about how sensitive data must be handled.

i. Sensitive Personal Information

�e de�nition of sensitive personal information is given

under Rule 3. It says that sensitive personal data or information of a person means personal information that consists of informationrelating to:1. password;

2. �nancial information such as bank account or credit card or debit card or other payment instrument details;

3. physical, physiological, and mental health condition;

4. sexual orientation;

5. medical records and history;

6. biometric information;

7. any detail relating to the above clauses as provided to body corporate for providing service; and

8. any of the information received under any of the above clauses by body corporate for processing, stored or processed under lawful contract or otherwise.

ii. Privacy Policy

�is is given under Rule 4. According to this Rule, all entities that are seeking sensitive personal data have a duty to a privacy policy. �is policy must be made easily accessible for people who are providing their information. �e policy must be published on the website of the entity collecting such data and give details on the

• Type of information that is being collected

• Purpose of collection

• Security practices in place to maintain the con�dentiality of such information.

iii. Collection of Information

Rule 5 provides the guidelines that need to be followed by a body corporate6, while they collect information. �e Rule imposes the following duties on the Body Corporate:

• Obtain consent from the person(s) providing information.

• Information shall only be collected for a lawful purpose and only if considered necessary for the purpose.

• It will be used only for the purpose for which it is collected and shall not be retained for a period longer than which it is required;

• Ensure that the person providing information is aware of the fact that the information is being collected. He/she must also be aware of its purposes & recipients, name and

addresses of the agencies retaining and collecting the information;

• Offer the person an opportunity to review the information provided and make corrections, if required;

• Before collection of the information, provide an option to the person providing information to not provide the information sought;

• The provider of information shall also have an option to withdraw their consent given earlier;

• Maintain the security of the information provided; and

• Designate a Grievance Officer whose name and contact details should be on the website and who shall be responsible to expedite and address grievances of information providers expeditiously.

iv. Disclosure of Information

Under Rule 6, disclosure of sensitive personal information to any third party will require prior permission from the

person providing the information. However, no prior permission is required if a request for such information is made by government agencies authorised under law or any other third party by an order under law.

v. Reasonable Security Practices and Procedures

Rule 8 provides the reasonable security processes and procedures that may be implemented by the collecting entity. International Standards (IS / ISO / IEC 27001) is one such standard which can be implemented by a body corporate to maintain data security. An audit of reasonable security practices and procedures shall be carried out by an auditor at least once a year or as and when the body corporate or a person on its behalf undertakes signi�cant upgradation of its process and computer resource.

Sections under IT Act

49 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�ere are two broad provisions under the IT Act that relate to data protection.

I. Liability of corporate entities

• Onus is on corporations for protecting sensitive personal data: �is is given under Section 43A of the IT Act. �e corporate entities will be held liable for negligence in maintaining and implementing reasonable security practices while processing sensitive personal data, if it leads to wrongful loss (loss of property to which the person losing is legally entitled) or wrongful gain (gain of property to which the person gaining is not entitled) to any person. In such cases, the corporate will have to pay compensatory damages to the aggrieved person.

• Punishment for disclosure of personal information if there is a breach of contract. �is is given under Section 72A of the IT Act. �e clause punishes a person with imprisonment in case he discloses personal information in breach of contract or without consent

of the person to whom it belongs. It states that any person who is providing a service under a contract, discloses personal information obtained during the course of providing such services, without the person’s consent or in breach of terms of the contract, will be punished with either imprisonment for a term not exceeding three yearsor with a �ne not exceeding up to �ve lakh rupees, or both.

II. Reasonable security practices and procedures and sensitive personal data or Information Rules, 2011

Every entity that owns or processes data must implement reasonable security practices.�ese practices are given under rules known as the Information Technology (Reasonable security practices and procedures and sensitive personal data or information) Rules, 2011 [SPDI Rules]. �e Rules provide details about how sensitive data must be handled.

i. Sensitive Personal Information

�e de�nition of sensitive personal information is given

under Rule 3. It says that sensitive personal data or information of a person means personal information that consists of informationrelating to:1. password;

2. �nancial information such as bank account or credit card or debit card or other payment instrument details;

3. physical, physiological, and mental health condition;

4. sexual orientation;

5. medical records and history;

6. biometric information;

7. any detail relating to the above clauses as provided to body corporate for providing service; and

8. any of the information received under any of the above clauses by body corporate for processing, stored or processed under lawful contract or otherwise.

ii. Privacy Policy

�is is given under Rule 4. According to this Rule, all entities that are seeking sensitive personal data have a duty to a privacy policy. �is policy must be made easily accessible for people who are providing their information. �e policy must be published on the website of the entity collecting such data and give details on the

• Type of information that is being collected

• Purpose of collection

• Security practices in place to maintain the con�dentiality of such information.

iii. Collection of Information

Rule 5 provides the guidelines that need to be followed by a body corporate6, while they collect information. �e Rule imposes the following duties on the Body Corporate:

• Obtain consent from the person(s) providing information.

• Information shall only be collected for a lawful purpose and only if considered necessary for the purpose.

• It will be used only for the purpose for which it is collected and shall not be retained for a period longer than which it is required;

• Ensure that the person providing information is aware of the fact that the information is being collected. He/she must also be aware of its purposes & recipients, name and

addresses of the agencies retaining and collecting the information;

• Offer the person an opportunity to review the information provided and make corrections, if required;

• Before collection of the information, provide an option to the person providing information to not provide the information sought;

• The provider of information shall also have an option to withdraw their consent given earlier;

• Maintain the security of the information provided; and

• Designate a Grievance Officer whose name and contact details should be on the website and who shall be responsible to expedite and address grievances of information providers expeditiously.

iv. Disclosure of Information

Under Rule 6, disclosure of sensitive personal information to any third party will require prior permission from the

person providing the information. However, no prior permission is required if a request for such information is made by government agencies authorised under law or any other third party by an order under law.

v. Reasonable Security Practices and Procedures

Rule 8 provides the reasonable security processes and procedures that may be implemented by the collecting entity. International Standards (IS / ISO / IEC 27001) is one such standard which can be implemented by a body corporate to maintain data security. An audit of reasonable security practices and procedures shall be carried out by an auditor at least once a year or as and when the body corporate or a person on its behalf undertakes signi�cant upgradation of its process and computer resource.

50 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

6 Body Corporate includes entities like a company, co-operative society, registered societies and firms etc.

Personal Data

Any information using which a person can be identi�ed either directly or indirectly is personal data. Examples of personal data include, name, age, gender, medical records, bank account details, etc., of a person.

Data Principal

Person to whom the personal data relates. For example, if you collect the age, gender and blood group of a person named ‘X’, X will be the data principal.

Data Fiduciary

A person or entity who decides the purpose and means of processing personal data.

Data Processor

A person or organisation that processes personal data on behalf of a data �duciary is a data processor.

Processing

Processing refers to operations or sets of operations performed on personal data. It includes, among other activities, the collection, storage, organisation, alteration, retrieval, usage, distribution, and erasure of personal data.

Signi�cant Data Fiduciary

A data �duciary may be classi�ed as a signi�cant data �duciary based on a number of factors including the volume of data processed, sensitivity of data processed, risk of harm, etc.

Sensitive Personal Data

Personal data which may reveal, be related to, or constitute �nancial data, health data, biometric data, genetic data, sexual orientation, sex life, caste or tribe etc.

Data Protection underPersonal Data Protection Bill, 2019

51 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Glossary

Purpose Limitation:personal data collected for

one purpose should notbe used for a new,

incompatible purpose

Data Minimization:the data collected must be

adequate, relevant andlimited to what is

necessary

Notice and Consent:mandates data fiduciaries

to notify dataprincipals and seek

their consentbefore collecting

their personal data

Data Quality:the data collected must be

accurate, completeand up to date

Storage Limitation:should not retainpersonal data for

longer than needed

Lawful Purpose:personal data can only be

processed for lawfulpurposes

Accountability:taking responsibility for

what one does withpersonal data

Principles of Personal Data Protection

�e proposed data protection framework has not yet been formalised into law. Hence, the precise provisions which the said laws will entail cannot be ascertained. However, the principles on which the new framework is based are certain. �ese principles embody the

spirit of India’s approach to protecting personal information and upholding privacy. �us, understanding and complying with these principles is at the heart of good data protection practices. Failure to comply with these principles will leave you open

to penal consequences involving substantial �nes and imprisonment.

Following are the key principles enshrined in the proposed framework.

Figure 5 : Principles of personal data protection

Principles ofPersonal Data

Protection

52 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Lawful purpose

�e law requires that processing of personal data can only be done for lawful purposes. While the term is not de�ned, in common parlance, a purpose is lawful unless it is forbidden by law or is expected to result in injury to a person or property.

Notice and consent

�e PDP Bill directs data �duciaries to notify data principals, at the time of collection of personal information, about the nature of data collected, purpose for which such data is being collected and the details of persons/entities with whom such data may be shared. It also

requires them to obtain consent from data principals before processing their personal data.

Purpose limitation

�e law prescribes that data may only be processed for the speci�c purpose for which the data principal has consented.

Data minimisation �e law mandates that data collection shall be minimal and limited to the extent that is necessary for the purposes of processing such personal data.

Storage limitation Data �duciaries are barred from retaining any personal data

beyond the period necessary to satisfy the purpose for which it is processed. �ey are obligated to delete the personal data at the end of the processing.

Data Quality Data �duciaries must ensure that personal information is complete, accurate, updated and not misleading.

Accountability

�e proposed framework not only prescribes measures to be undertaken to protect personal data, but it also holds data �duciaries accountable for ensuring that the provisions are duly implemented in a demonstrable way.

Lawful Purpose Are you sure that you are not processing for a purpose that is forbidden by law or could potentially cause injury/harm to an individual or community?

Notice and Consent Have you informed the data principal about the data collected, purpose of collection and persons/entities that you share data with? Have you obtained consent for this?

Purpose Limitation Ensure that you only process data for the purpose informed to the data principal

Data Minimisation Ensure that you only collect the amount needed for the purpose you have stated and not more

Storage Limitation Do you have a protocol to ensure that data is only used for the purpose

and the time stated, and data is deleted subsequently?

Data Quality Do you have a veri�cation mechanism to ensure that data is complete, accurate and updated?

Accountability If you are a data �duciary (See glossary for de�nition), do you ensure that the above principles are practised at data collection and processing?

Subsequent sections of the handbook seek to explain these principles with emphasis on their implementation while processing personal data.

53 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Personal Data

54 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Introduction

�e proposed data protection framework provides for greater compliance and accountability mechanisms, enforced through stricter penalties. However, the scope of the said framework is limited to personal data. Hence, determining whether you are processing personal data or not is key to determining whether your activities fall under the ambit of data protection law.

�e PDP Bill de�nes personal data as -

data about or relating to a natural person who is directly or indirectly identi�able, having regard to any characteristic, trait, attribute, or any other feature of the identity of such natural person, whether online or o�ine, or any combination of such features with any other information, and shall include any inference drawn from such data for the purpose of pro�ling.

Based on the de�nition, there are two elements of personal data:

I. It is about an individual or relates to an individual.

II. �e said data can directly or indirectly identify the individual.

In this chapter, we seek to break down the de�nition of personal data into individual elements

At a glance

Understanding whether you are processing personal data is critical to understanding whether the provisions of Personal Data Protection law apply to your activities.

Personal data is information that relates to an identi�ed or identi�able individual.

What identi�es an individual could be as simple as a name or a number or could include other identi�ers such as an IP address or a cookie identi�er, or other factors.

If the data you are processing is capable of identifying an individual directly, it is personal data.

If an individual is not directly identi�able, then you need to consider whether the individual is identi�able by using all the means reasonably likely to be used by you or any other person to identify that individual.

It is possible that the same information is personal data for one �duciary’s purposes but is not personal data for the purposes of other �duciary.

In general terms, personal data includes any information about an individual which differentiates her/him from the rest of the world. It may include a variety of information about an individual like name, address, phone number, blood type, photos, �ngerprint, bank account number, etc.

Under PDP Bill, the standard of determining whether data is personal relates to an identi�ed or identi�able natural person.

An individual is said to be ‘identi�ed’ or ‘identi�able’ if he/she can be distinguished from other individuals. Hence, any data that can either be directly or indirectly used to identify a certain individual is personal data.

Information that can identify individuals are called identi�ers. Identi�ers like name, e-mail address or Aadhaar number can directly distinguish an individual from the rest. Other identifiers like

age, gender and zip code can indirectly identify an individual. Different identifiers taken together may also accurately identify an individual. �e de�nition speci�es that these identi�ers can be online (information pertaining to applications, protocols, tools and devices used by an individual like IP addresses, MAC addresses, cookies, pixel tags, account handles, device �ngerprints, etc.) or o�ine in nature.

Examples of Personal Data

Where they relate to an individual, the following data will be considered personal data:

• Name, pseudonym, date of birth

• Postal address, phone number, e-mail address, IP address

• Password, �ngerprint, retinal print

• Aadhaar number, Permanent Account Number (PAN)

• Photos, video, sound recording

• Bank account details, vehicle registration number

• CCTV footage identifying a person’s face

55 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

What is Personal Data?

If you can distinguish an individual by just looking at the information you are processing (for example, name, address, Aadhaar number), you have directly identified the individual. On the other hand, you can also identify a person using secondary information like passport number or car registration number. �is sort of identi�cation is called indirect identi�cation. Any information, which can identify a person, whether directly or indirectly, is personal data.

56 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

For the data to be considered personal, it not only has to identify an individual, but also be about or related to the individual in some way. In common terms, data relates to an individual if it is about that individual. In such cases, the relationship is easily established.

In order to determine whether data relates to an individual or not, the following factors may be considered:

a. Content of the data:

Data may relate to an individual if the underlying information is obviously about an individual. For example, medical records or criminal records of an individual. Alternatively, data may also be related to

an individual if the underlying information is not about the individual per-se but is linked to their activities. Examples of such data include itemised telephone bills or bank records.

b. Purpose of processing:

If the data is likely to be processed to learn/evaluate an individual or make a decision about an individual, then such data will be considered as personal data. Consider the phone records of a desk phone in a workplace. While the records, per se, do not relate to an individual if they are processed to ascertain the attendance of the employee manning the said desk. Such phone

records will then assume the character of personal data.

c. E�ect of processing on the data principal:

If the processing of data has a resulting impact on the individual, such data will be considered personal data even if the content of the data is not directly related to the individual. For example, a factory records data regarding operation of a machinery. �is data per-se does not relate to an individual. However, if the same data is processed to assess the performance of the person operating such machinery based on which his salary will be computed, such data will be considered personal data.

How to determine whether data ‘relates to’ an individual?

What is meant by direct or indirect identification?

Please refer to the �owchart below:

Upon assessing the data you are processing against the aforementioned metrics, if you are still unsure whether the nature of the data is personal or not, it is advisable to treat the data as personal and handle it with the highest standard of care. �is will hedge you against punitive action caused by misidenti�cation of the nature of data.

Is it aboutan individual?

No No No

No

Yes YesYes

Yes Yes

Yes

Treat it asPersonal Data

Check the contentof the data collected

Does it relate tothe individual?

Will processing the datahave any resulting Impact

on the individual?

Do notconsider it asPersonal Data

Is the data is likely to be processed tolearn/evalutate an individual or make a

decision about the individual?

57 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

How to ascertain whether the data is personal?

Figure 6: Flowchart to assess the nature of data

What should you do if you are unable to determinewhether data is personal or not?

58 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Some identi�ers are classi�ed as more sensitive than others. �is class of data is known as sensitive personal data and is treated as a special category. It must be given extra protection because due to its nature, breach of such data can cause irreparable harm to the person to whom it relates. �e Bill provides a non-exhaustive list of identi�ers that are considered sensitive. �ese include - �nancial data, health data, biometric data, genetic data, sexual orientation, and religious or political beliefs among others.

Another categorisation provided under the Bill is that of ‘critical personal data’. While the term has not been de�ned in the Bill, it has been accorded maximum sensitivity under law. �ere is prohibition on processing of critical personal data outside India.

Sensitive personal, critical personal and personal data must be held separately from each other.

Are there other categories of personal data?

Checklist for Developers

Personal Data

SensitivePersonal Data

Are you able to distinguish between the di�erent categories of data - personal data, sensitive personal data, non-personal data, etc.?

Consider creating a referencer distinguishing data into categories for the kind of data you as a startup usually deal with.

Do you understand how to distinguish data that is personally identi�able and data that is not based on content of the data, purpose of processing and e�ect of processing ?

Do you use technical measures to secure personal data, sensitive personal data and other data that may be critical, such as anonymisation, pseudonymisation and encryption?

59 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Data Fiduciaries and Data Processors

�e PDP Bill de�nes two entities that deal with personal information, viz. data �duciary and data processor. �e law prescribes varying levels of liability associated with the two

entities. Data �duciaries shoulder the bulk of compliance requirements prescribed under law and are obligated to demonstrate their compliance of the data

protection principles. Data processors, on the other hand, have limited compliance burden. �e following table maps the liabilities associated with the two entities.

Much like determining whether the data you are processing is personal in nature or not, it is critical to ascertain whether you are a data �duciary or a data processor. As stated before, your compliance burden and other liabilities under law will change as per your status as a data �duciary or data processor.

Introduction

Data Fiduciary

LiabilityPrinciple/Provision

Data Processor

Notice and Consent

Security Safeguards

Data quality

Reporting Data Breaches

Maintenance of Records

Audit

Impact Assessment

At a glance

Determining whether you are a data �duciary or a data processor is critical to understand your liabilities under the PDP Bill. �e Bill holds data �duciaries liable for complying with all provisions while processors have limited liabilities.

Data �duciaries are entities that determine the purpose and/or means of collection and processing of personal data.

Data processors are entities that process personal information on instructions of the data �duciaries.

60 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Data �duciary is any entity (person, government agency, company/�rm/trust) which determines the purpose and/or means of processing of personal data. Data �duciaries are the main decision makers and exercise complete control over what data to process and why to process it.

For example – Bank ‘X’ appoints a technology company ‘Y’ to assess the risk of delinquency associated with certain accounts. In this case the bank is the data �duciary as it controls the purpose and means of data processing.

Who is a Data Fiduciary?

A data processor is an entity which processes personal data on behalf of the data �duciary. �e law de�nes processing as any operation performed on personal data including – collection, recording, organisation, structuring, storage, adaptation, alteration, retrieval, use, alignment or combination, indexing, disclosure by transmission, dissemination or otherwise making available, restriction, erasure, or destruction. In the example above, the company ‘Y’ hired by the bank ‘X’ becomes the data processor.

�e relationship between a data �duciary and a data processor is contractual in nature. Clause 31 mandates that data �duciaries shall not engage a data processor without a contract in place.

Who is a Data Processor?

Yes

YesYes

Yes

Yes

YesYes

Yes

No

No

No No

No

No

No

No

You are a datafiduciary

Do you collectpersonal data?

You are a dataprocessor

Do you process the dataonly on someone else’s

instruction

Do you decidewhether to disclosethe data and if so,

to whom?

Do you take a call on howlong to retain the data or

whether to make non-routineamendments to the data?

Do you determinewhat to tell individuals about the processing?

Do you decide thetypes of personal data

to be collected?

Do you determine thebasis of collection of

personal data?

Do you determinewhich entity collects

personal data?

61 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Any entity by its very nature is not a �duciary or a processor. To determine whether you are a data �duciary or a data processor ,with respect to a given processing activity, you need to ascertain whether you have the decision-making power regarding any of the following actions:

• Collection of personal data

• The basis for collecting personal data

• Types of personal data to be collected

• Defining the purpose for which data will be used

• Identifying the individuals about whom the data will be collected

• Whether or not to disclose the data, and if so, to whom

• What to tell individuals about the processing

• How long must data be

retained to make non-routine amendments to the data

If you take a call on any or all of the aforementioned purposes and means of processing, you are a data �duciary. If you do not determine any of the above and only process data based on someone else’s instructions, you are a data processor.

How to determine if you are a data fiduciary or a data processor?

Figure 7 : Steps to determine whether you are a data fiduciary or a data processor

62 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Notice and Consent

Informed consent is at the heart of India’s data protection framework. �e law prohibits processing of personal data, unless the person whose data is being processed gives consent for processing. It also empowers data principals to withdraw their consent at a subsequent stage. �is allows data principals to exercise control

over their personal data, while also creating scope for businesses to use an individual’s personal data.

�e Bill directs data �duciaries to inform data subjects, at the time of collecting their personal information, particular details pertaining to such data collection. Among other

requirements, these include – the purpose of data collection, the identity and contact details of the data fiduciary, the period for which such data shall be stored and persons with whom such data shall be shared. The bill also prescribes the standards of valid consent which are mentioned below:

What is the principle of notice and consent?

At a glance

Informed consent is at the heart of India’s data protection regime.

Data �duciaries have to notify data subjects and provide them with relevant information at the time of collection of their personal information.

It is mandatory to obtain the principal's consent before their data is collected and processed. Data processing without obtaining consent of the data principal is prohibited by law and is punishable.

Such consent must be validly obtained and must be free from coercion, undue in�uence, misrepresentation and fraud.

Data principal has the right to withdraw his/her consent at a later stage.

Clause 11(2) of the PDP Bill clari�es that consent will be considered valid only if:

• It is obtained out of free will of the data principal i.e., the data principal was not forced or misguided to give consent.

• It is informed, i.e., the data principal has been made

aware of the nature of data collected, purpose of collection, name of entities with whom the data may be shared, period of retention, process for withdrawal of consent, etc., through a proper notice as mandated by Clause 7.

• It is speci�c, i.e., the data principal should be able to determine the scope of consent in respect of the purpose of processing.

• �e consent is capable of being withdrawn by the data principal.

63 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

How do you implement the principle?

What is the standard of valid consent?

• the purposes for which the personal data is to be processed;

• the nature and categories of personal data being collected;

• the identify and contact details of the data �duciary and the contact details of the data protection officer, if applicable,

• the right of the data principal to withdraw his consent, and the procedure for such withdrawal;

• the source of such collection if the

personal data is not collected from the data principal;

• the individuals or enttitles including other data �duciaries or data processors, with whom such personal data may be shared, if applicable;

• information regarding any cross-border transfer of the personal data that the data �duciary intends to carry out, if applicable;

• the period of which the personal data shall be retained.

CALL to ACTIONIf you are collecting personal data from an individual, you must mandatonly notify him/her about the following:

Are there exceptions to the principle of notice and consent?

64 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Figure 8 : Exceptions to the principles of notice and consent

Exceptions to the principleof Notice & Consent

Where necessary for purposeof employment

Reasonable purposesas specified

Checklist for Developers

What personal data do you hold? Where did it come from? Who do you share it with?What do you do with it?

Have you identi�ed the lawful basis for processing data and documented this?

Do you have a protocol to review how you ask for data and record consent from data principals?

While notifying data principals, have you or your organisation ensured that the notice is clear, concise, easy to understand and provides the data principal with all relevant information as described under Clause 7 of the PDP Bill? Examples of information notices can be seen here.

Prevention and detection of any unlawful activity

Whistle blowing

Mergers and acquisitions

Network and information security

Credit scoring

Recovery of debt

Processing of publicly available personal data

Operation of search engines

Recruitment or Termination ofData Principal

Any service or benefit sought by Data Principal

Attendance of Data Principal

Assessment of Performance of Data Principal

Where necessary forperformance ofgovernment functions

65 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Purpose Limitation

�e principle of purpose limitation is given under Clause 5(b) of the PDP Bill, which states:

Every person processing personal data of a data principal shall process such personal data

(b) for the purpose consented to by the data principal or which is incidental to or connected with such purpose, and which the data principal would reasonably expect that such personal data shall be used

for, having regard to the purpose, and in the context and circumstances in which the personal data was collected.

In simpler terms, the principle states that personal data collected for one purpose should not be processed for a different purpose. It is key to note that the law leaves a certain wiggle room for data processors in this regard by allowing processing for purposes incidental to the primary purpose for which consent was obtained.

However, processing of personal data which is completely unrelated to the stated purpose is prohibited.

According to the law, the purpose for which data is being collected must be speci�ed at the time of collection of such data. Specifying the purpose at the outset allows data principals to furnish informed consent. It also helps data �duciaries to remain accountable and avoid function creep.

What is the meaning of purpose limitation?

At a glance

�e purpose for which you are processing personal data must be clear at the outset and must be communicated to the data principal.

Personal data collected can only be processed for the speci�ed purpose.

Use of personal data for unspeci�ed purposes is not completely prohibited. Personal data may be used for purposes incidental to or connected with the speci�ed purpose. It may also be used for activities which the data principal may reasonably expect their data to be used for.

66 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�e purpose of data collection should be speci�ed before collecting the personal information of an individual. �e purpose of data collection is a part of the mandatory noti�cation requirement speci�ed under Clause 7 of the PDP Bill.

How do you specify your purpose?

�e law does not ban the use of personal data for unspeci�ed purposes altogether. Clause 5 clari�es that personal data may be used for purposes incidental to or connected with the purpose speci�ed. Activities which the data principal may reasonably expect their data to be used for are covered within the ambit of permitted purpose. In simpler terms, personal data may be used for purposes compatible with the speci�ed purpose.

In order to assess the compatibility of purpose

vis-a-vis the originally speci�ed purpose, you may consider the following:

• Link between the original purpose and the new/upcoming purpose – In simple terms, greater the distance between the originally speci�ed purpose and the new purpose, weaker the link between the two and consequently, the more problematic it is in terms of compatibility.

• Context in which the personal data has been

collected – You may need to look at the nature of the relationship between the data �duciary and the data principal. What you are expected to assess is the balance of power between the two, based on the consideration of what would be the commonly expected practice in the given relationship.

• Nature of the personal data – Generally, the more sensitive the information involved, the narrower the scope for compatible use.

Can the data be used for purposes other thanthe ones specified?

You are mandatorily required to specify the purpose for which the personal data of an individual is being collected, before collecting such data.

CALL to ACTION

67 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

• Possible consequences for data principal – You should assess the possible impact (both positive and negative) the subsequent processing may have on the data principals. Generally, risk of detriment to data principal is inversely proportional to compatibility of purpose, i.e., greater the risk of negative impact on the data principal, lesser are the chances of the purpose being compatible.

• Existence of appropriate safeguards – Appropriate security safeguards like encryption and pseudonymisation can offset de�ciencies in other criteria

mentioned above. For example, safeguards ensure safer processing and mitigate risk of negative impact of processing on data principals.

A bank collects �nancial records of a person to assess his credit worthiness and offer him a personal loan. Next year, the bank processes the same data for assessing his eligibility for increased credit, without seeking his consent for the subsequent processing. �is processing is allowed because the new purpose is compatible with the original purpose.

However, if the purpose is incompatible with the speci�ed

purpose, processing of personal information for such purposes is prohi

�e same bank wants to share the client’s data with insurance �rms. �at processing isn’t permitted without the explicit consent of the client as the purpose isn’t compatible with the original purpose for which the data was processedbited.

Checklist for Developers

Do you have a clear understanding of the purpose for which data may be used?

Do you have a mechanism to ensure that this purpose is clearly represented to principals you collect data from?

If there is a change in how you use data, do you have a procedure to inform ‘data principals’ about this?

68 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Data Minimisation

�e principle is enshrined in Clause 6 of the PDP Bill. It states that -

The personal data shall be collected only to the extent that is necessary for the purposes of processing of such personal data.

Data minimisation refers to the practice of limiting the collection and processing of personal information to what is directly relevant and necessary

to achieve the speci�ed purpose. In other words, personal information should only be processed in situations where it is not feasible to carry out the processing in another manner.

A study of businesses in the UK, France and Germany revealed that 72 per cent respondents gathered data they did not subsequently use7.

What is the principle of data minimisation?

At a glance

Personal information should be processed only when absolutely necessary.

You must ensure that the personal data you are processing is directly relevant and necessary for achieving the stated purpose.

Relevance and necessity are not de�ned by law. It varies as per purpose. Hence clarity of purpose is the key to determining relevance and necessity.

7 http://info.purestorage.com/rs/225-USM-292/images/Big%20Data%27s%20Big%20Failure_UK%281%29.pdf?aliId=64921319

�e law does not de�ne the parameters or threshold of relevant and necessary personal information.�e relevance and necessity depend on the purpose for which information is being collected. For example, consider a database consisting of name, age, gender, educational quali�cation and caste of certain individuals. Now, if a company uses the database to shortlist candidates for a job position, �elds like name, age and educational quali�cation are relevant and necessary but gender and caste information are not. On the other hand, if the database is used for identifying people for vaccination, only name and age remain relevant, while educational quali�cations, caste, and gender become irrelevant.

�e following �owchart shows how a data �duciary can decide what to collect and what not to collect.

Source: A Technical Look at the Indian PDP Bill

*PA-PD means pseudonym associated personal data

*PI-PD means personal identi�able personal data

*SPI-PD means sensitive personal identi�able personal data

69 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

How do you determine what is relevant and necessary?

Figure 9 : Data Collection Minimisation using Flow Chart

Can service beprovided without

collectingpersonaldata?

Can servicebe provided topseudonym

users?

Servicerequired to

collect personaldata?

Servicerequired to

collect sensitivepersonaldata?

Collect PersonalIdentifiable-

Personal Data likename, DOB etc. (PI-PD)

Collect data ofpseudonym user’s

(PA-PD), e.g.cookies, font size etc.

Yes

YesNo No

No

Yes

Yes0-collection

No personal dataCollection

Collect SensitivePersonal IdentifiablePersonal Data like

password, credit card,PAN details (SPI-PD)

70 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Is data minimisation principle antithetical to AI?

Arti�cial intelligence and machine learning are data intensive technologies. Especially in the case of AI, the accuracy of an algorithm is often directly proportional to the volume of data it is trained on. Hence, at the �rst glance, application of the principle of data minimisation may seem

particularly challenging. However, that may not be the case. Data minimisation does not bar processing of large volumes of data. It only mandates that the data must be directly relevant and necessary for the purpose for which it is processed. Further, the principle is only applicable to

‘personal data’ and not non-personal data. �e spirit of the law is to protect the privacy of individuals. Hence developers can always use technological methods to protect the privacy of individuals in order to ensure that they do not violate the law.

Periodically assess whether the personal information you hold is necessary and relevant to your purpose. If not, it is advisable to delete the data you no longer needed.

Clearly determine your purpose in order to determine whether you are holding an adequate amount of personal information

CALL to ACTION

Are there legal provisions protecting big-data applications?

�e PDP Bill contains a provision through which AI or other big-data applications may be exempted from the ambit of the principles of data protection. Clause 38 of the Bill authorises relevant authority to exempt certain classes of research, archiving, or statistical purposes, where processing of personal information is necessary, from the application of any of the provisions of the Bill. In determining such exceptions,

the authority must be satis�ed that:

1. �e compliance with the provisions of the law shall disproportionately divert resources from such purpose;

2. �e purposes of processing cannot be achieved if the personal data is anonymised;

3. �e data �duciary has carried out de-identi�cation in accordance with the code of practice speci�ed under

section 50 and the purpose of processing can be achieved if the personal data is in de-identi�ed form;

4. �e personal data shall not be used to take any decision speci�c to or action directed to the data principal; and

5. �e personal data shall not be processed in the manner that gives rise to a risk of signi�cant harm to the data principal.

What data minimisation techniques are available for AI systems?

Supervised Machine Learning algorithms rely on large amounts of data. �is data is used in two phases -

a. Training phase - where data is used to create a machine learning algorithm; and

b. Inference phase - where a trained machine learning algorithm is used to make predictions.

At each stage, distinct techniques exist to either minimise the use of personal data or ensure that privacy of individuals is preserved.

Training Phase

• Feature Selection –�is involves segregation of relevant features from irrelevant features. A dataset may contain a large number of features or data �elds - like name, age, gender, pin code. However, not all features of a dataset may be relevant to your purpose. Consider a dataset containing features like name, age, gender, address, blood group, sexual orientation and political affiliation. If this data set is to be used to train a ML

model for assessing credit risk, features like sexual orientation, political affiliation and blood group are irrelevant. In order to segregate relevant features from the irrelevant features, standard methods of feature selection may be used.

• Privacy Preserving Methods – A range of privacy preservation techniques may be used to minimise data processing and preserve individual privacy. �ese include:

• Perturbation - �is involves adding noise to data by

71 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

modifying data points in order to ensure that information cannot be traced back to speci�c individuals.

• Federated Learning - Contrary to the standard approach of centralising training data, federated learning uses a decentralised approach where ML models are trained on edge devices (like mobile phones). In the process, the data remains on local devices, thus mitigating privacy risks.

• Synthetic Data - Synthetic data is arti�cially generated data which does not relate to real people or events. It is used as a stand-in for real datasets and possesses the same statistical properties as the dataset it replaces. �us, when used as a training set, it performs like that real-world data would.

Inference Phase• Converting personal data into

less ‘human readable’ format - ML models usually require the full set of predictor

variables for an individual to be included in the query in order to make a prediction about the individual. Personal data in a query can be converted into abstract formats which are less interpretable by humans. For example, in facial recognition technology, images of the faces are converted into mathematical representations of the geometric properties of the faces. �ese are called 'faceprints'. Instead of sending the image itself to a model for prediction/classi�cation, the image can be converted to a faceprint, making it less identi�able.

• Making inferences locally - Similar to federated learning, where ML models are trained on local devices, inferences may also be made locally. �is can be achieved by hosting the ML model on the device generating the query. An example of this approach is Privad, an online advertising

system which seeks to preserve user privacy by maintaining user pro�le (for predicting which types of ads a user will prefer) on the user’s device instead of the cloud.

• Privacy preserving queries - In cases where local hosting of ML models is not feasible, techniques allowing minimisation of data revealed in a query sent to a model may be explored. Techniques like Trustworthy privacy-aware participatory sensing (TAPAS) allow retrieval of prediction or classi�cation from a model without compromising the privacy of the user generating the query.

Please refer to Annexure - A for further details on technical measures that can be deployed to ensure data minimisation and privacy preservation.

Checklist for Developers

Do you have a quality check on datasets you collect, in order to remove excess data that you may not need ?

Do you systematically delete data that you do not need anymore, even though you may have lawful consent to keep data for longer ?

72 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Storage Limitation

At a glance

You must not keep personal data for longer than you need it.

You need to think about – and be able to justify – how long you keep personal data. �is will depend on your purposes for holding the data.

You should also periodically review the data you hold, and erase or anonymise it when you no longer need it.

Individuals have a right to erasure if you no longer need the data.

You can keep personal data for longer if it is mandated by law or is explicitly consented to by the data principal.

Even if personal information is collected and used in compliance with all the norms, personal information cannot be stored inde�nitely. Personal information, which is not necessary or needed, must be deleted by data �duciaries. �is principle is enshrined in Clause 9 of the PDP Bill, which states:

(1) �e data �duciary shall not retain any personal data beyond the period necessary to satisfy the purpose for which it is processed and shall delete the personal data at the end of the processing.

(2) Notwithstanding anything contained in sub-section (1), the personal data may be retained for a longer period if explicitly consented to by the data principal, or necessary to comply with any obligation under any law for the time being in force.

(3) �e data �duciary shall undertake periodic review to determine whether it is necessary to retain the personal data in its possession.

(4) Where it is not necessary for personal data to be retained by the data �duciary under

sub-section (1) or sub-section (2), then, such personal data shall be deleted in such manner as may be speci�ed by regulations.

Storage limitation is closely related to the data minimisation principle. Deleting personal data which is no longer necessary reduces the risk that it becomes irrelevant, excessive, inaccurate or out of date. On the practical side, storing personal data indefinitely is an inefficient practice, given the costs of storage and security of such data.

What is the storage limitation principle?

73 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�e law does not prescribe a time-limit in this regard. It merely states that personal data must not be stored beyond the period necessary to satisfy the purpose for which it is processed. Once again, the necessity of retention will depend on the purpose for which the data was collected and processed.

Consider the CCTV system installed at an ATM facility. Its purpose is to prevent fraud. Since a dubious transaction may not come to light immediately, banks need to retain the CCTV footage for a long period of time. On the other hand, a restaurant may need to retain its CCTV footage only for a shorter

period of time as any untoward incident is immediately identi�ed.

In any case, data �duciaries are mandated to inform the data principal at the time of collecting his/her personal information as to how long his/her personal information will be stored.

For how long can personal data be stored?

Sometimes, data retention is mandated by law. In India, laws like Companies Act, 2013, SEBI Disclosure Regulations, 2015, labour laws and Income Tax Act, 1961 prescribe data to be retained for a speci�ed period. Such provisions will

override the necessity of purpose. In other words, if the prescribed period under any law exceeds the period for which you need to retain data, the data must be stored for the prescribed period.

Further, if the data principal consents to storage of his/her personal information for a longer period of time, such personal information may be stored beyond the period of necessity.

When should you review the necessity of retention?

74 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�e law does not prescribe the period after which data �duciaries must review their retention. �e need for retention of personal data must be reviewed at the end of any standard retention period. Unless there is a clear justi�cation of keeping

personal data for a longer period, it must either be deleted or anonymised. In the absence of a set retention period, it is ideal to review your need of retaining personal data regularly, subject to resources and the associated privacy risks.

You are also obligated to review whether you still need personal data if an individual exercises his right to erasure. �e right to erasure is discussed in detail under the chapter called Rights of Data Principals.

When can personal data be stored for a longer period of time?

Checklist for Developers

Do you have a mechanism to remind you that the data retention period has lapsed or is about to lapse?

Have you considered a mechanism to seek a renewal on data retention period, in case you need data for longer than the data retention period?

Do you have a mechanism to remove data if the data principal has asked for the data to be removed/ forgotten?

75 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

76 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Data Quality

In common parlance, data is considered high-quality if it is �t for its intended uses in operations, decision making and planning. From a technical perspective, quality of data depends on certain characteristics. �ese are:

What is the principle of data quality?

�e principle of data quality seeks to ensure that personal information collected and processed by data �duciaries should be high-quality. It implies that the personal data being used should be relevant to the purpose for which it is to be used and should be accurate, complete and kept up-to-date. �is is a good practice in terms of information management, but it is also linked to the rights of the individuals.

Experts suggest that “the use of low-quality, outdated, incomplete or incorrect data at different stages of data processing may lead to poor predictions and assessments and in turn to bias, which can eventually result in infringements of the fundamental rights of individuals or purely incorrect conclusions or false outcomes”8.

Further, poor data quality also poses �nancial challenges. Studies reveal that 60 percent of data scientists spend most of their time cleaning and sorting data.9 In fact, poor quality is generally regarded as the biggest obstacle to monetising data.10

Characteristic How it's measured

Accuracy Is the information correct in every detail?

Completeness How comprehensive is the information?

Reliability Does the information contradict other trusted resources?

Relevance Do you really need this information?

Timeliness How up-to-date is the information? Can it be used for real-time reporting?

At a glance

You should take all reasonable steps to ensure the personal data you hold is not incorrect or misleading as to any matter of fact.

You may need to keep the personal data updated, although this will depend on what you are using it for.

If you discover that personal data is incorrect or misleading, you must take reasonable steps to correct or erase it as soon as possible.

8 https://www.europarl.europa.eu/doceo/document/A-8-2019-0019_EN.html#title49 https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year10 https://www.wsj.com/articles/ai-efforts-at-large-companies-may-be-hindered-by-poor-quality-data-11551741634

77 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

PDP Bill does not de�ne the meaning or parameters of accuracy of data. However, Clause 8 of the Bill speci�es that accuracy, and how up-to-date the data is, can be linked to the purpose for which it is being processed. For example, let's consider an individual ‘X’ who moves his house from New Delhi

to Mumbai. However, in a certain dataset, X’s city of residence is indicated as New Delhi. Now, if the dataset is to be used by a courier service for the purpose of delivering a product, the information is obviously inaccurate. However, if the same dataset is being used by the

Government of Delhi to identify individuals who have resided in New Delhi in the past, the data will be accurate. Hence,you must always be clear about what you intend the record of the personal data to show? What you intend to use it for may affect whether it is accurate or not.

When is data accurate/inaccurate?

Personal data is intrinsically linked to individuals, who are therefore the most reliable source of data. �e primary responsibility to provide accurate data to the data �duciary ideally rests on the data principal. However, there is a corresponding obligation to ensure that data is complete, i.e., it will satisfy the purpose for which it was collected on the data �duciary who is collecting such data. In general, it is a good practice to ensure that the information as well as its source is accurately captured. Clause 8

holds data �duciaries responsible for maintaining the quality of personal data being processed. It states that data �duciaries must take necessary steps to ensure that the personal data processed is complete, accurate, not misleading and updated. However, it does not de�ne what these necessary steps must be. What constitutes a ‘necessary step’ will depend on the nature of the data and the purpose for which it is being used. If you are using the data for a purpose where the accuracy of the data is of

essence, you must put in more effort to ensure that the data is accurate. In cases where data is used to make decisions that may significantly affect the individual concerned or others, ensuring accuracy will require extra efforts. For example, if data is being collected for the purpose of contact tracing in order to arrest the spread of an infectious disease, ensuring the accuracy of information like names and addresses becomes critical.

Who is responsible for maintaining data-quality?

Checklist for Developers

Have you considered a mechanism to �lter data at the stage of receipt of such data and remove data that is incomplete, inaccurate or unnecessary?

Do you have a protocol for quality control if you are not the entity collecting data?

Do you have a protocol for quality control over public data and/or government data?

Do you take proactive steps to account for and address the margin of error or inaccuracy in the data you collect?

Accountability and Transparency

78 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Data protection laws are designed to protect individuals from the excesses of this power imbalance. Notice and consent were the traditional instruments used by law to achieve this objective. It offered the individual the autonomy to decide whether or not to allow their data to be processed after providing their full knowledge of what was going to be done with that data. However, abundance of online services and the increasing complexity of data use cases have made

notices lengthier and more complex, making it difficult for users to comprehend. �is weakens the purpose of informed consent. To offset these issues, key principles that have emerged are that of accountability and transparency.

While accountability is in-built in extant legal provisions concerning data governance, the PDP Bill takes it a step ahead. It mandates businesses to take a proactive, systematic

and answerable attitude toward data protection compliance. In this sense, accountability shifts the focus of privacy governance to an organisation’s ability to demonstrate its capacity to achieve speci�ed privacy objectives. �us, there are two discernible elements of accountability under the proposed framework. Firstly, it makes clear that you(data �duciary) are responsible for complying with the law; and secondly, you must be able to demonstrate your compliance.

Introduction

At a glance

Data �duciaries are responsible for complying with the provisions of the PDP Bill.

Accountability goes beyond mere compliance and includes demonstrability of such compliance.

Accountability obligations are ongoing. You must review and, where necessary, update the measures you put in place.

�ere are a number of measures that can be taken:• Taking a ‘data protection by design’ approach• Implementing appropriate security measures• Recording and, where necessary, reporting personal data breaches• Carrying out data protection impact assessments for uses of personal data that are likely to result in

high risk to individuals’ interests• Appointing a data protection officer

Proactive not reactive-preventative not remedial

Articipate, Identify, and PreventInvasive events before they happen,

this means taking action beforethe fact, not afterward.

Embed privacy into desingPrivacy measures should not be

add-ons, but fully integratedcomponents of the system.

Ensure end-to- end securityData lifecycle security means alldata should be securely retained

as needed and destroyed whenno longer needed.

Respect user privacy-keep it user centric

Keep things user-centric individualprivacy interests must be supported

by storng privacy defaults, appropriatenotice, and user-friendly options.

Maintain visibility andtransparency-keep it openAssure stakeholders that businesspractices and technologies areoperating according to objectives andsubject to independent verification.

Retain full functionality(positive-sum, not zero-sum)Privacy by Design employs a “win win”approach to all legitimate system designgoals; that is, both privacy and security areimportant, and no unnecessary trade offsneed to be made to achieve both.

Load with privacy asthe default settingEnsure personal data is automaticallyprotected in all IT systems or businesspractices, with no added acticonrequired by any individual.

79 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Central to this new approach to accountability is the concept of ‘Privacy by Design’.

Privacy by design as a concept was designed in the 1990s by Dr. Ann Cavoukian. It advanced the view that privacy assurance cannot be achieved by mere compliance with regulatory framework. Rather privacy must become an organisation’s default mode of operations. Initially it was limited to embedding privacy enhancing technologies (PETs) into the design of information technologies and systems. Subsequently, it evolved into an all-encompassing approach extending to a “Trilogy” of encompassing applications:1) IT systems; 2) accountable business practices; and 3) physical design and networked infrastructure11. Privacy by Design entails seven foundational principles, which are:-

Privacy by Design

Figure 10 : Seven foundational principles of Privacy by Design

11 https://www.ipc.on.ca/wp-content/uploads/resources/7foundationalprinciples.pdf

80 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Clause 22 of the PDP Bill mandates every data �duciary to prepare a privacy by design policy. �e policy is supposed to describe the managerial, organisational, business practices and technical systems designed to anticipate, identify and avoid harm to the data principal. It should also describe the ways in which privacy is being protected throughout the stages of processing of personal data, and how the interest of the individual is accounted for in each of these stages.

Data �duciaries are expected to get their privacy by design policies certi�ed by the Data Protection Authority. Once certified, the law mandates that the policy be published on the website of the data �duciary.

How would Privacy by Design be Implemented by Law?

Additional Resources

�e following resources can be referred to by organizations and developers to further understand and implement privacy by design:

• Norwegian Data Protection Authority's guidance on how software developers can implement data protection by design.

• A primer on privacy by design published by the Information & Privacy Commissioner of Ontario.

• Guidance on operationalizing privacy by design.

81 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

�e accountability measures mandated under PDP Bill are -

• Identifying accountable entity: Clause 10 of the Bill assigns the liability of complying with the provisions of the law to the data �duciaries.

• Appointing data protection o�cer: Clause 30 of the Bill mandates signi�cant data �duciaries to appoint a Data Protection Officer for overseeing the organisation’s data protection strategy and implementation.

• Mandated security safeguards: Clause 24 mandates data �duciaries and data processors to implement technological security measures, including encryption and anonymisation, to preserve the con�dentiality and integrity of personal data. �e provision also mandates periodic review of such safeguards.

• Data protection impact assessment: Clause 27

mandates signi�cant data �duciaries to undertake a ‘data protection impact assessment’ before undertaking any processing involving new technologies or large-scale pro�ling or use of sensitive personal data such as genetic data or biometric data, or any other processing which carries a risk of signi�cant harm to data principals. �e DPIA must describe the proposed processing operation, the purpose of processing and the nature of personal data being processed. It must also assess the potential harm that may be caused to the data principals, and describe the measures for managing, minimising, mitigating or removing such risk of harm. �e DPIA must then be submitted to the concerned authority, which may cease the processing or issue additional guidelines, if needed.

• Maintenance of Records: Clause 28 mandates signi�cant data �duciaries to

maintain accurate records of operations throughout data’s lifecycle. �e provision also mandates keeping records of the periodic assessment of security safeguards under clause 24 and data protection impact assessments under clause 27.

• Audit requirements: Clause 29 mandates signi�cant data �duciaries to undergo an annual audit of their policies and conduct of their data processing operations. �e audit will particularly evaluate the clarity and effectiveness of notices, effectiveness of privacy by design policy, security safeguards in place and the level of transparency maintained by the data �duciary. Such an audit must be carried out by an independent data auditor. Post audit, auditors may assign a rating in the form of a data trust score to the data �duciary.

What are accountability measures prescribed by law?

82 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Transparency is fundamentally linked to fairness. Transparent processing is about being clear, open and honest with people from the start about who you are, and how and why you use their personal data. Transparency measures mandated under the PDP Bill are:

• Privacy disclosures: Clause 23 mandates every data �duciary to disclose the following information at large:

a. �e category of personal data collected and manner of such collection.

b. Purpose for which personal data is generally processed.

c. Data processing operations posing risk of signi�cant harm.

d. Rights of data principals and the manner in which such rights can be exercised.

e. Contact details of persons responsible for facilitating exercise of the rights of data principals.

f. Data trust score earned by the data �duciary after audit.

g. Information regarding cross-border transfers of personal data that the data �duciary generally carries out (if applicable).

• Reporting of data breaches: Data �duciaries are mandated to report, as soon as possible, any personal data breach to the relevant authority. Clause 25 states that data �duciaries must notify the authorities about the nature of personal data breached, number of data principals affected, possible consequences, and action being taken by the data �duciary to remedy the breach.

What are the transparency requirements prescribed by law?

Figure 11 : Transparency measures under PDP Bill

Category and manner of collection

Purpose

Probable risk of significant harm

Rights of Data Principles

Reference of Concerned person(in the event data principles exercise their rights)

Data Trust Score of Data Fiduciary

Information about cross border transfer of personal data

Privacy disclosures(Clause 23)

Reporting of data breaches(Clause 25)

Transparency Measuresprescribed by law

• Nature of personal data breached

• Number of data principals affected

• Possible consequences

• Action being taken by the data �duciary to remedy the breach

CALL to ACTIONIf you are a data �duciary and there is any personal data breach, you are mandated to notify the authorities about the following:

83 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Checklist for Developers

Are you aware of your organisation’s internal data protection policy?

Do you have technical mechanisms to monitor compliance with data protection policies, and regularly review the e�ectiveness of data handling and security controls?

Has your startup implemented technical and organisational measures to integrate data protection practices at each stage of the processing activity?

Does your startup conduct Data Protection Impact Assessments? Is there a protocol for this?

Have you nominated an administrative o�cer to implement data protection principles or a Data Protection O�cer?

One of the key objectives of the PDP Bill is to empower individuals, by allowing them a greater degree of control over their personal information, to protect their fundamental right to privacy. Towards this, the Bill guarantees certain statutory rights to individuals.

It is important to understand the nature and scope of these rights as it corresponds to obligations attached to them, which have to be ful�lled by you, in your capacity, as data �duciary/processor. Further, some of these rights also overlap with the principles of data protection discussed in previous chapters.

Rights of Data Principal

Right to confirmation and access (Clause 17):

the right to obtain a copy of their personal data, as well as other

supplementary information

Right to data portability (Clause 19): the right to receive personal data

they have provided to a controller and to request transmission of the data from one data fiduciary

to another Right to correction and erasure (Clause 18): the right to have personal

data erased or inaccurate personal data

recified.[Not an absolute right]

Right to forgotten (Clause 20):

the right to restrict or prevent the continuing

discolusre of her personal data

84 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Rights of Data Principals

Figure 12 : Rights of Data Principal

85 / Handbook on Data Protection and Privacy for Developers of Artificial Intelligence (AI) in India: Practical Guidelines for Responsible Development of AI

Possible implementation example for developers

Provide a functionality to display all data relating to a person. If there is a lot of data, you can split the data into several displays. If the data is too large, offer the person to download an archive containing all his or her data.

Right to Confirmation and Access

Individuals have the right to obtain a copy of all the information you have about them. They are also entitled to obtain a readable copy of their data in an understandable format. This is called the right to confirmation and access. It is contained under Clause 17 of the Bill. As the name suggests, it consists of two distinct rights:

1. Right to confirmation: A data principal has the right to obtain a confirmation from the data fiduciary on whether his/her personal data is under processing or has been processed in the past.

2. Right to access: The data principal is entitled to access: a) his/her data which is under processing or has been processed; b) a summary of the processing activities in which the data has been used; and c) the identity of other data fiduciaries with whom his/her data has been shared. It helps individuals understand how and why you are using their data, and check whether you are doing it lawfully.

Consider your favourite social media platform. Let's call it ABC. ABC collected personal information, like your name, date of birth, relationship status, etc., from you at the time of creating the account. It also keeps learning about your likes and dislikes by analysing your engagement with the platform. This data is personal as it relates to you and can be used to identify you. In this example, you are the data principal and ABC is a data fiduciary. You have the right to ask ABC whether or not your data has been processed. Processing includes any operation, ranging from collection, structuring and storage to retrieval, sharing and disposal of the data.

You can also request ABC for a copy of all the data that they have on you, including inferences drawn using such data. You are also entitled to know where your data is shared and which other entity or individual has your data. ABC will be required to supply this data on your request, and it must be in a format that is easy for you to read and understand.

Source: CNIL. Sheet n13, Prepare for the exercise of people’s rights
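Following the implementation example above, one way to honour an access request is to bundle everything held about a principal into a downloadable archive. The following is a minimal sketch assuming hypothetical in-memory stores (PROFILE_DB, ACTIVITY_DB, SHARING_DB); a real system would query its actual databases.

```python
import json
import zipfile
from pathlib import Path

# Hypothetical in-memory stores standing in for your actual databases.
PROFILE_DB = {"u42": {"name": "Asha", "dob": "1990-05-01"}}
ACTIVITY_DB = {"u42": [{"event": "login", "ts": "2021-06-01T10:00:00Z"}]}
SHARING_DB = {"u42": ["analytics-partner-1"]}  # fiduciaries data was shared with

def export_all_data(user_id, out_dir):
    """Bundle everything held about one data principal into a readable archive."""
    archive = Path(out_dir) / f"{user_id}_data.zip"
    sections = {
        "profile.json": PROFILE_DB.get(user_id, {}),
        "activity.json": ACTIVITY_DB.get(user_id, []),
        "shared_with.json": SHARING_DB.get(user_id, []),
    }
    with zipfile.ZipFile(archive, "w") as zf:
        for name, payload in sections.items():
            zf.writestr(name, json.dumps(payload, indent=2))
    return archive

print(export_all_data("u42", "."))
```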


Right to Correction and Erasure

Clause 18 allows individuals the right to have their inaccurate personal data rectified, incomplete personal data completed, out-of-date personal information updated, and their personal data erased if it is no longer necessary for the purpose for which it was collected.

The right to correction has close links to the data quality principle, as it bolsters the accuracy of personal data. As essential services like banking and insurance go online, the right to correction assumes more importance. This is because inaccurate or incomplete information can lead to denial of such services.

The right to erasure, on the other hand, allows an individual to get his data deleted if it is no longer necessary for the purpose of its processing. For example, an individual provides his bank account details and list of assets to a bank for a loan, but his application is denied. Since the bank no longer needs the data, the concerned individual has the right to ask the bank to delete it.

However, it is important to note that the right to correction and erasure is not absolute and is subject to the purpose of processing. Data fiduciaries can reject a request for correction or erasure, but they will have to furnish an adequate justification for such refusal in writing. If the data principal is not satisfied with the explanation provided, they may require the data fiduciary to indicate that the personal information in question has been disputed by the data principal.

If the data fiduciary makes the suggested correction/updation/erasure, it will also have to notify all relevant entities or individuals to whom such personal data may have been disclosed.

Possible implementation example for developers
(i) Provide a functionality to erase all data relating to a person.
(ii) Provide for automatic notification of processors to also erase the data relating to that person.
(iii) Provide for data erasure in backups, or provide an alternate solution that does not restore erased data relating to that person.
(iv) Allow users to directly modify data in the user account.

Source: CNIL. Sheet n13, Prepare for the exercise of people’s rights
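A hedged sketch of points (i)-(iii) of the example above: delete from primary stores, notify downstream processors, and keep a tombstone list so that backup restores do not resurrect erased records. The processor endpoints and stores here are placeholders, not real services.

```python
# Hypothetical registry of processors that received this principal's data.
PROCESSOR_ENDPOINTS = [
    "https://processor-a.example/erase",
    "https://processor-b.example/erase",
]

ERASED_IDS = set()  # tombstones consulted whenever a backup is restored

def erase_user(user_id, stores):
    """Delete a principal's records and propagate the request downstream."""
    for store in stores:
        store.pop(user_id, None)          # (i) remove from each primary store
    ERASED_IDS.add(user_id)               # (iii) tombstone for backup restores
    for endpoint in PROCESSOR_ENDPOINTS:  # (ii) automatic processor notification
        print(f"would POST erasure of {user_id!r} to {endpoint}")

def restore_from_backup(backup):
    """Alternate solution for backups: skip tombstoned users on restore."""
    return {uid: rec for uid, rec in backup.items() if uid not in ERASED_IDS}

profile_db = {"u42": {"name": "Asha"}}
erase_user("u42", [profile_db])
print(restore_from_backup({"u42": {"name": "Asha"}, "u07": {"name": "Ravi"}}))
```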

Inaccurate Information can lead to denial of services

Let's assume that the government has organised a vaccination drive for people born in the year 1990. An individual named Ashok is eligible, as he was born in 1990. But due to a clerical error, his Aadhaar card reflects the year of birth as 1991. In the absence of this right, Ashok would have been denied vaccination because of a clerical error. However, Ashok can exercise his right to correction and request UIDAI to rectify the error.


Possible implementation example for developers

Developers can provide a feature that allows the data subject to download his or her data in a standard machine-readable format.

Source: CNIL. Sheet n13, Prepare for the exercise of people’s rights

Right to Portability

The right to data portability gives individuals the right to receive their personal data in a structured, commonly used and machine-readable format. It also gives them the right to request that a fiduciary transmit this data directly to another fiduciary. While it may seem similar to the right to access, the right to data portability applies to machine-readable data, as opposed to the human-readable data that you can request under the right to access.

Data portability allows individuals to seamlessly switch between different service providers. It is similar to mobile number portability, which allows you to switch between different telecom service providers without changing your number. For example, an individual uses Hotmail as his email service provider and has created a mailing list comprising his clients. However, he now wants to migrate to Gmail. The right to portability allows him to request Hotmail to send this data (in a machine-readable format like CSV) either to him, or to Gmail directly.

As per Clause 19, the personal data to be provided to the Data Principal would consist of:

I. Data already provided by the Data Principal to the Data Fiduciary;

II. Data which has been generated by the Data Fiduciary in its provision of services or use of goods;

III. Data which forms part of any profile on the Data Principal or which the Data Fiduciary has otherwise obtained.

Exemptions have been provided for instances where (i) the data processing is not automated; (ii) processing is necessary for compliance with law, an order of a court or a function of the State; and, significantly, (iii) compliance with the request would reveal a trade secret of the Data Fiduciary, or would not be technically feasible.
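To make the Hotmail-to-Gmail example concrete, a portability export can be as simple as serialising the principal's data to CSV. The sketch below is illustrative only; the CONTACTS store and the receiving fiduciary's import endpoint are assumptions.

```python
import csv
import io

# Hypothetical mailing list held by the current service provider.
CONTACTS = [
    {"name": "Client A", "email": "a@example.com"},
    {"name": "Client B", "email": "b@example.com"},
]

def export_contacts_csv(rows):
    """Serialise the principal's data in a structured, machine-readable format."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["name", "email"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# The same payload could be handed to the principal, or transmitted directly
# to the receiving fiduciary on the principal's request.
print(export_contacts_csv(CONTACTS))
```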


Possible implementation example for developers
(i) Provide a functionality allowing the data principal to restrict processing.
(ii) If the request is approved, you must delete data already collected, and must not subsequently collect any more data related to that person.

Source: CNIL. Sheet n13, Prepare for the exercise of people’s rights

Right to be Forgotten

Clause 20 of the PDP Bill pertains to the right to be forgotten. Under this Clause, the data principal is given the right to restrict or prevent the continuing disclosure of her/his personal data. It can be exercised if any one of the following conditions holds true:

I. The disclosure of personal data is no longer necessary for the purpose for which it has been collected, or

II. The data principal has revoked their consent to such disclosure of personal data, or

III. The disclosure of personal data is contrary to the provisions of the Bill, or any other law in force.

Where disclosure has taken place on the basis of the consent of a data principal, the unilateral withdrawal of such consent could trigger the right to be forgotten. In other cases, where there is a conflict of assessment as to whether the purpose of the disclosure has been served or whether it is no longer necessary, a balancing test must be carried out. The exercise of this right requires balancing the right to privacy with the right to freedom of speech and expression.
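One possible way to operationalise a restriction on continuing disclosure, sketched under the assumption of a simple per-user restriction flag (see the implementation example above), is to gate every outward disclosure on that flag. The names below are hypothetical.

```python
# Hypothetical per-user flags, set when a right-to-be-forgotten order is granted.
DISCLOSURE_RESTRICTED = {"u42"}

def can_disclose(user_id: str) -> bool:
    """Gate every outward disclosure on the restriction flag."""
    return user_id not in DISCLOSURE_RESTRICTED

def share_with_partner(user_id: str, record: dict) -> None:
    if not can_disclose(user_id):
        print(f"disclosure of {user_id!r} blocked by restriction order")
        return
    print(f"sharing {record} with partner")

share_with_partner("u42", {"name": "Asha"})  # blocked
share_with_partner("u07", {"name": "Ravi"})  # allowed
```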


How can these rights be exercised?

Except for the right to be forgotten, every right under the Bill can be exercised by sending a written request to the data fiduciary. Such a request can be made either directly or through a consent manager. A consent manager is any platform that allows individuals to manage (give, withdraw or modify) their consent.

Upon receiving a request, the data fiduciary has to either comply with the request or explain why the request was not complied with.

The right to be forgotten, on the other hand, can only be enforced upon an order of an adjudicating officer appointed by the government. Data principals may request the said officer to issue such an order, and the officer will determine whether or not to grant the request. Such determination must be made considering factors like the sensitivity of the data, the scale of disclosure, the degree of accessibility sought to be restricted, the role of the data principal in public life and the importance of the personal data to the public.
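A consent manager, as described above, is essentially a ledger of consent states that principals can update. A toy sketch (not any specific consent-manager platform's API) might look like this:

```python
from datetime import datetime, timezone

class ConsentManager:
    """Toy consent ledger: principals give, modify or withdraw consent."""

    def __init__(self):
        # (user_id, purpose) -> {"granted": bool, "ts": datetime}
        self._ledger = {}

    def _record(self, user_id, purpose, granted):
        self._ledger[(user_id, purpose)] = {
            "granted": granted,
            "ts": datetime.now(timezone.utc),  # timestamp for audit/review
        }

    def give(self, user_id, purpose):
        self._record(user_id, purpose, True)

    def withdraw(self, user_id, purpose):
        self._record(user_id, purpose, False)

    def has_consent(self, user_id, purpose) -> bool:
        entry = self._ledger.get((user_id, purpose))
        return bool(entry and entry["granted"])

cm = ConsentManager()
cm.give("u42", "marketing")
cm.withdraw("u42", "marketing")
print(cm.has_consent("u42", "marketing"))  # False
```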

Checklist for Developers

Have you provided privacy information to individuals?

Do you have a process to recognise and respond to individuals' requests to access their personal data?

Do you have processes to ensure that the personal data you hold remains accurate and up to date?

Do you have processes to allow individuals to move, copy or transfer their personal data from one IT environment to another in a safe and secure way, without hindrance to usability?

Do you have a process to securely dispose of personal data that is no longer required or where an individual has asked you to erase it?

Do you have a process to respond to an individual’s request to restrict the processing of their personal data?

Please refer to the following flow-chart to assess your compliance with the data protection law. The accompanying master checklist will allow you to identify actionable steps and assess the degree of your compliance.

The flow-chart can be read as the following sequence of questions:

1. Is the data personal in nature? If no, no further action is required.

2. If yes, are you a data processor? If yes, ensure adequate security safeguards (e.g., encryption).

3. Are you a data fiduciary? If yes:
• Have you obtained consent before data collection? If no, refer to the Chapter on Notice and Consent.
• Is your processing compliant with the principles of data protection? If no, refer to the Chapter on Principles of Data Protection; if yes, no further action is required.


Annexure A: Compliance Map


Technical Measures for Data Security

Purpose: Methods for reducing the need for training data

• Generative Adversarial Networks (GAN): GANs are used for generating synthetic data. As of today, GANs have mainly been used for the generation of images, but they also have the potential to become a method for generating huge volumes of high-quality synthetic training data in other areas. This can satisfy the need for both labelled data and large volumes of data, without the need to use great amounts of data containing real personal information.

• Federated learning: This is a form of distributed learning. Federated learning works by downloading the latest version of a centralised model to a client unit, for example a mobile phone. The model is then improved locally on the client, on the basis of local data. The changes to the model are then sent back to the server, where they are consolidated with the change information from models on other clients. An average of the change information is then used to improve the centralised model. The new, improved centralised model may now be downloaded by all the clients. This provides an opportunity to improve an existing model on the basis of a large number of users, without having to share the users' data.

• Matrix capsules: Matrix capsules are a new variant of neural networks and require less data for learning than what is currently the norm for deep learning. This is very advantageous because a lot less data is required for machine learning.

• Transfer learning: It is not always necessary to develop models from scratch. Another possibility is to utilise existing models that solve similar tasks. By basing processing on these existing models, it will often be possible to achieve the same result with less data and in a shorter time. There are libraries containing pre-trained models that can be used.

Purpose: Methods that uphold data protection without reducing the basic dataset

• Differential privacy: Start with a database that contains natural persons and features related to these persons. When information is retrieved from the database, the response will contain deliberately generated "noise", enabling information to be retrieved about persons in the database, but not precise details about specific individuals. The database must not give a markedly different result to a query depending on whether an individual person is in the database or not; the overriding trends or characteristics of the dataset will not change.

• Homomorphic encryption: This is an encryption method that enables the processing of data while it is still encrypted. This means that confidentiality can be maintained without limiting the usage possibilities of the dataset. At present, homomorphic encryption has limitations, which mean that systems employing it will operate at a much lower rate of efficiency. The technology is promising, however; Microsoft, for example, has published a white paper on a system that uses homomorphic encryption in connection with image recognition.

• RAIRD: Statistics Norway (SSB) and the Norwegian Centre for Research Data (NSD) have developed a system called RAIRD. It permits research to be carried out on data without direct access to the complete dataset. In short, the system works by means of an interface that allows researchers to access only the metadata of the underlying dataset. The researcher can then submit queries based on the metadata and obtain a report containing aggregated data only. This solution has been designed to prevent the retrieval of data relating to very small groups and individual persons. This type of system can therefore be used when data for machine learning is needed: instead of receiving a report as an end result, one could obtain a model from the system.
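To illustrate the differential privacy entry above: a count query has sensitivity 1, so adding Laplace noise with scale 1/epsilon ensures the response barely changes whether or not any single person is in the dataset. A minimal sketch with toy data and standard-library sampling:

```python
import random

AGES = [34, 29, 41, 38, 52, 47, 30, 36]   # toy dataset of a personal attribute

def laplace_noise(scale: float) -> float:
    # The difference of two i.i.d. exponentials is Laplace-distributed.
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)

def dp_count(data, epsilon: float = 1.0) -> float:
    """Differentially private count. A count query has sensitivity 1, so
    Laplace noise with scale 1/epsilon masks any single individual."""
    return len(data) + laplace_noise(1 / epsilon)

print(dp_count(AGES))       # noisy count over everyone
print(dp_count(AGES[1:]))   # dropping one person barely changes the answer
```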

Annexure B: List of Abbreviations

ACM FAccT: Association for Computing Machinery Fairness, Accountability and Transparency

AI: Artificial Intelligence

API: Application Programming Interface

COMPAS: Correctional Offender Management Profiling for Alternative Sanctions

DPIA: Data Protection Impact Assessment

FAT Framework: Fairness, Accountability and Transparency Framework

GAN: Generative Adversarial Networks

GPT: Generative Pre-trained Transformer

IEC: International Electrotechnical Commission

IP Address: Internet Protocol Address

IPR: Intellectual Property Rights

IS: International Standards

ISO: International Organisation for Standardisation

IT: Information Technology

IT Act: Information Technology Act, 2000

LIME: Local Interpretable Model-Agnostic Explanations

MAC Address: Media Access Control Address

ML: Machine Learning

MNE Guidelines: OECD Guidelines for Multinational Enterprises

NSD: Norwegian Centre for Research Data

NTA: Norwegian Tax Administration

OECD: Organisation for Economic Co-operation and Development

PAN: Permanent Account Number

PA-PD: Pseudonym Associated Personal Data

PDP Bill: Personal Data Protection Bill, 2019

PETs: Privacy Enhancing Technologies

PI-PD: Personal Identifiable Personal Data

RAIRD: Remote Access Infrastructure for Register Data

SEBI: Securities and Exchange Board of India

SPDI Rules: Information Technology (Reasonable Security Practices and Procedures and Sensitive Personal Data or Information) Rules, 2011

SPI-PD: Sensitive Personal Identifiable Personal Data

SSB: Statistics Norway

TAPAS: Trustworthy Privacy-Aware Participatory Sensing

UK: United Kingdom

US: United States of America

USD: United States Dollar

XAI: Explainable AI


Annexure C: Master Checklists (Section I: Ethics in AI)

Each checklist in this annexure is organised by stage of intervention: pre-processing, in-processing and post-processing.

I. Transparency

Are you aware of the source of data used for training?

Have you recorded the attributes to be used for training and the weightage for each?

Is there scope for diagnosing errors at a later stage of processing?

Have you checked for any blind spots in the AI enabled system and decision-making process?

Can the AI decision-making process be explained in simple terms that a layperson can understand?

Is it possible to obtain an explanation/reasoning of AI decisions? For instance, can a banker obtain a record of the acceptance or denial of a loan application?

Is there a mechanism for users and beneficiaries to raise a ticket for AI decisions?

Is there scope for oversight and human review of AI decisions?

II. Accountability

Does the start-up have policies or protocols or contracts on liability limitation and indemnity? Are these accessible and clear to you?

Does your organisation have a data protection policy? Are data protection guidelines being followed in the collection and storage of data, if the data is procured from a third party?

Are there adequate mechanisms in place to flag concerns with data protection practices?

Can the AI enabled system be compartmentalised based on who has developed the concerned portion?

Is it possible to apportion responsibility within your set-up if multiple developers have worked on a project?

Is it possible to maintain records about design processes and decision-making points?

Can decisions be scrutinised and traced back to attributes used and developers that worked on the project?

Have you identified tasks that are too sensitive to be delegated to AI systems?

Are there protocols/procedures in place for monitoring, maintenance and review of AI enabled systems?

III. Mitigating Bias

Are you able to identify the source/sources of bias at the stage of data collection?

Did you check for diversity in data collection, before it was used for training, to mitigate bias?

Did you analyse the data for historical biases?

Have you assessed the possibility of AI correlating protected attributes and bias arising as a result?

Do you have an overall strategy (technical and operational) to trace and address bias?

Do you have technical tools to identify potential sources of bias and introduce de-biasing techniques? Please see the Appendix for a list of technical tools that developers may consider.

Have you identified instances where human intervention would be preferable over automated decision-making?

Do you have internal and/or third-party audits to improve data collection processes?

IV. Fairness

Have you anticipated the possibility of the algorithm treating similar candidates differently on the basis of attributes such as race, caste, gender or disability status?

Are there sufficient checks to ensure that the machine does not base its outputs on protected attributes?

Are there protocols or sensitisation initiatives to reconcile historical biases and build in weights to equalise potential biases?

Have you conducted due diligence to trace potential fairness harms that may arise directly or indirectly?

Do you have technical tools to diagnose fairness harms and address them?

Does the algorithm provide results that are equally and similarly accurate across demographic groups?

Have you checked outcomes to understand if the AI is producing equal true and false outcomes? Is there any scope for disproportionate outcomes?

V. Security

Have you identified scenarios where safety and reliability could be compromised, both for the users and beyond?

Have you classified anticipated threats according to the level of risk and prepared contingency plans to mitigate this risk?

Have you defined what a safe and reliable network means (through standards and metrics), and is this definition consistent throughout the full lifecycle of the AI enabled system?

Have you created human oversight and control measures to manage the safety and reliability risks of the AI system, considering the degree of self-learning and autonomous features of the AI system?

Do you have procedures in place to ensure the explainability of the decision-making process during operation?

Have you developed a process to continuously measure and assess safety and reliability risks in accordance with the risk metrics and risk levels defined in advance for each specific use case?

Have you assessed the AI enabled system to determine whether it is also safe for, and can be reliably used by, those with special needs or disabilities or those at risk of exclusion?

Have you facilitated testability and auditability?

Have you accommodated testing procedures to also cover scenarios that are unlikely to occur but are nonetheless possible?

Is there a mechanism in place for designers, developers, users, stakeholders and third parties to flag/report vulnerabilities and other issues related to the safety and reliability of the AI enabled system? Is this system subject to third-party review?

Do you have a protocol to document results of risk assessment, risk management and risk control procedures?

VI. Privacy

Have you considered deploying privacy-specific technological measures in the interest of personal data protection? Please see the Appendix for an illustrative list of technological measures you could consider.

Are you able to trace the source for the data being used in the AI system? Is there adequate documentation of informed consent to collect and use personal data for the purpose of developing AI?

Is sensitive data collected? If so, have you adopted higher standards for protection of this kind of data?

Does the training data include data of children or other vulnerable groups? Do you maintain a higher standard of protection in these cases?

Is the amount of personal data in the training data limited to what is relevant and necessary for the purpose?

Have you considered all options for the use of personal information (e.g. anonymisation or synthetic data) and chosen the least invasive method?

Have you considered mechanisms/techniques to prevent re-identification from anonymised data?

Have you been informed of the legal requirements for processing personal information? What measures does your organisation adopt to ensure compliance?

Are there procedures for reviewing data retention and deleting data used by the AI system after it has served its purpose? Are there oversight/review mechanisms in place? Is there scope for external/third-party review?

Beyond the data principal's privacy, have you evaluated the potential for the data of an identified group being at risk?

Is there a mechanism in place to manage consent provided by data principals? Is this mechanism accessible to principals? Is there a method to periodically review consent?


Master Checklists (Section II: Data Protection)

Are you able to distinguish between the different categories of data: personal data, sensitive personal data, non-personal data, etc.?

Consider creating a referencer that sorts the kinds of data you as a startup usually deal with into categories.

Do you understand how to distinguish data that is personally identifiable from data that is not, based on the content of the data, the purpose of processing and the effect of processing?

Do you use technical measures to secure personal data, sensitive personal data and other data that may be critical - such as anonymisation, pseudonymisation and encryption?

As a start-up, have you documented:

1. What personal data do you hold?

2. Where did it come from?

3. Whom do you share it with?

4. What do you do with it?

Have you identified the lawful basis for processing data and documented this?

Does your startup have a protocol to review how you ask for data and record consent from data principals?

While notifying data principals, have you or your organisation ensured that the notice is clear, concise, easy to understand and provides the data principal with all relevant information as described under Clause 7 of the PDP Bill?

Do you provide developers with a clear understanding of the purpose for which data may be used?

Do you have a mechanism to ensure that this purpose is clearly represented to principals you collect data from?

If there is a change in how you use data, do you have a procedure to inform ‘data principals’ about this?

Do you have a quality check on datasets you collect, in order to remove excess data that you may not need?

Do you systematically remove data that you do not need anymore, even though you may have lawful consent to keep data for longer?

Do you train new recruits or conduct training exercises to inform employees of best practices or technical measures in data protection?

Do you have a mechanism to remind you that the data retention period has lapsed or is about to lapse?


Have you considered a mechanism to seek a renewal on data retention period in case you need data for longer than the data retention period?

Do you have a mechanism to remove data if the data principal has asked for the data to be removed/forgotten?

Have you considered a mechanism to filter data at the stage of receipt of such data and remove data that is incomplete, inaccurate or unnecessary?

Do you take proactive steps to account for and address the margin of error or inaccuracy in the data you collect?

Do you have a protocol for quality control if you are not the entity collecting data?

Do you have a protocol for quality control over public data and/or government data?

Does your startup have an internal data protection policy?

Does your startup monitor its compliance with data protection policies and regularly review the effectiveness of data handling and security controls?

Does your startup have a written contract with any processors you use?

Has your startup implemented technical and organisational measures to integrate data protection practices at each stage of the processing activity?

Does your startup conduct Data Protection Impact Assessments? Is there a protocol for this?

Have you nominated an administrative officer to implement data protection principles or a Data Protection Officer?

Have you provided privacy information to individuals?

Does your startup have a process to recognise and respond to individuals' requests to access their personal data?

Does your startup have processes to ensure that the personal data you hold remains accurate and up to date?

Does your startup have a protocol to securely dispose of personal data that is no longer required or where an individual has asked you to erase it?

Does your startup have procedures to respond to an individual’s request to restrict the processing of their personal data?

Does your startup have processes to allow individuals to move, copy or transfer their personal data from one IT environment to another in a safe and secure way, without hindrance to usability?


References

1. Solon Barocas and Andrew D. Selbst, 'Big Data's Disparate Impact' (104 California Law Review 671, 2016) https://papers.ssrn.com/sol3/Delivery.cfm/SSRN_ID2808263_code1328346.pdf?abstractid=2477899.&mirid=1

2. Josh Cowls and Luciano Floridi, ‘Prolegomena To A White Paper On An Ethical Framework For A Good AI Society’ (2018)http://dx.doi.org/10.2139/ssrn.3198732

3. (Article19.org, 2021) https://www.article19.org/wp-content/uploads/2019/04/Governance-with-teeth_A19_April_2019.pdf

4. David Leslie, 'Understanding Artificial Intelligence Ethics and Safety: A Guide for the Responsible Design and Implementation of AI Systems in the Public Sector' (Zenodo, 2021) https://doi.org/10.5281/zenodo.3240529

5. Luciano Floridi and others, 'Ethical Framework For A Good AI Society' (Eismd.eu, 2019) https://www.eismd.eu/wp-content/uploads/2019/02/Ethical-Framework-for-a-Good-AI-Society.pdf

6. Pranay K. Lohia and others, 'Bias Mitigation Post-Processing for Individual and Group Fairness' (Arxiv.org, 2021) https://arxiv.org/abs/1812.06135

7. Brian d'Alessandro, Cathy O'Neil and Tom LaGatta, 'Conscientious Classification: A Data Scientist's Guide to Discrimination-Aware Classification' (Arxiv.org) https://arxiv.org/pdf/1907.09013.pdf

8. Ivan Bruha and A. Fazel Famili, 'Postprocessing in Machine Learning and Data Mining' (Kdd.org, 2000) https://www.kdd.org/exploration_files/KDD2000PostWkshp.pdf

9. Dr. Matt Turek, 'Explainable Artificial Intelligence (XAI)' (Darpa.mil, 2019) https://www.darpa.mil/program/explainable-artificial-intelligence

10. Carlos Guestrin and Marco Tulio Ribeiro, 'Local Interpretable Model-Agnostic Explanations (LIME): An Introduction' (O'Reilly Media, 2021) https://www.oreilly.com/learning/introduction-to-local-interpretable-model-agnostic-explanations-lime

11. Felzmann, H., Fosch-Villaronga, E., Lutz, C. et al., 'Towards Transparency by Design for Artificial Intelligence', Sci Eng Ethics 26, 3333–3361 (2020) https://doi.org/10.1007/s11948-020-00276-4

12. Alejandro Barredo Arrieta and others, 'Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI' (Arxiv.org, 2019) https://arxiv.org/pdf/1910.10045.pdf

13. Heike Felzmann and others, 'Transparency You Can Trust: Transparency Requirements for Artificial Intelligence between Legal Norms and Contextual Concerns' (SAGE Journals, 2019) https://journals.sagepub.com/doi/full/10.1177/2053951719860542

14. 'Putting Data at the Service of Agriculture: A Case Study of CIAT | Digital Tools for Agriculture | U.S. Agency for International Development' (Usaid.gov, 2018) https://www.usaid.gov/digitalag/ciat-case-study

15. 'API Error Handling - Unique Identification Authority of India | Government of India' (Unique Identification Authority of India | Government of India, 2021) https://www.uidai.gov.in/916-developer-section/data-and-downloads-section/11351-api-error-handling.html

16. (Niti.gov.in, 2018) https://niti.gov.in/writereaddata/files/document_publication/NationalStrategy-for-AI-Discussion-Paper.pdf

17. 'Transparency and Explainability (OECD AI Principle) - OECD.AI' (Oecd.ai, 2021) https://oecd.ai/dashboards/ai-principles/P7

18. Finale Doshi-Velez, Mason Kortz and Ryan Budish, 'Accountability of AI under the Law: The Role of Explanation' (Arxiv.org, 2021) https://arxiv.org/pdf/1711.01134.pdf

19. Genie Barton and Nicol Turner-Lee, 'Algorithmic Bias Detection And Mitigation: Best Practices And Policies To Reduce Consumer Harms' (Brookings, 2021) https://www.brookings.edu/research/algorithmic-bias-detection-and-mitigation-best-practices-and-policies-to-reduce-consumer-harms/#footnote-7


20. Jeffrey Dastin, 'Amazon Scraps Secret AI Recruiting Tool That Showed Bias against Women' (U.S., 2018) https://www.reuters.com/article/us-amazon-com-jobs-automation-insight/amazon-scraps-secret-ai-recruiting-tool-that-showed-bias-against-women-idUSKCN1MK08G

21. Point No. 12, 'REPORT on a Comprehensive European Industrial Policy on Artificial Intelligence and Robotics' (Europarl.europa.eu, 2018) https://www.europarl.europa.eu/doceo/document/A-8-2019-0019_EN.html#title4

22. 'Reducing Bias from Models Built on the Adult Dataset Using Adversarial Debiasing' (Medium, 2020) https://towardsdatascience.com/reducing-bias-from-models-built-on-the-adult-dataset-using-adversarial-debiasing-330f2ef3a3b4

23. 'The Allegheny Family Screening Tool' (Alleghenycounty.us, 2017) https://www.alleghenycounty.us/Human-Services/News-Events/Accomplishments/Allegheny-Family-Screening-Tool.aspx

24. Adam Hadhazy, 'Biased Bots: Artificial-Intelligence Systems Echo Human Prejudices' (Princeton University, 2017) https://www.princeton.edu/news/2017/04/18/biased-bots-artificial-intelligence-systems-echo-human-prejudices

25. (2021) https://www.ftc.gov/systems/files/documents/public_events/313371/bigdata-slides-sweeneyzang-9_15_14.pdf

26. Arvind Narayanan, 'Tutorial: 21 Fairness Definitions and Their Politics' (2018) https://docs.google.com/document/d/1bnQKzFAzCTcBcNvW5tsPuSDje8WWWY-SSF4wQm6TLvQ/edit

27. Sarah Bird and others, 'Fairlearn: A Toolkit for Assessing and Improving Fairness in AI' (Microsoft.com, 2020) https://www.microsoft.com/en-us/research/uploads/prod/2020/05/Fairlearn_WhitePaper-2020-09-22.pdf

28. David De Cremer, 'What Does Building A Fair AI Really Entail?' (Harvard Business Review, 2020) https://hbr.org/2020/09/what-does-building-a-fair-ai-really-entail

29. Natasha Lomas, 'Accenture Wants to Beat Unfair AI with a Professional Toolkit' (Social.techcrunch.com, 2018) http://social.techcrunch.com/2018/06/09/accenture-wants-to-beat-unfair-ai-with-a-professional-toolkit/

30. (Query.prod.cms.rt.microsoft.com) https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4t6dA

31. Hemani Sheth, '52% of Indian Organisations Suffered a Successful Cybersecurity Attack in the Last 12 Months: Survey' (@businessline, 2021) https://www.thehindubusinessline.com/news/52-of-indian-organisations-suffered-a-successful-cybersecurity-attack-in-the-last-12-months-survey/article34195953.ece

32. 'BigBasket Faces Data Breach: Details of 2 Cr Users Put on Dark Web' (Inc42, November 2020) https://inc42.com/buzz/big-breach-at-bigbasket-details-of-2-cr-users-put-on-dark-web-claims-cyble/

33. Nikhil Subramaniam, 'Flipkart Gets Caught in BigBasket Data Leak Aftermath as Users Allege Attacks' (May 2021) https://inc42.com/buzz/flipkart-gets-caught-in-bigbasket-data-leak-aftermath-as-users-allege-attacks/

34. 'Guidelines for MNEs - Organisation for Economic Co-operation and Development' (Mneguidelines.oecd.org) http://mneguidelines.oecd.org/

35. David Wright, Rowena Rodrigues and David Barnard-Wills, 'AI and Cybersecurity: How Can Ethics Guide Artificial Intelligence to Make Our Societies Safer? - Trilateral Research' (Trilateral Research) https://www.trilateralresearch.com/ai-and-cybersecurity-how-can-ethics-guide-artificial-intelligence-to-make-our-societies-safer/

36. 'ISO 27003: Guidance For ISO 27001 Implementation' (ISMS.online) https://www.isms.online/iso-27003/

37. 'Principle On Robustness, Security And Safety (OECD AI Principle) - OECD.AI' (Oecd.ai, 2021) https://oecd.ai/dashboards/ai-principles/P8

38. 'Openai Charter' (OpenAI, 2018) https://openai.com/charter/

39. Karen Hao, 'The Messy, Secretive Reality behind OpenAI's Bid to Save the World' (MIT Technology Review, 2020) https://www.technologyreview.com/s/615181/ai-openai-moonshot-elon-musk-sam-altman-greg-brockman-messy-secretive-reality/


40. 'Justice K. S. Puttaswamy v. Union Of India' (Scobserver.in, 2012) https://www.scobserver.in/court-case/fundamental-right-to-privacy

41. Sheenu Sura, 'Understanding the Concept of Right to Privacy in Indian Perspective - Ignited Minds Journals' (Ignited.in, 2019) http://ignited.in/I/a/89517

42. Asia J. Biega and others, 'Operationalizing the Legal Principle of Data Minimization for Personalization' (Arxiv.org, 2020) https://arxiv.org/pdf/2005.13718.pdf

43. 'Predictive Model for Tax Return Checks' (Skatteetaten.no, 2016) https://www.skatteetaten.no/globalassets/om-skatteetaten/analyse-og-rapporter/analysenytt/analysenytt2016-1.pdf#page=12

44. 'National Strategy for Artificial Intelligence' (Niti.gov.in, 2018) https://niti.gov.in/sites/default/files/2019-01/NationalStrategy-for-AI-Discussion-Paper.pdf#page=85

45. 'Inclusive Growth, Sustainable Development And Well-Being (OECD AI Principle) - OECD.AI' (Oecd.ai) https://oecd.ai/dashboards/ai-principles/P5

46. 'Ethics Guidelines for Trustworthy AI | Shaping Europe’s Digital Future' (Ec.europa.eu) <https://ec.europa.eu/digital-single-market/en/news/ethics-guidelines-trustworthy-ai>

47. (Pdpc.gov.sg) https://www.pdpc.gov.sg/-/media/Files/PDPC/PDF-Files/Resource-for-Organisation/AI/SGModelAIGovFramework2.pdf

48. 'Ethics, Transparency And Accountability Framework For Automated Decision-Making' (gov.uk, 2021) https://www.gov.uk/government/publications/ethics-transparency-and-accountability-framework-for-automated-decision-making/ethics-transparency-and-accountability-framework-for-automated-decision-making

49. 'Data Ethics Framework' (gov.uk, 2020) https://www.gov.uk/government/publications/data-ethics-framework/data-ethics-framework-2020

50. 'Explainable AI: The Basics' (Royalsociety.org, 2019) https://royalsociety.org/-/media/policy/projects/explainable-ai/AI-and-interpretability-policy-briefing.pdf

51. 'AI Principles - Future of Life Institute' (Future of Life Institute, 2017) https://futureoflife.org/ai-principles/

52. 'Montreal Declaration for a Responsible Development of Artificial Intelligence - La Recherche - Université de Montréal' (Recherche.umontreal.ca, 2017) https://recherche.umontreal.ca/english/strategic-initiatives/montreal-declaration-for-a-responsible-ai/

53. 'IEEE SA - The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems' (Standards.ieee.org, 2021) https://standards.ieee.org/industry-connections/ec/autonomous-systems.html

54. 'Tenets - The Partnership on AI' (The Partnership on AI, 2021) https://www.partnershiponai.org/tenets/

55. 'AI in the UK: Ready, Willing and Able?' (Publications.parliament.uk, 2019) https://publications.parliament.uk/pa/ld201719/ldselect/ldai/100/100.pdf#page=417

56. 'Research And Innovation' (European Commission - European Commission, 2018) https://ec.europa.eu/research/ege/pdf/ege_ai_statement_2018.pdf

57. 'The Toronto Declaration: Protecting the Right to Equality and Non-Discrimination in Machine Learning Systems' (Accessnow.org, 2018) https://www.accessnow.org/cms/assets/uploads/2018/08/The-Toronto-Declaration_ENG_08-2018.pdf

58. Nicholas Diakopoulos and others, 'Principles for Accountable Algorithms and a Social Impact Statement for Algorithms: FAT ML' (Fatml.org) https://www.fatml.org/resources/principles-for-accountable-algorithms

59. 'Text - H.R.2231 - 116th Congress (2019-2020): Algorithmic Accountability Act of 2019' (Congress.gov, 2019) https://www.congress.gov/bill/116th-congress/house-bill/2231/text

60. Dillon Reisman and others, 'Algorithmic Impact Assessments: A Practical Framework for Public Agency Accountability' (Ainowinstitute.org, 2018) https://ainowinstitute.org/aiareport2018.pdf

61. 'The Struggles Businesses Face in Accessing the Information They Need' (Info.purestorage.com) http://info.purestorage.com/rs/225-USM-292/images/Big%20Data%27s%20Big%20Failure_UK%281%29.pdf?aliId=64921319

62. 'Feature Selection - Wikipedia' (En.wikipedia.org) https://en.wikipedia.org/wiki/Feature_selection

63. Saikat Guha, Bin Cheng and Paul Francis, 'Privad: Practical Privacy In Online Advertising' (Usenix.org) https://www.usenix.org/legacy/event/nsdi11/tech/full_papers/Guha.pdf

64. Leyla Kazemi and Cyrus Shahabi, 'TAPAS: Trustworthy Privacy-Aware Participatory Sensing' (Infolab.usc.edu, 2012) https://infolab.usc.edu/DocsDemos/kazemi-TAPAS-KAIS.pdf

65. Rachel Levy Sarfin, '5 Characteristics of Data Quality - See Why Each Matters to Your Business' (Precisely, 2021) https://www.precisely.com/blog/data-quality/5-characteristics-of-data-quality

66. 'REPORT on a Comprehensive European Industrial Policy on Artificial Intelligence and Robotics' (Europarl.europa.eu, 2019) https://www.europarl.europa.eu/doceo/document/A-8-2019-0019_EN.html#title4

67. Thomas C. Redman, 'Bad Data Costs the U.S. $3 Trillion Per Year' (Harvard Business Review, 2016) https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year

68. Angus Loten, 'AI Efforts at Large Companies May Be Hindered by Poor Quality Data' (WSJ, 2019) https://www.wsj.com/articles/ai-efforts-at-large-companies-may-be-hindered-by-poor-quality-data-11551741634

69. Sylvia Kingsmill and Ann Cavoukian, 'Privacy by Design' (www2.deloitte.com) https://www2.deloitte.com/content/dam/Deloitte/ca/Documents/risk/ca-en-ers-privacy-by-design-brochure.PDF

70. Ann Cavoukian, 'Privacy by Design: The 7 Foundational Principles' (Ipc.on.ca, 2011) https://www.ipc.on.ca/wp-content/uploads/resources/7foundationalprinciples.pdf

71. 'Software Development with Data Protection by Design and by Default' (Datatilsynet, 2017) https://www.datatilsynet.no/en/about-privacy/virksomhetenes-plikter/innebygd-personvern/data-protection-by-design-and-by-default/

72. Ann Cavoukian, 'Operationalizing Privacy by Design: A Guide To Implementing Strong Privacy Practices' (Collections.ola.org, 2012) https://collections.ola.org/mon/26012/320221.pdf

73. Ian J. Goodfellow and others, 'Generative Adversarial Nets' (Papers.nips.cc) https://papers.nips.cc/paper/5423-generative-adversarial-nets.pdf

74. 'Federated Learning: Collaborative Machine Learning without Centralized Training Data' (Google AI Blog, 2017) https://research.googleblog.com/2017/04/federated-learning-collaborative.html

75. Geoffrey Hinton, Sara Sabour and Nicholas Frosst, 'Matrix Capsules with EM Routing' (Openreview.net, 2018) https://openreview.net/pdf?id=HJWLfGWRb

76. Zhanglong Ji, Zachary C. Lipton and Charles Elkan, 'Differential Privacy and Machine Learning: A Survey and Review' (Arxiv.org, 2014) https://arxiv.org/abs/1412.7584

77. 'Homomorphic Encryption Standardization – An Open Industry / Government / Academic Consortium to Advance Secure Computation' (Homomorphic Encryption.org) http://homomorphicencryption.org/

78. 'Machine Learning Research Group | University Of Texas' (cs.utexas.edu) http://www.cs.utexas.edu/~ml/publications/area/125/transfer_learning

79. 'RAIRD' (Raird.no, 2019) http://raird.no/

Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ) GmbH

A2/18 Safdarjung Enclave
New Delhi - 110029, India

T: +91-11-49495353
E: [email protected]
W: www.giz.de/India