

{"id":256539,"date":"2022-10-21T07:34:18","date_gmt":"2022-10-21T02:04:18","guid":{"rendered":"https:\/\/www.jigsawacademy.com\/?p=256539"},"modified":"2022-10-21T07:35:13","modified_gmt":"2022-10-21T02:05:13","slug":"biases-in-data-collection-types-and-how-to-avoid-the-same","status":"publish","type":"post","link":"https:\/\/www.jigsawacademy.com\/blogs\/business-analytics\/biases-in-data-collection-types-and-how-to-avoid-the-same\/","title":{"rendered":"Biases in Data Collection: Types and How to Avoid the Same"},"content":{"rendered":"<h3 aria-level=\"1\"><b><span data-contrast=\"auto\">Introduction<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:400,&quot;335559739&quot;:120,&quot;335559740&quot;:276}\">\u00a0<\/span><\/h3>\n<p><span data-contrast=\"auto\">Bias in data is an inaccuracy that occurs when specific components of a dataset are overweighted or overrepresented. The key to overcoming bias is recognizing its warning signs. As humans, we hold various implicit and explicit biases. Biases are systematic errors in reasoning shaped by one&#8217;s culture and experiences. They affect our views and lead us to make bad choices. One bias that many people share is automation bias: the tendency to trust the output of automated systems too readily. <\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In reality, computers, data, and algorithms are not entirely objective. Data analysis can indeed aid decision-making, yet bias can still creep in. It is we humans who build technologies and algorithms. As a result, human biases are frequently incorporated into them. When using a GPS while driving, we must still pay attention to other information streams (such as our eyes and ears). 
Similarly, we must consider other data sources when assessing the findings or recommendations of data analysis.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">If we wish to use data and algorithms responsibly, we need to understand the biases that emerge at each stage of data analysis. Let&#8217;s examine in more detail some of the biases that affect data analysis and data-driven decision-making.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<h2 aria-level=\"2\"><b><span data-contrast=\"auto\">What Does Bias Mean in Data Analytics?<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:120,&quot;335559740&quot;:276}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">We must first gather data before we can evaluate it or apply Machine Learning techniques. Data bias can occur when you train a Machine Learning algorithm on a dataset that does not accurately reflect its intended use. For instance, if you&#8217;re marketing premium spirits and train your AI only on data that reflects beer drinkers&#8217; behavior, the results will be badly distorted. The data used to train ML models is a major contributor to these biases. Practically all large datasets produced by ML\/AI-powered systems are known to be <\/span><b><span data-contrast=\"auto\">biased<\/span><\/b><span data-contrast=\"auto\">. However, many ML modelers, even when aware of these biases, are unsure how to address them.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Sampling and estimation are both prone to bias. 
Our data wouldn&#8217;t be <\/span><b><span data-contrast=\"auto\">biased <\/span><\/b><span data-contrast=\"auto\">if we had complete knowledge of every entity in it (such as customers, insurance claims, and software sessions) and if we could store data on every imaginable entity. Humans also make poor intuitive statisticians, and their estimates are frequently off. These issues are so pervasive that they are found even in meticulously planned and controlled statistical tests.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<h2 aria-level=\"2\"><b><span data-contrast=\"auto\">How Does Bias Show Up in Analytics?<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:360,&quot;335559739&quot;:120,&quot;335559740&quot;:276}\">\u00a0<\/span><\/h2>\n<p><span data-contrast=\"auto\">The source material is not the only way bias can enter data. It can also be introduced through data collection and analysis techniques. A variety of biases can harm the data, including the following:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Propagating the Current State\u00a0<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In data analysis, propagating the current state is a typical form of bias. Amazon&#8217;s recruiting algorithms, for example, favored men because men were more representative of the company&#8217;s existing workforce. 
Even though the algorithms did not explicitly know or consider the applicants&#8217; gender, they were influenced by other factors indirectly related to gender, such as sports, social activities, and the adjectives used to describe accomplishments.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">In essence, the AI detected these minute variations and searched for candidates who met its internal definition of a successful candidate. Providing context and relationships to your AI systems is a good countermeasure.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Trained on the Wrong Thing<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Algorithmic bias occurs when an algorithm makes systematic, repeatable errors that result in unfair outcomes, such as favoring one group over another. Selection bias can be the starting point for algorithmic bias, which is subsequently strengthened and maintained by other bias types.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">The algorithms used in facial recognition software are typically proprietary. As a result, we cannot inspect an algorithm&#8217;s design or functioning, nor do we know what data were used to train and test it. 
<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Transparency is essential for avoiding algorithmic bias, particularly when it comes to the data used to train and test an algorithm. In response to the poor performance of facial recognition on darker-skinned women, a new benchmarking dataset was created that is more representative of the human population. This matters only if the businesses that develop and market facial recognition software actually use the new dataset.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Under-representing Populations<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Selection bias occurs when the data you examine is not chosen at random. Selecting or omitting data for a specific reason, such as when dealing with outliers, can give an inaccurate picture of the outcomes your company is achieving. Samples that under-represent parts of the population, and so do not fairly represent the full target group, result in selection bias. A large community, for instance, would not be accurately represented by data gathered from a single area.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Selection bias can occur for a variety of reasons, some deliberate and others unintentional, such as voluntary participation, participation restrictions, or insufficient sample size. 
There are three types of selection bias:<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<ul>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"1\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"1\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Sampling bias: <\/span><\/b><span data-contrast=\"auto\">This occurs when proper randomization is not used during data collection.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"1\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"2\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Coverage bias: <\/span><\/b><span data-contrast=\"auto\">This happens<\/span><span data-contrast=\"auto\"> when data are not chosen in a representative way. 
For instance, if you survey only consumers who bought your product, your dataset does not represent the people who did not buy it.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/li>\n<li data-leveltext=\"\uf0b7\" data-font=\"Symbol\" data-listid=\"1\" data-list-defn-props=\"{&quot;335552541&quot;:1,&quot;335559684&quot;:-2,&quot;335559685&quot;:720,&quot;335559991&quot;:360,&quot;469769226&quot;:&quot;Symbol&quot;,&quot;469769242&quot;:[8226],&quot;469777803&quot;:&quot;left&quot;,&quot;469777804&quot;:&quot;\uf0b7&quot;,&quot;469777815&quot;:&quot;hybridMultilevel&quot;}\" aria-setsize=\"-1\" data-aria-posinset=\"3\" data-aria-level=\"1\"><b><span data-contrast=\"auto\">Participation bias: <\/span><\/b><span data-contrast=\"auto\">This arises when participation gaps in the data collection process cause the data to be unrepresentative.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/li>\n<\/ul>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Faulty Interpretation<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Outliers can severely distort data. For instance, income data for the United States includes a few exceptionally affluent people whose incomes can skew any average figure. 
Because of this, a median figure frequently represents the wider population more accurately.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Cognitive Biases<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Cognitive biases are systematic errors in reasoning that frequently result from cultural and personal experiences and distort perception when we make judgments. Even though data may appear to be objective, humans gather and process it, so it can still be <\/span><b><span data-contrast=\"auto\">biased<\/span><\/b><span data-contrast=\"auto\">.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Statistical bias, such as sampling or selection bias, can result from cognitive bias. Instead of using properly designed datasets, analysts frequently work with data that is already available or has been pieced together.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Sample bias results from both the initial data collection and the analyst&#8217;s decisions about which data to include or exclude. 
Selection bias arises when the data collected does not reflect the population of instances the model will actually see in the future.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">It helps to switch from static data sources to event-based data, which lets the data update over time and reflect the world we live in more accurately. Examples include dynamic interfaces and Machine Learning algorithms that can be tracked and evaluated over time.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Analytics Bias<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Analytics bias is a systematic tendency that makes results deviate from reality. Numerous aspects of the data analysis process, such as the data source, the estimator selected, and the methods used to evaluate the data, can be <\/span><b><span data-contrast=\"auto\">biased<\/span><\/b><span data-contrast=\"auto\">. For instance, this bias may significantly affect the findings of a study of people&#8217;s purchasing patterns. If the sample size is insufficient, the results might not reflect the purchasing patterns of the entire population. To put it another way, there might be differences between poll results and real results. 
Identifying the source of analytics bias therefore helps determine whether the observed results are accurate.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Confirmation Bias<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Confirmation bias is the tendency to let a pre-existing belief influence how you interpret and prioritize information. Imagine that you firmly believe most people prefer pineapple pizza to Margherita pizza and, as a result, give more weight to data that supports that belief. That is confirmation bias.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">This underlying bias is probably something you experience every single day. Everybody wants to be right, so our minds are always looking for data to back up our preconceived notions. Even when we make an effort to be receptive to other viewpoints, our minds tend to return to the security and familiarity of our initial views. This can occur unconsciously through biases in finding, understanding, or recalling information.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">When an issue is highly important or self-relevant, people are more inclined to interpret evidence in ways that support their own opinions. Confirmation bias is significant because it can lead people to believe what aligns with their belief systems, even if it is factually incorrect. 
People may have too much confidence in their beliefs because they have gathered evidence to back them up, while a lot of contradicting evidence has often been overlooked or disregarded. If that evidence were taken into account, they might be less confident in their beliefs. These factors may lead people to make risky choices and to ignore warning signs and other significant information.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p aria-level=\"3\"><b><span data-contrast=\"auto\">Outlier Bias\u00a0<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:true,&quot;201341983&quot;:0,&quot;335559738&quot;:320,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<p><span data-contrast=\"auto\">Averages are an excellent way to cover up unpalatable truths. Although it is easier to represent some data as an average, this basic technique obscures the impact of outliers and anomalies and distorts our observations. An outlier is a data point that is extremely high or low, for instance a 110-year-old consumer or a customer who has \u20ac10 million in savings. You can identify outliers by carefully examining the data, especially the distribution of values. Outliers are values that are significantly higher or lower than the range of most other values. Because of outliers, it can be risky to base a decision on the &#8220;average&#8221;. 
If someone gives you average values, you should check whether outliers were removed or otherwise accounted for.<\/span><span data-ccp-props=\"{&quot;201341983&quot;:0,&quot;335559739&quot;:160,&quot;335559740&quot;:276}\">\u00a0<\/span><\/p>\n<h3 aria-level=\"3\"><b><span data-contrast=\"none\">Conclusion:<\/span><\/b><span data-ccp-props=\"{&quot;134245418&quot;:false,&quot;201341983&quot;:0,&quot;335559738&quot;:280,&quot;335559739&quot;:160,&quot;335559740&quot;:288}\">\u00a0<\/span><\/h3>\n<p aria-level=\"3\"><span data-contrast=\"none\">Because data-driven technologies are so prevalent today, biased data can lead to a variety of negative outcomes, including complicated social repercussions. Prejudices may be subtly reinforced if they are continually fed into our cultural consciousness through data-driven technology, creating a cycle that we can only stop with deliberate effort. Humans have the capacity for cultural evolution, at least within groups, which gives them an edge over machine learning in providing checks and balances against prejudice. If you&#8217;re interested in knowing more about <\/span><a href=\"https:\/\/www.jigsawacademy.com\/blogs\/ipba\/what-is-data-collection-method-types-of-tools-and-techniques\/\"><span data-contrast=\"auto\">data collection methods<\/span><\/a><span data-contrast=\"none\">, UNext<\/span><span data-contrast=\"auto\"> Jigsaw&#8217;s<\/span><a href=\"https:\/\/www.jigsawacademy.com\/integrated-program-in-business-analytics\/\"><span data-contrast=\"none\"> Integrated Program In Business Analytics<\/span><\/a><span data-contrast=\"none\"> is highly <\/span><b><span data-contrast=\"none\">recommended<\/span><\/b><span data-contrast=\"none\">.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction\u00a0 Bias in data is an inaccuracy that occurs when specific components of a dataset are overweighted or overrepresented. The key to overcoming bias is recognizing its warning signs. 
As humans, we hold various implicit and explicit biases. Biases are systematic errors in reasoning shaped by one&#8217;s culture and experiences. They affect our [&hellip;]<\/p>\n","protected":false},"author":2640,"featured_media":256548,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[1496,1495],"tags":[],"form":[10307],"acf":[],"_links":{"self":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/256539"}],"collection":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/users\/2640"}],"replies":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/comments?post=256539"}],"version-history":[{"count":1,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/256539\/revisions"}],"predecessor-version":[{"id":256549,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/posts\/256539\/revisions\/256549"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media\/256548"}],"wp:attachment":[{"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/media?parent=256539"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/categories?post=256539"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/tags?post=256539"},{"taxonomy":"form","embeddable":true,"href":"https:\/\/www.jigsawacademy.com\/wp-json\/wp\/v2\/form?post=256539"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}