This report uses a cancer rate dataset that encompasses the entire United States. The goal is to better understand trends in U.S. demographics and cancer rates.
The first thesis of this report is that income levels are negatively correlated with lung cancer rates in the U.S.: as income increases, lung cancer rates should decrease. This holds at the region level, and we will see that the asian_alone demographic behaves differently from the other demographics within this relationship. The graph below supports the first claim, that income has a negative relationship with lung cancer rates in the U.S.:
The blue “best-fit” line above shows the downward trend: as income increases, lung cancer rates across the whole dataset generally decrease. The line makes the negative (inverse) relationship between the two variables easier to see. To further strengthen the argument, the graph below shows the relationship between lung cancer rates and income levels in each region of the U.S.
The graphs above show that the inverse relationship persists when the data are broken down into the nine regions.
Next we look more closely at how lung cancer rates differ across demographics, using the regression below:
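As a rough illustration of what the best-fit lines summarize, the sketch below computes a least-squares slope and a Pearson correlation, first for all rows and then per region. It is a Python sketch rather than the report's own workflow. The rows, the region labels, and the helper name `slope_and_r` are made up for illustration and are not taken from the report's dataset.

```python
from collections import defaultdict
from math import sqrt


def slope_and_r(points):
    """Return (least-squares slope, Pearson r) for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    return sxy / sxx, sxy / sqrt(sxx * syy)


# Made-up rows for illustration only, NOT taken from the report's dataset:
# (hypothetical region label, income in dollars, lung cancer rate per 100,000)
toy_rows = [
    ("Region A", 35_000, 92.0),
    ("Region A", 50_000, 81.0),
    ("Region A", 70_000, 66.0),
    ("Region B", 40_000, 78.0),
    ("Region B", 55_000, 74.0),
    ("Region B", 90_000, 58.0),
]

# Overall relationship across every row.
overall_slope, overall_r = slope_and_r([(inc, rate) for _, inc, rate in toy_rows])

# Group rows by region, then fit one line per region.
by_region = defaultdict(list)
for region, income, rate in toy_rows:
    by_region[region].append((income, rate))
region_slopes = {region: slope_and_r(pts)[0] for region, pts in by_region.items()}

print(f"overall slope = {overall_slope:.6f}, r = {overall_r:.3f}")
for region, slope in region_slopes.items():
    print(f"{region}: slope = {slope:.6f}")
```

A negative slope in every group is the numerical counterpart of the downward best-fit line appearing in each regional panel.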
##
## Call:
## lm_basic(formula = lung ~ 1 + asian_alone + black_alone + white_alone,
## data = continental_us)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -52.366  -9.702  -0.457   8.607 128.735
##
## Coefficients:
##             Estimate   2.5 %  97.5 %
## (Intercept)  10.4756 -1.4490  22.400
## asian_alone  -0.8536 -1.1831  -0.524
## black_alone   0.7805  0.6484   0.913
## white_alone   0.6410  0.5160   0.766
##
## Residual standard error: 16.01 on 1948 degrees of freedom
## Multiple R-squared: 0.136, Adjusted R-squared: 0.1346
## F-statistic: 102.2 on 3 and 1948 DF, p-value: < 2.2e-16
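One way to read the 2.5 % and 97.5 % columns is to check whether each interval excludes zero. The Python sketch below applies that rule to the values copied from the table above. The helper names `excludes_zero` and `describe` are hypothetical, and the rule is only a convenient reading of a 95% interval, not a replacement for the model summary.

```python
# Estimates and 95% interval bounds copied from the lung cancer table above:
# name -> (estimate, lower bound, upper bound)
lung_coefficients = {
    "(Intercept)": (10.4756, -1.4490, 22.400),
    "asian_alone": (-0.8536, -1.1831, -0.524),
    "black_alone": (0.7805, 0.6484, 0.913),
    "white_alone": (0.6410, 0.5160, 0.766),
}


def excludes_zero(lower, upper):
    """True when the whole interval sits on one side of zero."""
    return lower > 0 or upper < 0


def describe(estimate, lower, upper):
    """Label a coefficient by the sign it can be distinguished as having."""
    if not excludes_zero(lower, upper):
        return "not distinguishable from zero"
    return "positive" if estimate > 0 else "negative"


lung_summary = {
    name: describe(*values) for name, values in lung_coefficients.items()
}
for name, label in lung_summary.items():
    print(f"{name}: {label}")
```

Under this reading, asian_alone is negative, black_alone and white_alone are positive, and the intercept's interval spans zero.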
In the regression table above, the asian_alone coefficient is negative and its 95% interval (-1.18 to -0.52) lies entirely below zero, so the asian_alone demographic is associated with a lower average lung cancer rate than the baseline. The black_alone and white_alone coefficients are positive and statistically significant (their 95% intervals lie entirely above zero), meaning that they are associated with lung cancer rates higher than the baseline. We can now look more closely at how the Asian demographic relates to income levels:
Focusing on one demographic shows where it clusters in terms of lung cancer rate and income level. The Asian demographic, which was statistically lower than the baseline in the previous regression model, clusters mainly at higher income levels (between $50,000 and $100,000), with higher population densities toward $100,000. These high income levels are in turn associated with low cancer rates per 100,000 men over the age of 18. Further research is needed to determine whether high income is the cause of the lower lung cancer rates or whether Asians have lower lung cancer rates than the other demographics regardless of income. Either way, we must be clear that a high income level is not a result of being Asian.
After looking at both the inverse relationship between lung cancer rates and income levels and how demographics play into the dataset, it is important to address an anomaly found when graphing the lung cancer rates across America.
The map above shows an extremely high rate of lung cancer per 100,000 men in one county in Florida: Union County, which has the highest rate of lung cancer in America. Further research reveals that Union County is home to the only cancer treatment center for inmates in the state of Florida. According to the article “Study: Union County has nation’s highest death rate from lung cancer” by Deanna Bettineschi, researchers covering this topic believe the rate is so high because inmates from around the state are sent to this county’s prison for treatment, and their lung cancer diagnoses are recorded there. The article acknowledges that no documentation proves inmates are being sent to this prison; however, current knowledge and research suggest that this is the case.
Overall, we note that lung cancer rates and income levels are negatively correlated. One possible reason is that people at lower income levels may be more likely to smoke cigarettes than those at higher income levels. To support this, we would need a dataset on cigarette smokers with similar variables.
The second hypothesis this report explores is that breast cancer rates and income levels are positively correlated, unlike lung cancer rates and income levels.
Again, when we look at each region separately, the trend is positive in all nine regions.
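To get a sense of how extreme an outlying county can be relative to the rest of the data, the Python sketch below expresses the largest and smallest residuals from the lung cancer regression output in units of the residual standard error. The three numbers are copied from that output. The output does not say which county produces the largest residual, so linking it to Union County would be an assumption, not something this sketch shows.

```python
# Values copied from the lung cancer regression output earlier in the report.
residual_standard_error = 16.01
max_residual = 128.735
min_residual = -52.366

# Express each extreme residual as a multiple of the residual standard error.
max_in_se_units = max_residual / residual_standard_error
min_in_se_units = min_residual / residual_standard_error

print(f"largest residual:  {max_in_se_units:.1f} residual standard errors")
print(f"smallest residual: {min_in_se_units:.1f} residual standard errors")
```

The largest residual sits roughly eight residual standard errors above the fitted value, far beyond the negative extreme, which is consistent with at least one county behaving very differently from the rest.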
##
## Call:
## lm_basic(formula = breast ~ 1 + asian_alone + black_alone + white_alone,
## data = continental_us)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -58.090  -9.824   0.266  10.296 137.611
##
## Coefficients:
##             Estimate   2.5 %  97.5 %
## (Intercept)  75.4795 63.1234  87.836
## asian_alone   2.0126  1.6712   2.354
## black_alone   0.5448  0.4078   0.682
## white_alone   0.4078  0.2782   0.537
##
## Residual standard error: 16.59 on 1948 degrees of freedom
## Multiple R-squared: 0.07946, Adjusted R-squared: 0.07804
## F-statistic: 56.05 on 3 and 1948 DF, p-value: < 2.2e-16
When analyzing how breast cancer rates vary with demographics, we observe that the asian_alone, black_alone, and white_alone coefficients are all positive and statistically significant relative to the baseline (each 95% interval lies entirely above zero). This differs from the same regression run on lung cancer rates: the asian_alone coefficient is no longer negative, and at 2.01 it is now the largest of the three.
Now we will look at breast cancer rates plotted on a map of the continental U.S., as we did for lung cancer rates in the previous section.
In the map above we again observe the Union County anomaly. More importantly, the lighter color of the data points suggests higher rates of breast cancer than lung cancer as a whole when this map is compared with the one in the Lung Cancer Thesis Section.
The analysis supports a positive relationship between breast cancer rates and income, and we can now theorize about why. People with higher incomes may have more money and time to spend on checkups and treatments, which suggests that people at lower income levels may succumb to the disease without being diagnosed. This assumption needs further research; a larger dataset collected through patient response surveys may provide answers.
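A quick way to compare the three demographic coefficients in the breast cancer table is to check whether their 95% intervals overlap. The Python sketch below does this with the bounds copied from the table. The helper name `intervals_overlap` is hypothetical, and interval overlap is only a rough heuristic: a formal comparison would test the difference between coefficients directly and account for their covariance.

```python
from itertools import combinations

# 95% interval bounds copied from the breast cancer table above:
# name -> (lower bound, upper bound)
breast_intervals = {
    "asian_alone": (1.6712, 2.354),
    "black_alone": (0.4078, 0.682),
    "white_alone": (0.2782, 0.537),
}


def intervals_overlap(a, b):
    """True when closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Check every pair of demographic coefficients.
overlap = {
    (first, second): intervals_overlap(
        breast_intervals[first], breast_intervals[second]
    )
    for first, second in combinations(breast_intervals, 2)
}
for (first, second), result in overlap.items():
    print(f"{first} vs {second}: overlap = {result}")
```

The asian_alone interval lies entirely above the other two, while the black_alone and white_alone intervals overlap each other.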