There’s been plenty of media attention lately about the news that the US government has reversed course on internet privacy and now allows internet service providers to use and sell your browsing history for commercial purposes. This has sparked broader interest in using virtual private networks (VPNs) as a way for internet users to protect their privacy.
If you are one of the many people who, like me, are coming to Python for data analysis after having spent a lot of time working with Microsoft Excel, you will at some point find yourself saying, “How do I do a VLOOKUP in Python?” (Or, if you’re really like me, you’ll throw in a few expletives.)
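For readers who want the short answer up front, here is a minimal sketch of one common pandas equivalent: a left merge on a shared key column. The table and column names (orders, products, product_id, product_name) are hypothetical, made up for illustration. This is not necessarily the approach the full post takes.

```python
import pandas as pd

# Hypothetical "main" table: each order needs a product name looked up.
orders = pd.DataFrame({
    "order_id": [101, 102, 103, 104],
    "product_id": ["A", "C", "B", "Z"],  # "Z" has no match in the lookup table
})

# Hypothetical lookup table, playing the role of VLOOKUP's table_array.
products = pd.DataFrame({
    "product_id": ["A", "B", "C"],
    "product_name": ["Widget", "Gadget", "Doohickey"],
})

# how="left" keeps every row of `orders`, like filling a VLOOKUP down a column.
# Rows with no match get NaN, roughly where Excel would show #N/A.
result = orders.merge(products, on="product_id", how="left")
print(result)
```

One difference from Excel: VLOOKUP returns only the first match, while a merge returns a row for every match. If the lookup table might contain duplicate keys, calling `products.drop_duplicates("product_id")` before merging keeps the one-result-per-row behavior. For a single column, `orders["product_id"].map(products.set_index("product_id")["product_name"])` is another option.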
The pandas library for Python is extremely useful for formatting data, conducting exploratory data analysis, and preparing data for use in modeling and machine learning.
It’s probably safe to say that anyone who has ever used git has struggled with it in one way or another. My most vexing problem has been the (apparently common) issue where git cannot create the index.lock file. In my case, this has happened while trying to update my fork by pulling from the upstream master repo.
Sometimes when building a model, it’s wise to stratify the y (target) variable when you split your training and testing data from the total sample (the train/test split). Why would we do this? If your y data is unevenly distributed (for example, one class is much rarer than the others), a purely random split can produce random samples of your sample that aren’t representative of that sample. Meta!
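To make the idea concrete, here is a minimal standard-library sketch of stratified splitting for a categorical target. It groups rows by label and takes the same fraction of each group for the test set. The function name `stratified_split` and its parameters are hypothetical. In practice, scikit-learn's `train_test_split` does this for you when you pass `stratify=y`.

```python
import random
from collections import Counter, defaultdict


def stratified_split(X, y, test_size=0.25, seed=0):
    """Hypothetical helper: split X and y so each class of y keeps roughly
    its overall proportion in both the training and testing sets."""
    rng = random.Random(seed)

    # Group row indices by their target label.
    by_label = defaultdict(list)
    for i, label in enumerate(y):
        by_label[label].append(i)

    train_idx, test_idx = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        # Send the same fraction of each class to the test set.
        n_test = round(len(idxs) * test_size)
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Imbalanced target: 90 rows of class 0 and only 10 rows of class 1.
X = list(range(100))
y = [0] * 90 + [1] * 10
X_train, X_test, y_train, y_test = stratified_split(X, y, test_size=0.2)
print(Counter(y_train), Counter(y_test))
```

Both sets end up with the same 90/10 class ratio as the full sample. A purely random 20-row test set could easily contain zero or one rows of class 1. For a continuous target, one common approach is to bin y first (for example with `pandas.qcut`) and stratify on the bins.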