Big Data has become synonymous with Data engineering. But the line between Data Engineering and Data scientists is blurring day by day. At this point in time, I think that Big Data must be in the repertoire of all data scientists. Reason: Too much data is getting generated day by day And that brings us to Spark which is one of the most used tools when it comes to working with Big Data.
Every few years, some academic and professional field gets a lot of cachet in the popular imagination. Right now, that field is data science. As a result, a lot of people are looking to get into it. Add to that the news outlets calling data science sexy and various academic institutes promising to make a data scientist out of you in just a few months, and you’ve got the perfect recipe for disaster.
Recently, I was reading Rolf Dobell’s The Art of Thinking Clearly, which made me think about cognitive biases in a way I never had before. I realized how deeply seated some cognitive biases are. In fact, we often don’t even consciously realize when our thinking is being affected by one. For data scientists, these biases can really change the way we work with data and make our day-to-day decisions, and generally not for the better.
I have found myself creating a Deep Learning Machine time and time again whenever I start a new project. You start with installing Anaconda and end up creating different environments for Pytorch and Tensorflow, so they don’t interfere. And in the middle of it, you inevitably end up messing up and starting from scratch. And this often happens multiple times. It is not just a massive waste of time; it is also mighty(trying to avoid profanity here) irritating.
Python provides us with many styles of coding. And with time, Python has regularly come up with new coding standards and tools that adhere even more to the coding standards in the Zen of Python. Beautiful is better than ugly. In this series of posts named Python Shorts, I will explain some simple but very useful constructs provided by Python, some essential tips, and some use cases I come up with regularly in my Data Science work.
Have you ever thought about how toxic comments get flagged automatically on platforms like Quora or Reddit? Or how mail gets marked as spam? Or what decides which online ads are shown to you? All of the above are examples of how text classification is used in different areas. Text classification is a common task in natural language processing (NLP) which transforms a sequence of a text of indefinite length into a single category.
It seems that the way that I consume information has changed a lot. I have become quite a news junkie recently. One thing, in particular, is that I have been reading quite a lot of international news to determine the stages of Covid-19 in my country. To do this, I generally visit a lot of news media sites in various countries to read up on the news. This gave me an idea.