
Introduction and Considerations
I believe that before you dive deep into Data Science concepts, it is good to work on your foundation. If you come from a math or programming related field, treat this as a review. Otherwise, it is worth studying these topics because they will make a difference when you search for a job or develop your own projects.
Think of it as building a race car. You can buy the engine somewhere else, but if you want to race, you need to know how the engine works and how to modify it. Otherwise you will just perform like everyone else. The same applies to machine learning. By learning the math concepts that machine learning is built upon, you will know how to choose and adjust models to achieve better performance.
If you decide to “look under the hood”, I also wrote a post about how to study and build your foundation in math and programming skills. Check it out:
1. Analytical Skills
After you have built your foundation in math and programming, you can start the next step, which is to do a proper data analysis. You will find that before the real fun begins, there are a lot of steps involved in building and deploying a model, so it is very important to understand the data you are working with.
I started my career as a Business Intelligence (BI) analyst, and worked my way up to manager in the same field of expertise. As a BI expert the skills you build along the way are:
- Data Analysis
- ETL (Extract, Transform & Load) – data collection, data cleaning and data pre-processing
- Data Modelling and Database Design (data warehouse)
- SQL programming
- Report Development
- Business Process
All of these skills were (and will be) important to me as a data scientist, but the one that set me apart was how easily I could understand the Business Process behind the development. This is known in data science as Domain Knowledge. Some of it will come with experience working in the field of your choice, but always keep it in mind.
So, to refresh my knowledge in this area and catch up on new technologies, I got the following certification:
Coursera: Google Data Analytics Professional Certificate
It's a very basic course on data visualization, data cleaning, analysis and making data-driven decisions. I got a chance to work with R and Tableau, and to refresh my SQL as well. By the end of this course you will be able to understand and make sense of data. This will help you down the line, when you are working on the Exploratory Data Analysis step of model creation.
The tools were different, but the end goal is the same: to tell a story with the data you are given, and you can't explain something you don't know. You can only work with data you understand.
2. Data Science Basics
If you get to this point, you must be anxious to learn and start messing around with models. Well, here you can choose between getting another certificate, the “normal way”, or “getting your hands dirty”. I chose the second option, which is learning while working on a project, my favorite type of learning approach.
Skip ahead to section 2.2 if you choose to get another certificate.
2.1 – Project-Based Learning
2.1.1 – The approach
Project-based learning is an innovative educational approach that fosters deep understanding and practical skills by immersing learners in real-world, hands-on projects. To apply it effectively, I begin by selecting a compelling project that aligns with my learning goals and interests. In my case I decided to go with the Spaceship Titanic Competition. Having references and resources available for research is an important factor, so that I could learn from what others have done. Since this is an open competition, there is a lot of information available.
Throughout the process, guidance and support came from the internet; see below the sources used during project development. As I worked on the project, challenges appeared, and the learning came from both the successes and the failures. After each step there was a period of reflection and self-assessment to promote metacognition and find ways to improve.
2.1.2 – The project
The final product of this work is a Jupyter Notebook that demonstrates the newfound knowledge and skills.
The success of project-based learning lies in its ability to cultivate a deep, lasting understanding of subject matter, critical thinking abilities, and a sense of accomplishment.

Sources of information
Kaggle Introductory Courses & Certificates: these were a good starting place, since my focus was to work on the Spaceship Titanic dataset. All the courses are very basic and easy to follow.
Krish Naik YouTube Playlists – good references for how to structure a data science project and detailed “how-to” on each step:
Books – the books used can be found here.
ChatGPT – I prompted ChatGPT to assume the role of a data science teacher and used it to help me with questions, code and references. I found this to be a good way to have a “personal teacher” helping me with my questions and learning. I still use this approach.
2.1.3 – The learning
These are the skills that were developed during this process:
- Project Initialization – how to set up goals and start the project
- Data Gathering / Domain Knowledge
- Data Cleaning
- Data Analysis
- Data Visualization
- Pre-processing data
- Feature Engineering and Selection
- Python programming
- Plotting libraries
- Numpy and Pandas
- Scikit-learn models
- Model selection and optimization
- Pipelines
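Several of the skills listed above (pre-processing, model selection and optimization, pipelines) fit together in a few lines of scikit-learn. The following is only a minimal sketch, assuming pandas and scikit-learn are installed. The column names (`age`, `spend`, `deck`), the toy target and the parameter grid are invented for illustration; this is not the competition data or the code from the project notebook.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy data invented for illustration (NOT the competition dataset).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(1, 80, n).astype(float),
    "spend": rng.exponential(300.0, n),
    "deck": rng.choice(["A", "B", "C"], n),
})
# Simulate missing values so the imputation step has something to do.
df.loc[rng.choice(n, 20, replace=False), "age"] = np.nan
# Hypothetical binary target derived from one of the columns.
y = (df["spend"] > 250).astype(int)

numeric = ["age", "spend"]
categorical = ["deck"]

# Pre-processing: impute and scale numeric columns, one-hot encode categories.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
preprocess = ColumnTransformer([
    ("num", numeric_steps, numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Pipeline: pre-processing and model travel together as one estimator.
model = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.25, random_state=0, stratify=y
)

# Model optimization: cross-validated search over one hyperparameter.
search = GridSearchCV(model, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
search.fit(X_train, y_train)

test_accuracy = search.score(X_test, y_test)
print(search.best_params_, round(test_accuracy, 3))
```

Keeping the pre-processing inside the pipeline means the imputer, scaler and encoder are fitted only on the training folds during cross-validation, which helps avoid leaking information from the validation data.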
The initial problem was to identify the process that should be followed to develop a model, and each step had its own challenges. By using the sources listed above, I was able to learn how to manage and deliver an entire data science project using all the education tools at my disposal: YouTube, ChatGPT and books.
I was able to create the following framework, or flow chart, to use in my Data Science projects. This was the biggest takeaway for me. It serves as a step-by-step guide for developing this type of project in the future as well. Keep in mind that this process is iterative, meaning you can go back and forth as required by the analysis. Also, the time spent at each step can vary from project to project.
The competition results help to judge the experience overall, but the learning goes much deeper.
My framework for Data Science

2.2 – Get another certificate
I did not follow this path, but if I had, this would be my choice: IBM Data Science Professional Certificate on Coursera. This certificate takes you through a 10-course series covering the following topics:
- What is Data Science?
- Tools for Data Science
- Data Science Methodology
- Python for Data Science, AI & Development
- Python Projects for Data Science
- Databases and SQL for Data Science with Python
- Data Analysis with Python
- Data Visualization with Python
- Machine Learning with Python
- Applied Data Science Capstone
3. Machine Learning Credentials
After this project-based learning experience, I decided to get some credentials to add to my LinkedIn profile. At this point, one may ask why go for both a certificate and a project-based learning experience. I believe the certificates will help you get job interviews, because people will compare résumés. But to do well in an interview you need real knowledge, which is built with real projects like the ones in a project-based learning experience.
The only difference is that I will go for more advanced certificates and not just basic ones. These are my choices:
- IBM Machine Learning Professional Certificate
- DeepLearning.AI TensorFlow Developer Professional Certificate
4. Data Engineering
As I write this blog post, I realize that it's been a while since I worked with databases, so I believe it is a good idea to also refresh my knowledge of new technologies in this area. I have the basics, like tables, keys and SQL, but I am not familiar with Big Data tools, and working with Big Data is also a very important topic in data science.
I found this Professional Certificate on Coursera, which seems to be a good reference for someone trying to learn Big Data:
5. Domain Specialization + Portfolio Development
My career objective is to work as a financial data scientist, specializing in the development of investment strategies, risk management, and portfolio optimization within the realm of quantitative funds. This is the domain specialization I am looking for.
Now I need to develop a portfolio to show my work and my knowledge. Again, the “project-based approach” will be used to develop it. By working on projects related to Quantitative Finance, I will build my domain knowledge and, at the same time, a financial data science portfolio.
To help me in this process I found a mentor, a person who already works in this area, to help me direct my work and portfolio toward relevant projects and problems of the quantitative finance world.
The results will be found in this blog, mostly in this section PORTFOLIO.
Hopefully my journey will inspire other people!