Sunday, October 7, 2018

“The Big Risk of Artificial Intelligence Are Too Stupid Machines”

In the interview titled “The Big Risk of Artificial Intelligence Are Too Stupid Machines”, which you can access here, the interviewee envisions a world in which computers may be able to learn everything.

Reading the interview, given to the Público newspaper, I remembered using Python to count words, classify them, and thereby assess the relevance of a document as study material.
I had read some articles with studies along those lines, using Python. There is software built for this purpose: it ranks the most relevant words by frequency, to gauge how good a starting point the article, or interview, is for studying a theme.

I used the Anaconda platform, with Jupyter Notebook and Python, to analyze the most prominent words in the interview.
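A minimal sketch of that kind of word counting, using only Python's standard library. The tokenization rule and the stop-word list here are illustrative choices of mine, not the ones from the original analysis:

```python
import re
from collections import Counter

def top_words(text, n=10, stopwords=None):
    """Count word frequencies in a text and return the n most common."""
    stopwords = stopwords or set()
    # Lowercase and keep only alphabetic "words" (a simplistic tokenizer).
    words = re.findall(r"[a-zA-Z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

sample = ("Artificial intelligence is risky when machines are too stupid. "
          "Intelligent machines learn; stupid machines do not learn.")
print(top_words(sample, n=3,
                stopwords={"is", "are", "when", "too", "do", "not"}))
```

Sorting the resulting counts gives the ranking of most relevant words described above; real analyses would use a proper stop-word list for the interview's language.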

You can read the whole work here:

Tuesday, October 2, 2018

House Prices: Advanced Regression Techniques

Kaggle Competition Development

Manuel Robalinho - Sep 2018
Work developed for my Master's degree.

Using the Anaconda framework, Python, and Jupyter Notebook to predict house prices.

All work here:

Start here if...

You have some experience with R or Python and machine learning basics. This is a perfect competition for data science students who have completed an online course in machine learning and are looking to expand their skill set before trying a featured competition. 

Competition Description

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence.
With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.
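As a toy illustration of the regression idea behind the competition (not the actual solution, which works with the 79 real variables), here is a one-variable least-squares fit in plain Python; the area and price numbers are made up:

```python
# Minimal one-variable least-squares sketch: price ~ living area.
# Toy numbers for illustration only; the Kaggle dataset has 79 variables.

def fit_ols(xs, ys):
    """Fit y = a + b*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    b = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
         / sum((x - mean_x) ** 2 for x in xs))
    a = mean_y - b * mean_x
    return a, b

areas  = [850, 1200, 1500, 2000, 2400]   # square feet (made up)
prices = [107, 150, 188, 250, 300]       # thousands of dollars (made up)
a, b = fit_ols(areas, prices)
print(f"price = {a:.1f} + {b:.3f} * area")
```

With many variables, the same idea generalizes to multiple regression and to the more advanced techniques the competition title refers to.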

Statistics about Autonomous Vehicles

Using Machine Learning to analyze statistical information about Autonomous Vehicles

By Manuel Robalinho on 2018/09

Work done for my Master's degree.

The state-of-the-art documentation on the implementation of autonomous vehicles contains information I thought was important, so I created an Excel sheet with the projects in progress, partnerships, the countries where they occur, and possible investment values.

I share the file I created with you: Search_VATs.xls
I took advantage of that information and, using Python and Jupyter Notebook, created some charts from the Excel data.
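The basic pattern of that step, sketched with pandas. The column names below are illustrative assumptions (the real sheet is Search_VATs.xls, read with `pd.read_excel`), and a small in-memory frame stands in for it so the snippet runs on its own:

```python
import pandas as pd

# In the notebook the data came from the Excel file shared above, e.g.:
#   df = pd.read_excel("Search_VATs.xls")
# Column names here are illustrative assumptions, not the real sheet's.
df = pd.DataFrame({
    "project": ["A", "B", "C", "D", "E"],
    "country": ["USA", "Germany", "USA", "Brazil", "Japan"],
})

# Count projects per country -- the aggregation behind the bar charts.
projects_per_country = df["country"].value_counts()
print(projects_per_country)
# With matplotlib installed, a bar chart is then one line:
#   projects_per_country.plot(kind="bar")
```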

I had some difficulty classifying projects funded by the European Community; for those I used the testing country, or the country where the company involved in the project's research and development is based. Many projects are executed in partnerships, usually involving software companies and the automobile industry. In those cases the classification followed the software developer.

Other partnerships involve universities and companies from the automobile industry; in those cases the projects were classified by the university's country of origin.
One unexpected finding was three projects in Brazil, led by universities.
Another interesting finding was China, Korea and Japan standing out as three strong countries in the investigation and implementation of autonomous vehicles. I believe that the high population of the Asian continent will lead the automotive industry to treat the implementation of these projects as an urgent matter.

Creating many charts was a way to experiment with presenting the visual information in many forms, using Python with the pandas and matplotlib libraries.
Using geographic information from the geopandas library, and the coordinates available in the geopandas datasets ('naturalearth_lowres'), I joined the world map with the countries from my Excel data. I plotted the result of this join:
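The core of that step is a left join between the world-map table and the per-country project counts. The geopandas version is shown in comments (older geopandas releases ship 'naturalearth_lowres'); the join itself is demonstrated with plain pandas so the snippet runs anywhere, and the counts are made up:

```python
import pandas as pd

# geopandas version, roughly as used in the notebook (older geopandas):
#   import geopandas as gpd
#   world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
#   world = world.merge(projects, left_on="name", right_on="country", how="left")
#   world.plot(column="n_projects", legend=True)   # choropleth: darker = more

# The same join with plain pandas (country names and counts are made up):
world = pd.DataFrame({"name": ["United States of America", "Germany",
                               "Brazil", "Portugal"]})
projects = pd.DataFrame({"country": ["United States of America",
                                     "Germany", "Brazil"],
                         "n_projects": [12, 7, 3]})
joined = world.merge(projects, left_on="name", right_on="country", how="left")
print(joined[["name", "n_projects"]])
```

Countries without a match keep a missing value after the left join, which is why they appear unshaded on the map.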

The strongest colors represent countries with more projects.
Plotting the continent names and the bar charts below:
Or plotting only a big world map with the countries most representative of autonomous vehicle projects.

In further reading I found some documentation from KPMG, the 'KPMG-Autonomous-Vehicle-Readiness-Index'. I liked the content and information presented, which describes countries' openness to and preparedness for autonomous vehicles.

I made an Excel table with the information presented in the document, and used the same technique described above to make some charts, which I present now:
For me the main novelty was seeing the Netherlands as the country most prepared and developed for the implementation of autonomous vehicles. Perhaps because we hear a lot about the projects in the USA, Germany or Japan, I did not know about all the preparation this country has already carried out toward this goal. Other surprises for me were the good positions of Singapore and the United Arab Emirates; I have no knowledge of projects developed in these countries.
In the two charts, the poor positions occupied by big countries like Brazil, India, Mexico and Russia are striking.

In these charts we can confirm the good technology and innovation rankings of the United States and Germany. In the bottom positions are the same countries: Brazil, India, Mexico and Russia.

In this chart, the good position of Brazil and the poor positions of Japan and Spain are interesting.
This chart confirms the statistics and rankings for technological development and innovation: countries with active projects in the area are more technologically developed.
This graph shows some countries swapping positions in the ranking, but the connectivity values are also very similar among most countries. At the tail of the ranking we have the same countries that occupy it in practically all the charts: Brazil, India, Mexico and Russia.

Master chart by country with the most important scores.
The darker colors represent the countries with the best scores.

Saturday, September 15, 2018

Using Geographic Information with Geopandas and Python

Printing statistical information on geographic maps.
Using Python and Geopandas with Censos2010 data about Portugal.
Printing road and river maps with Geopandas.

Geographic maps with statistical information

An excellent platform with a world map, for making your geographic chart and printing your statistics.

Saturday, August 18, 2018

My Trail - Canyons Portugal

Developed in 2017 by Manuel Robalinho and other colleagues from the Master's degree in Software Development

Look at the link:

Predicting house prices on Kaggle

Work developed by: Manuel Robalinho on 21-7-2018

Ask a home buyer to describe their dream house, and they probably won't begin with the height of the basement ceiling or the proximity to an east-west railroad. But this playground competition's dataset proves that much more influences price negotiations than the number of bedrooms or a white-picket fence. With 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa, this competition challenges you to predict the final price of each home.

The Ames Housing dataset was compiled by Dean De Cock for use in data science education. It's an incredible alternative for data scientists looking for a modernized and expanded version of the often cited Boston Housing dataset.

Query Documentation

See work here:

Jupyter notebook page here:

Sunday, June 24, 2018

The Most Popular Metrics for Evaluating Machine Learning Models

During the process of building a machine learning model, we need to measure its quality according to the goal of the task. There are mathematical functions that help us evaluate our models' capacity for error and accuracy.
Just as important as knowing how to choose a good model is knowing how to choose the right metric for deciding which one is the best among them.
There are simpler metrics and more complex ones; some work better for datasets with certain characteristics, while others are customized to the model's final goal.
When choosing a metric, you should take into account factors such as the proportion of each class in the dataset and the goal of the prediction (probability, binary, ranking, etc.). That is why it is important to know the metric you will use well, since this can make a difference in practice.
None of these functions is better than the others in every case. It is always important to consider the model's practical application. The goal of this article is not to go deep into each of them, but to present them so you can research further the ones you find interesting.
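To make these metrics concrete, here is a small pure-Python sketch of four of the most common ones for binary classification. The toy labels are chosen to show the point above: on an imbalanced dataset, accuracy can look good while recall is poor:

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Imbalanced toy data: only 3 of 10 samples are positive.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Here the model misses two of the three positives: accuracy is 0.8, yet recall is only one third, which is exactly why the class proportions should guide the choice of metric.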

Data Scientist | Machine Learning Consultant | Kaggle Grandmaster

Google’s AutoML will change how businesses use Machine Learning

Google’s AutoML is a new up-and-coming (alpha stage) cloud software suite of Machine Learning tools. It’s based on Google’s state-of-the-art research in image recognition called Neural Architecture Search (NAS). NAS is basically an algorithm that, given your specific dataset, searches for the most optimal neural network to perform a certain task on that dataset. AutoML is then a suite of machine learning tools that will allow one to easily train high-performance deep networks, without requiring the user to have any knowledge of deep learning or AI; all you need is labelled data! Google will use NAS to then find the best network for your specific dataset and task. They’ve already shown how their methods can achieve performance that is far better than that of hand-designed networks.
AutoML totally changes the whole machine learning game because for many applications, specialised skills and knowledge won’t be required. Many companies only need deep networks to do simpler tasks, such as image classification. At that point they don’t need to hire 5 machine learning PhDs; they just need someone who can handle moving around and organising their data.

New in TensorFlow 1.4: converting a Keras model to a TensorFlow Estimator

TensorFlow’s 1.4 release brings many new features — one of our favorites is support for converting a Keras model to a TensorFlow Estimator via the model_to_estimator() method.
Why would you want to do this? By wrapping your Keras code in an Estimator, you can serve predictions using TensorFlow Serving or deploy your model on Cloud ML Engine, a managed service for training and serving your TensorFlow models at scale. Using a TensorFlow Estimator, you can also take advantage of distributed training on your own cluster.
In this post, we’ll update the code we wrote in the article building a text classification model with Keras. If you haven’t read that blog post, we used Stack Overflow data from BigQuery to train a model to predict the tag of a Stack Overflow question. To jump to the code, find the full Jupyter notebook for this blog post here.

By Sara Robinson and Josh Gordon, Developer Advocates

Libraries for Data Science

Data scientist Igor Bobriakov published an excellent post on the Data Science Central site (a meeting point for data scientists worldwide) about the main Python libraries for data science. The original post is in English, but we bring you the Portuguese version here. Check out the Top 20 Python Libraries for Data Science.

The Python language continues to take leading positions in solving data science tasks and challenges. The selection of libraries is organized by category, and most of them are already covered in the free course Python Fundamentos Para Análise de Dados.

Core Libraries and Statistics

1. NumPy
2. SciPy
3. Pandas


6. Seaborn
7. Plotly
8. Bokeh
9. Pydot

Machine Learning

11. XGBoost / LightGBM / CatBoost
12. Eli5

Deep Learning

13. TensorFlow 
14. PyTorch
15. Keras

Distributed Deep Learning

Natural Language Processing

17. NLTK 
18. SpaCy
19. Gensim

Data Scraping

20. Scrapy 

Next to each library name you will find the total number of commits and contributors on GitHub.

Reference: MEDIUM
Translated from the English original:

Saturday, March 10, 2018

What the Sinking of the Titanic Still Teaches Us Today - Data Science Project

An excellent post: a step-by-step guide to a data exploration project.
Did the upper class have a better chance of survival than the lower class? Did women and children survive? Let's answer these and other questions using real data.
When the passengers boarded the famous RMS Titanic in April 1912, they most likely did not expect the tragic end of the ocean liner that departed from England toward New York. The gigantic ship, which took four years to complete, sank in the waters of the Atlantic Ocean on April 14, 1912, after colliding with an iceberg at 11:40 p.m.

Paulo Vasconcellos

Data Scientist with GIFs | Data Science & Analytics 

Brazil is 18th in cloud ranking

Brazil climbed four positions in a global ranking that evaluates cloud computing policies across the 24 nations leading the IT market.

Júlia Merker
// Wednesday, 07/03/2018 10:53

Check out the complete ranking:
United States
United Kingdom
South Africa
