The screenshot below shows a section of this matrix, in which several variables are very highly correlated with Suburb. I’ve decided to remove many of these (Postcode, Council Area, Regionname, Banded Distance) as they give roughly the same information as Suburb does. I’ve decided to keep the distances to public transport, as these may vary within a suburb and may affect prices (e.g. to test whether, within a given suburb, equivalent properties that are closer to public transport are more expensive).
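To make the idea of "roughly the same information" concrete, here is a minimal sketch in Python of one common association measure for categorical variables, Cramér's V. This is not necessarily the measure behind the matrix in the screenshot; the `cramers_v` helper and the toy records are hypothetical and only illustrate why a variable such as Postcode adds little once Suburb is known.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Cramér's V between two categorical sequences (0 = none, 1 = perfect)."""
    n = len(xs)
    joint = Counter(zip(xs, ys))  # observed counts for each (x, y) pair
    row = Counter(xs)
    col = Counter(ys)
    chi2 = 0.0
    for r, r_count in row.items():
        for c, c_count in col.items():
            expected = r_count * c_count / n
            observed = joint.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0


# Hypothetical toy records: (suburb, postcode, property type)
rows = [
    ("Richmond", "3121", "house"),
    ("Richmond", "3121", "unit"),
    ("Carlton", "3053", "house"),
    ("Carlton", "3053", "unit"),
    ("Brighton", "3186", "house"),
    ("Brighton", "3186", "unit"),
]
suburb = [r[0] for r in rows]
postcode = [r[1] for r in rows]
ptype = [r[2] for r in rows]

v_postcode = cramers_v(suburb, postcode)  # postcode is fully determined by suburb
v_type = cramers_v(suburb, ptype)         # property type is unrelated to suburb here
```

In this toy data the suburb/postcode association is at the maximum, so Postcode is a candidate for removal, while property type is independent of suburb and worth keeping.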
Using this information, I then created a profile to see which variables are most predictive of whether a property is in the top 10% of all property sale prices. The profile lists the variables in order of predictive power and shows a couple of immediate points of interest:
Suburb is the most important predictive variable that determines whether a property is expensive.
Weather on the date of sale has no impact on whether a property is expensive.
The Apteco Datathon: 4. The property market in Melbourne
Within each variable we can expand to see the particular categories that are important. In the screenshot below we can see the suburbs that are most highly over-represented within the expensive property segment.
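As a rough illustration of what "over-represented" means here, the following Python sketch computes a simple index per suburb: the suburb's share of the top 10% of sales divided by its share of all sales, times 100. The formula, the `over_representation` helper and the sales figures are assumptions made for illustration; the profile in the screenshot may use a different statistic.

```python
from collections import Counter


def over_representation(sales, top_fraction=0.10):
    """Index per suburb: 100 means the suburb is as common in the top segment as overall."""
    ranked = sorted(sales, key=lambda s: s[1], reverse=True)
    n_top = max(1, round(len(ranked) * top_fraction))
    all_counts = Counter(suburb for suburb, _ in ranked)
    top_counts = Counter(suburb for suburb, _ in ranked[:n_top])
    index = {}
    for suburb, count in all_counts.items():
        share_all = count / len(ranked)
        share_top = top_counts.get(suburb, 0) / n_top
        index[suburb] = 100 * share_top / share_all
    return index


# Hypothetical toy sales: (suburb, price)
sales = (
    [("Brighton", p) for p in (3_200_000, 1_500_000, 1_400_000, 1_300_000)]
    + [("Richmond", p) for p in (2_900_000, 1_200_000, 1_150_000, 1_100_000,
                                 1_050_000, 1_000_000, 950_000, 900_000)]
    + [("Sunshine", p) for p in (750_000, 720_000, 700_000, 680_000,
                                 650_000, 620_000, 600_000, 580_000)]
)

index = over_representation(sales)
for suburb, value in sorted(index.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{suburb}: {value:.0f}")
```

An index above 100 marks a suburb that appears more often among expensive properties than its overall share of sales would suggest; an index of 0 means it does not appear in the expensive segment at all.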