The csv file is considerably smaller than the tif file, so let's begin with that. The first thing we need to do is convert it from a simple csv to geojson in order to be able to map it.

The tif file is very big - around 21GB when unzipped. This is too big for most machines to handle in Python, and you would be better off using a decidicated mapping tool like QGIS. As such, we should rather look to use the csv file, which is much smaller. However, first we should confirm that the data is the same. To do that, we select only a small part of the tif file and compare it with the same area in the csv file.

with rasterio.open('zip:///%szaf_women_2020_geotiff.zip!/zaf_women_2020.tif'%data_path) as src:  
  n = 500
  data = src.read(1, window=Window(0, 0, n, n))
  print('Band1 has shape', data.shape)
  height = data.shape[0]
  width = data.shape[1]
  cols, rows = np.meshgrid(np.arange(width), np.arange(height))
  xs, ys = rasterio.transform.xy(src.transform, rows, cols)
  lons= np.array(xs)
  lats = np.array(ys)
  print('lons shape', lons.shape)
Band1 has shape (500, 500)
lons shape (500, 500)

Now we have a subset of the tif file, but in order to be able to compare it with the csv file, we need to convert both into geopandas dataframes. To do that, we need to change the shape of the tif file's data.

lons2 = lons.flatten()
lats2 = lats.flatten()
data2 = data.flatten()

data_df = pd.DataFrame({'pop_dens':data2,'latitude':lats2,'longitude':lons2}).dropna()
data_gdf = gpd.GeoDataFrame(data_df.pop_dens, geometry=gpd.points_from_xy(data_df.longitude, data_df.latitude, crs="EPSG:4326"))

lon_min= data_df.longitude.min()
lon_max = data_df.longitude.max()
lat_min= data_df.latitude.min()
lat_max = data_df.latitude.max()
gdf_box = gpd.read_file('%szaf_women_2020_csv_geo.geojson'%data_path,bbox=(lon_min, lat_min, lon_max, lat_max))

Right, let's take a look at the subset of the tif data. We can also zoom in to see that each point in the tif file is a square pixel.

plt.style.use("Solarize_Light2")
data = np.nan_to_num(data)
fg_color = 'black'
bg_color = 'white'
fig = plt.figure(figsize=(20,10))
ax1 = fig.add_subplot(121)
plt.imshow(data)
ax1.set_title('Plot of tif file', color=fg_color)

# set figure facecolor
fig.patch.set_facecolor(bg_color)
# set tick and ticklabel color
ax1.axes.tick_params(color=fg_color, labelcolor=fg_color)

ax2 = fig.add_subplot(122)
plt.imshow(data)
plt.xlim(150,180)
plt.ylim(180,150)
ax2.set_title('Zoomed', color=fg_color)
# set tick and ticklabel color
ax2.axes.tick_params(color=fg_color, labelcolor=fg_color)

# set imshow outline
for spine in ax1.axes.spines.values():
    spine.set_edgecolor(fg_color)   
for spine in ax2.axes.spines.values():
    spine.set_edgecolor(fg_color)    
fig.tight_layout()

Now let's compare the two geodataframes of the csv and tif subsets. Once again, we show a zoomed in image on the right. Bearing in mind that the tif data has been converted from a pixel to a point, we note that the two datasets place their points on opposite corners of the pixel. This is not a concern as both points represent the same pixel.

fg_color = 'black'
bg_color = 'white'

plt.style.use("classic")

fig = plt.figure(figsize=(20,10))
ax1 = fig.add_subplot(121)
gdf_box.plot(ax=ax1,label='csv')
data_gdf.plot(ax=ax1,color='red',alpha=0.8, markersize=7,label='tif')
ax1.set_title('Points from both datasets',color=fg_color)
# set figure facecolor
fig.patch.set_facecolor(bg_color)
# set tick and ticklabel color
ax1.axes.tick_params(color=fg_color, labelcolor=fg_color)

plt.legend()
ax2 = fig.add_subplot(122)
gdf_box.plot(ax=ax2,label='csv')
data_gdf.plot(ax=ax2,color='red',alpha=0.8, markersize=7,label='tif')
ax2.set_xlim(16.04,16.048)
ax2.set_ylim(-22.050,-22.044)
#ax2.set_ylim(-22.2044,-22.050)
ax2.set_title('Zoomed',color=fg_color)
# set tick and ticklabel color
ax2.axes.tick_params(color=fg_color, labelcolor=fg_color)
plt.legend()
fig.tight_layout()

We have now determined that the csv and tif data refer to the same thing. As such, it is safe for us to continue our analysis using only the csv data.