# LandscapeClassifier and Geographic Locations

Kathrine, your answer explained many things, and I hope I am not getting repetitive at this point. Let me try to give a more detailed explanation:

I have a training image dataset: trees-folder, roses-folder and house-folder. I have a test image dataset: folder_1, folder_2, folder_3, folder_4, folder_5.

I know from my research that the images in the five folders belong to those object classes (trees, roses, houses), but they are unsorted. For example, I do not know how many trees, roses or houses are in folder_2. This step is very important, because:

1. The images in the folders (1-5) are grouped according to a specific geographic location, and I want to know how many objects of each class are present at a given location. Knowing how many trees, roses or houses are in folder_4 discloses information about the geographic location represented by folder_4.

2. The task of identifying objects in the folders was, at the beginning of the research, done empirically by an expert (a human being) for each of the test folders. In other words, I want SimpleCV to be my expert, so that I can (partly) automate this classification process. Ideally in this way:

         trees  roses  houses
folder_1
folder_2
folder_3
folder_4
folder_5


Just a suggestion: as you explained in your answer, does it make sense to label each folder in the test dataset in the same way as the training dataset, regardless of the content of the folder? I mean, for me it is not important to know how the folder is labeled; it's just important to keep track of how I re-labeled my folders. Could this be a sort of workaround?

                          trees  roses  houses
trees (former folder_1)
roses (former folder_2)
houses (former folder_3)
and so on...


thanks!


Howdy.

I will try to get you a more detailed response soon, but here is the tl;dr. Basically, you are thinking about the test folders all wrong. Testing is simply for that, testing. I think the gap in your knowledge is that you need to realize that machine learning roughly has a few steps:

1. Define the problem
2. Collect data and label it (if you are doing a supervised problem). You need two sets, train and test.
3. Define the features that you will pull from your images (color, shape, texture...)
4. TRAIN classifiers -- you should really look at multiple kinds. Weka is a good tool for this.
7. ITERATE ON 3-5 until you reach desired performance.
8. DEPLOY classifier in your application. -- This is what you seem to want to do
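
To make the testing step concrete, here is a minimal sketch of scoring a classifier against a labeled test directory. It assumes the test data sits in one subfolder per class (trees/, roses/, houses/) and that the images are .jpg files. `evaluate` and `classify` are hypothetical names, not SimpleCV API: `classify` stands for whatever function you write that takes an image path, extracts features, and returns your trained classifier's label.

```python
import glob
import os
from collections import Counter


def evaluate(test_root, classify, pattern="*.jpg"):
    """Score a classifier on a labeled test directory.

    test_root: directory with one subfolder per class (assumed layout).
    classify:  hypothetical callable, image path -> predicted label string.
    Returns (accuracy, confusion), where confusion maps
    (true_label, predicted_label) to a count.
    """
    confusion = Counter()
    for true_label in sorted(os.listdir(test_root)):
        class_dir = os.path.join(test_root, true_label)
        if not os.path.isdir(class_dir):
            continue  # skip stray files next to the class folders
        for path in sorted(glob.glob(os.path.join(class_dir, pattern))):
            confusion[(true_label, classify(path))] += 1

    total = sum(confusion.values())
    correct = sum(n for (true, pred), n in confusion.items() if true == pred)
    accuracy = float(correct) / total if total else 0.0
    return accuracy, confusion
```

The off-diagonal entries of `confusion` (for example `("roses", "trees")`) show which classes get mixed up, which is usually more useful than the accuracy alone when deciding whether to go back and change the features.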

When you want to deploy your classifier, which is what you want to do in this case, everything is up to you. Your testing data directories should be set up exactly like your training data. Like I said before, testing is a necessary step to ascertain how your classifier performs on real data. Once you have mastered testing you proceed to deploying your code to do whatever magic you set out to do. For your case the rough design pattern you want to use is:

1. TRAIN AND TEST YOUR CLASSIFIER -- IF IT WORKS WELL ENOUGH SAVE IT AND PROCEED
3. Iterate through your folders or files (the glob library helps with this). Extract your feature vector.
4. Call your classifier's classify method on the feature vector. This should return a string that is the class label (i.e. tree, rose...).
5. Optionally aggregate the classifications according to what you want to do (e.g. toss all your classes into a list or something for each folder, or plot it with matplotlib).
6. Calculate whatever statistics you want to do / insert your secret sauce/magic here.
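
The deployment steps above can be sketched roughly like this. It is only an outline under a few assumptions: the images are .jpg files, the class labels are trees, roses and houses, and `classify` is a hypothetical function you supply that takes an image path, extracts the feature vector, and returns the label string from your saved classifier. `tally_folders` and `format_table` are made-up helper names, not part of SimpleCV.

```python
import glob
import os
from collections import Counter

CLASSES = ["trees", "roses", "houses"]  # assumed class labels


def tally_folders(folders, classify, pattern="*.jpg"):
    """Count predicted labels per folder.

    classify: hypothetical callable, image path -> label string.
    Returns a dict mapping each folder to a Counter of labels.
    """
    counts = {}
    for folder in folders:
        paths = sorted(glob.glob(os.path.join(folder, pattern)))
        counts[folder] = Counter(classify(path) for path in paths)
    return counts


def format_table(counts, classes=CLASSES):
    """Render the counts as a plain-text folder-by-class table."""
    names = [os.path.basename(os.path.normpath(f)) for f in counts]
    width = max([len(n) for n in names] + [0])
    lines = [" " * width + "".join("%8s" % c for c in classes)]
    for name, folder in zip(names, counts):
        row = "".join("%8d" % counts[folder][c] for c in classes)
        lines.append(name.ljust(width) + row)
    return "\n".join(lines)
```

Calling `print(format_table(tally_folders(["folder_1", "folder_2"], classify)))` would give one row per folder and one column per class, which is the layout asked for in the question. The test folders do not need class labels for this; only the predictions are counted.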