
LandscapeClassifier and Geographic Locations

asked 2012-08-20 09:09:33 -0500

Arenzky

Kathrine, your answer explained many things, and I hope I am not getting repetitive at this point. Let me try to give a more detailed explanation:

I have a labeled training image dataset: trees-folder, roses-folder and house-folder. I also have a test image dataset: folder1, folder2, folder3, folder4, folder_5.

I know from my research that the images in the five folders belong to those object classes (trees, roses, houses), but they are not sorted by class. For example, I do not know how many trees, roses or houses are in folder2. This step is very important for two reasons:

  1. The images in folders 1-5 are grouped by geographic location, and I want to know how many objects of each class are present at a given location. Knowing how many trees, roses or houses are in folder4 tells me something about the geographic location that folder4 represents.

  2. At the start of the research, the objects in each test folder were identified empirically by an expert (a human being). In other words, I want SimpleCV to be my expert, so that I can (partly) automate this classification process. Ideally the result would look like this:

             trees  roses  houses

Just a suggestion: given what you explained in your answer, does it make sense to label each folder in the test dataset the same way as the training dataset, regardless of what the folder contains? For me it is not important how a folder is labeled; it is only important to keep track of how I relabeled my folders. Could this work as a workaround?

                          trees  roses  houses
trees (former folder_1)
roses (former folder_2)
houses (former folder_3)
and so on...



1 Answer


answered 2012-08-20 11:03:16 -0500

kscottz


I will try to get you a more detailed response soon, but here is the tl;dr. Basically, you are thinking about the test folders the wrong way. Testing is just that: testing. I think the gap in your knowledge is that a machine learning project roughly follows these steps:

  1. Define the problem
  2. Collect data and label it (if you are doing a supervised problem). You need two sets, train and test.
  3. Define the features that you will pull from your images (color, shape, texture...)
  4. TRAIN classifiers. You should really try several kinds; Weka is a good tool for this.
  5. TEST your classifiers.
  6. Perform a trade-off analysis between your classifiers.
  7. ITERATE on steps 3-5 until you reach the desired performance.
  8. DEPLOY the classifier in your application. This is what you seem to want to do.
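To make the train/test distinction in steps 2-5 concrete, here is a minimal, library-free Python sketch. It is not SimpleCV's or Weka's API: it assumes each image has already been reduced to a small numeric feature vector (step 3), the two feature values per image are made up for illustration, and a nearest-centroid classifier stands in for whatever classifiers you actually train. The point it shows is that the test set is labeled exactly like the training set, which is what lets you score predictions against known answers.

```python
import math


def train_nearest_centroid(samples):
    """Train on (feature_vector, label) pairs; return {label: mean feature vector}."""
    sums = {}
    counts = {}
    for features, label in samples:
        total = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            total[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}


def classify(centroids, features):
    """Return the label whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(features, centroids[label]))


def accuracy(centroids, test_samples):
    """Fraction of labeled test samples the model predicts correctly."""
    correct = sum(classify(centroids, f) == label for f, label in test_samples)
    return correct / len(test_samples)


# Hypothetical 2-D features per image, e.g. (fraction of green pixels,
# fraction of straight edges). Real features would come from your images.
train = [
    ([0.80, 0.10], "trees"), ([0.75, 0.15], "trees"),
    ([0.30, 0.05], "roses"), ([0.35, 0.10], "roses"),
    ([0.10, 0.70], "houses"), ([0.15, 0.80], "houses"),
]
# The test set is labeled the same way as the training set.
test = [([0.78, 0.12], "trees"), ([0.32, 0.07], "roses"), ([0.12, 0.75], "houses")]

model = train_nearest_centroid(train)
print(f"test accuracy: {accuracy(model, test):.2f}")  # prints: test accuracy: 1.00
```

Because the test labels are known, accuracy can be computed directly. A folder of images from a new location has no labels, so it can only be classified (deployment), not scored.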

When you deploy your classifier, which is what you want to do in this case, everything is up to you. Your testing data directories should be set up exactly like your training data. As I said before, testing is a necessary step to find out how your classifier performs on real data. Once you are confident in your testing results, you move on to deploying your code to do whatever magic you set out to do. For your case, the rough design pattern you want to use is:

  1. Load your classifier.
  2. Iterate through your folders or files (the glob library helps with this) and extract the feature vector for each image.
  3. Call your classifier's classify method on the feature vector. This should return a string that is the class label (e.g. tree, rose...).
  4. Optionally, aggregate the classifications according to what you want to do (e.g. collect the labels for each folder into a list, or plot them with matplotlib).
  5. Calculate whatever statistics you need; insert your secret sauce here.
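The deployment pattern above could look roughly like the following sketch. It uses only the Python standard library (`glob`, `os`, `collections.Counter`). The function `classify_by_filename` is a hypothetical stand-in that reads the label from file names such as `trees_001.jpg` so the example runs without a trained model; in a real application you would replace it with code that loads the image, extracts the same feature vector used in training, and calls your trained classifier's classify method. The `*.jpg` pattern and the `test_locations/folder*` layout are also assumptions.

```python
import glob
import os
from collections import Counter


def classify_by_filename(path):
    """HYPOTHETICAL stand-in classifier: 'trees_001.jpg' -> 'trees'.
    Replace with: load image, extract features, call your classifier."""
    return os.path.basename(path).split("_")[0]


def count_labels_per_folder(folders, classify_image, pattern="*.jpg"):
    """Classify every image matching `pattern` in each folder and tally labels.
    Returns {folder_name: Counter({label: count})}."""
    table = {}
    for folder in folders:
        counts = Counter()
        for path in sorted(glob.glob(os.path.join(folder, pattern))):
            counts[classify_image(path)] += 1
        table[os.path.basename(os.path.normpath(folder))] = counts
    return table


def proportions(counts, classes):
    """Fraction of each class in one folder (all 0.0 for an empty folder)."""
    total = sum(counts[c] for c in classes)
    return {c: (counts[c] / total if total else 0.0) for c in classes}


def format_table(table, classes):
    """Render per-folder counts as a plain-text table (folders as rows)."""
    header = "folder".ljust(12) + "".join(c.rjust(8) for c in classes)
    rows = [name.ljust(12) + "".join(str(counts[c]).rjust(8) for c in classes)
            for name, counts in table.items()]
    return "\n".join([header] + rows)


if __name__ == "__main__":
    classes = ["trees", "roses", "houses"]
    # Hypothetical layout: one subfolder per geographic location.
    folders = sorted(glob.glob(os.path.join("test_locations", "folder*")))
    table = count_labels_per_folder(folders, classify_by_filename)
    print(format_table(table, classes))
    for name, counts in table.items():
        print(name, proportions(counts, classes))
```

Keeping one `Counter` per folder makes the later statistics easy: per-location proportions, comparisons between locations, or a matplotlib bar chart can all be built from the same structure.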


Seen: 181 times

Last updated: Aug 20 '12