Welcome to the GovHack Handbook!
If you’re participating, make sure you take a look at the Event Roadmap before you get started and check out our video editing tips to prepare for a perfect GovHack submission.
Check out the toolkit section if you’re looking for our top picks for exploring and analysing datasets, creating and hosting web apps, or exploring maps!
General Competition Information
Australian competitors – times in AEST
- Wednesday (12th Aug) midnight: Treasure Hunt Opens
- Friday 6.30PM: Opening Ceremony
- Friday 7.00PM: Competition Starts
- Saturday 7.00PM: Trivia Event
- Sunday 4.30PM: Closing Ceremony
- Monday (24th Aug) midnight: Treasure Hunt Closes
New Zealand competitors – times in NZST
- Thursday (13th Aug) 2am: Treasure Hunt Opens
- Friday 6.30PM: Opening Ceremony
- Friday 7.00PM: Competition Starts
- Saturday 9.00PM: Trivia Event
- Sunday 4.30PM: Closing Ceremony
- Tuesday (25th Aug) 2am: Treasure Hunt Closes
We recommend you review the Handbook as it will help you:
- Compete → in the event
- Create → your awesome hack
- Submit → your project for judging
There are also a bunch of tools and guidance available to help you at each stage of the competition.
You have only approximately 46 hours to get your entry completed. Here is a guide as to how you might want to allocate your time. This is only a guide, and you can do whatever you want to create and submit your entry.
You MUST have all parts of your competition entry submitted in Hackerspace by Sunday 5:00pm local time (NZST for New Zealand, AEST for Australia). Your entry needs:
- Every team member registered in hackerspace
- A descriptive project page
- A Team Captain
- Award challenge nominated
- Official Open Data reused – URLs
- Evidence Repository URL
- Video URL – maximum 3 mins
- Demo URL (Optional but recommended)
At 7pm on the Friday night the competition categories are launched and your team can start creating. The first night is all about working together in your team to create an idea.
- Find a spot to set up and make your own for the weekend.
- Talk to mentors before they head home! They know the data and will have great ideas to get you started.
- Head to Hackerspace and register as a user (all team members must register). One person from the team creates a team page. This will unlock all the award categories you can enter at your location including International, National, State and Local awards.
- Spend some time reviewing:
- awards to identify some common themes of award categories
- the Official Data list which includes featured data from sponsors and open government data portals
- Identify the focus of your investigation and project
- Check if these award challenges have data requirements.
- Assign roles within your team – working with each other’s strengths.
- Deconstruct the award challenge
- Think outside the box – entries can be anything: a game, an art installation, a visual display, a story, a gadget, a board game, analytical models, a data vis and of course some great apps. Is there new technology you want to try, or perhaps a tool or insight you think will help government?
- Judges reward originality and ideas that make data accessible and easy to understand.
- design thinking
- What pain points could you resolve
- What other data could be relevant to the pain points or users?
- Mind map your ideas
- Check the Judging criteria
- Rule out data that needs too much work or start engaging mentors for help
- Create an evidence repository and add this URL to your Hackerspace project page. Start adding content even if it is just photos of your workings for now (what is an evidence repository? GitHub for the techies, and a Google folder or similar that you can share via a URL for non-techies).
- Record URLs of datasets you use for submitting on your project page.
- Ask for help 🙂 There are mentors, coaches, crew and other Hackers who can help.
Aim to lock down your concept by 10am Saturday.
Mentors and coaches are available on site and via Slack.
- Review your team plans and assign tasks.
- It’s ok if you don’t have an idea yet – mentors will have loads of ideas and problems they want solved.
- Road test your ideas with mentors and pick their brain. Road test ideas with crew.
- Ask your GovHack crew to help connect with mentors for ideas if you need.
- Consolidate your many ideas into one or two good ones
- Data data data – how will you use it, mash it, interpret it, present it? Remember to record URLs of data in your Hackerspace project page as you go.
- Start a storyboard of how you will communicate your ideas
- What could you create/prototype/design/model that is achievable and will help people understand the concept?
- Take some photos of your team or media that will help in your Video entry
- Keep building
- Run your storyboard past some GovHack crew.
Last year’s competitors will all tell you the same… “it took me all arvo to create my Vid and then we had loading problems… Aggghh Panic!” On average it takes about an hour to load videos on YouTube, and new technology when you’re stressed takes twice as long as you want… so factor these elements into your day’s plans.
- Set tasks and activities for the day
- Research tools you will need for the day – check the Handbook
- Finalise your storyboard script. What material will feature in your video? People? Places? Prototypes? Data? – what are the key points or features you want judges to understand?
- Build, prototype or mock up items that will feature in the vid to demonstrate your concept.
- How will you feature the data?
- Update your project page with your data story and datasets
- Script for your video
- Arrange for a quiet space to record any audio – aim for midday
- Craft and edit your 3 minute pitch video
- Apply audio
- By 2pm you should be in editing mode for your video
- Get your team page completed to meet all entry criteria
- Aim to start loading your vid to YouTube (or similar) by no later than 4pm
- YouTube gives you a URL link as soon as you start loading your vid – so make sure you grab this and enter it on your project page
- Finish and submit entry by 5pm.
In order to get some sleep and quality coding time, you may want to consider organising your team into shifts, so that while some are working, others can go home and rest, and then take over to allow the previous shift to get some rest.
Don’t forget to look after yourself: take breaks, eat, drink and go for an occasional walk. Allow some time to get away and freshen up. Showers clear the mind!
- Check the Eligibility criteria on each award category – some awards require you to use data from a specific data portal or dataset.
- Don’t try to develop concepts that win every award; focus on complementary award categories.
- Good ideas need sleep – aim for 7 hours minimum on the Friday night as Saturday will be a long day!
Prepare your video
- An important part of your project is the 3 minute video showing your hack in action which you’ll make to show off your project to the competition judges.
- You are encouraged to include your team name, event location, team members, and to talk about the data you have used and your data reuse story.
- Videos can be a maximum 3 minutes long.
- Your video should not take more than a few hours out of your weekend if you keep it simple.
- We recommend you make a storyboard to help plan your video.
- The preferred method is to use a screen capture with a voice-over narration explaining your hack, why you created it, and what is being shown in the video. Remember that the judging panel is viewing the videos in isolation and may not have any context around your project.
- You may mix in other elements with the screencast, such as footage demonstrating the issues your hack addresses, interviews, live action material and actors (i.e. team members and willing friends) you’ve filmed, etc., but be aware that videos that don’t primarily focus on showing off the hack itself will not be as useful as ones that do.
Storyboarding, screencasting, and editing
- To help with storyboarding your video, grab this huge pack of free storyboarding illustrations.
- For screencasting, check out software like ActivePresenter or OBS that will allow you to record demos of your application. If you have a Mac, QuickTime will do the job.
- For mixing clips together, the YouTube Video Editor is super user-friendly, though VLC or LWKS may also be handy. iMovie on a Mac or Movie Maker on Windows also work well.
- Videos can be uploaded to YouTube, Vimeo, etc.
Watch this for some hints and tips on creating a GovHack video entry.
Many experienced video producers will tell you that audio can be more important than the visuals in a video. Make sure you give some thought to how you can do this part well.
- Find a quiet place to record audio.
- Scripts often need to be recorded several times to fit with the video footage and edited accordingly.
- Music is a nice touch, but make sure you are licensed to use the content.
Submitting your video
- Make sure you allow at least 1 hour to submit your video before the deadline.
- As soon as you have the URL to the video, add this to your Project Page.
- Remember you must have the actual URL of the video on your project page and not a link to another website where you have embedded the video. This is an important element used to validate the video was submitted before the deadline.
There are over 2000 people who participate in GovHack in some way, including volunteers, participants, government representatives, mentors, coaches and our fabulous GovHack event organisers. Participants hail from all walks of life and bring a mish mash of diversity and amazingness to the event!
We expect that you will be part of a team already, or will join a team at the start of the competition. You are allowed to compete as an individual, but we highly recommend you find other awesome people and join a team. There is a maximum size of 10 for a team. The best teams have a mix of skill sets. If you don’t have a team, find a local crew member who will help you meet other GovHackers looking for a team.
If you don’t have a team then you can come to the GovHack Connections event in your state or territory to connect with other like-minded participants to form a team. More details can be found in Connection Events. Slack also offers an opportunity to meet GovHack participants and to form teams.
We use Slack for event weekend communications and team forming. Register for a Slack account at slack.govhack.org. To revisit the Slack instance go to GovHackHQ.slack.com.
Wellbeing at GovHack
Taking care of yourself sounds pretty basic, but it’s surprisingly easy to forget over an intense 46 hours at a hackathon. Here are a few glaringly obvious things to bear in mind.
Try not to get too stressed. (It helps to set realistic expectations, and to focus on having something ready to demo – perfectionism and feature creep aren’t your friends at a hackathon. And don’t forget to back-up your work, to avoid any last-minute panic.)
Make sure you eat and drink regularly, and not just caffeinated drinks. Hydration is important.
Take breaks, go outside in the sunshine, tune out the world with headphones.
Meal breaks are the perfect time to take a real break and also make new friends or get fresh air.
Remember to take any medication that you need.
Try to get plenty of sleep. (We don’t recommend being that person who works all night and doesn’t sleep. This recommendation may be based on real life experience.)
Photography, video and audio recording
GovHack is subject to extensive recording in video, audio and photographic form. By attending, you acknowledge and accept that you may be the subject of such recording, which may be shared through digital media in relation to GovHack events. We will seek to accommodate any specific wish not to be recorded, but cannot guarantee you will not be recorded in some form.
Questions, comments, mentions, and cat GIFs can be directed to the National GovHack Twitter account (@GovHackAU), or to your local GovHack event account.
Twitter will be the primary social media platform that will be used and monitored throughout the event.
Handles and Hashtags
GovHack @GovHackAU — see our social media page for more
Please also share your photos of the event through Instagram tagged #GovHack, Flickr tagged #GovHack, or other channels! A list of our handles/hashtags can be found on our social media page and on our newsroom.
Consult your local event for information on getting there on public transport.
Consult your local event for information on parking.
Security and building access
Consult your local event for information on venue opening hours.
Volunteers will be onsite for every hour of the event. After hours, the building will be locked. A phone number will be placed at the entry to the venue if you have trouble gaining access.
Neither the event organisers nor venue operators can accept responsibility for personal belongings left unattended onsite. If you don’t have a trusted person to look after your belongings, we recommend taking them with you if you leave the venue.
Occupational health and safety
OH&S refers to the policies, procedures, legislation and activities that aim to protect the health and safety of people within a workplace. Specific ways to limit hazards to yourself or another person whilst participating in GovHack are listed below.
It is imperative that your health and safety is never compromised.
- If you have any existing injuries, inform a volunteer
- If you notice any hazards, report them immediately to a volunteer (e.g. water spillages)
- Minimise the risk of tripping by getting a volunteer to place gaffer tape over cords, securing them to the floor
- Place tables and electrical items close to the power outlets whenever possible
- Bend your knees when you lift
- If you start to shake, put on some warmer clothes and/or slow down on the coffee/Red Bull
Think before you lift!
Manual handling occurs when you are lifting, lowering, pushing, pulling, carrying, moving, holding, and restraining any person or thing. It’s unlikely you’ll have to lift anything heavy at this event – do you really need to be moving that? Check with a GovHack volunteer before moving anything larger than a laptop.
If a person is unconscious or requires an ambulance, immediately dial 000.
Details of emergency procedures will be introduced to participants during induction and on display within the venue. Make yourself familiar with these procedures at any time you’re onsite.
- Each team must nominate one person on their Hackerspace project page as their Team Captain and provide contact details. (GovHack does not accept joint Captains)
- The Team Captain will be the contact for GovHack organisers to coordinate distribution of awards and prizes after the event.
- Prize money must be evenly split between all team members of winning teams.
- If all members of your team are under 18 then please nominate a guardian or the Local Event Organizer, who will facilitate the purchase of vouchers to split winnings amongst the team.
- The Team Captain must be available to fly to represent their team at the National Red Carpet awards.
- Some local awards may be handed out on the Sunday afternoon, but most awards will be announced at state Awards nights or the National Red Carpet Awards.
- After GovHack, a limited number of participants who demonstrate real GovHack Spirit and a limited number of finalists will be chosen to fly to the awards. More details to be advised after the main event.
- The Red Carpet Awards present a great opportunity to celebrate all the clever projects from around the country with sponsors, agencies, media and some high profile special guests!
- Following the Red Carpet Awards, the captain of each winning team will receive an online prize claim form. This form will facilitate electronic funds transfer of the
- GovHack reserves the right to amend the value of cash prizes if event running costs exceed the total sponsorship collected
- Awards may be split between multiple teams.
- GovHack events are run by volunteers with the generous support of Sponsorship. Sponsorship funds are used to fund the amazing events you attend including GovHack Connections, GovHack competition weekend and the state and National awards nights. This includes the amazing food and beverages you consume over 46 hours!!
- Any changes to award and prize values will be communicated to winners.
- Unless otherwise stated all cash awards are administered by the GovHack Global Operations team.
- Development Awards are administered by the supporting sponsor. Your local event host will provide you with information on how to claim your award.
- Any vouchers or hardware detailed as an award will be issued at the relevant Awards night.
- GovHack reserves the right to not issue an award if the Eligibility criteria of the competition or the prize category are not met.
We have put in place many tools to ensure that competitors competing remotely still have a great experience. As a digital competitor you will still need to be associated with a Region and join a team. Your region is where you are normally based. As a digital competitor, you are not restricted to team members from your own region. You can be in (say) NSW and join a team in NT or Christchurch if you can find team members based there. At least one person from your team needs to be based in the Region that you choose for your team registration.
We will be using four primary tools: YouTube Livestream, Hackerspace, Slack, and Zoom.
Over the weekend there will be five livestreams that take place between Australia and New Zealand. There are two opening and two closing livestreams, one each for Australia and New Zealand. These will take place on Friday and Sunday respectively. There is also a joint Australian and New Zealand Trivia Event livestream that will take place on Saturday.
On our YouTube page you can Set Reminders for each livestream. Our page is https://www.youtube.com/user/GovHackAustralia
We encourage everyone to comment and post questions. Please feel free to share the livestream with anyone else that might be interested, even if they’re not participating in GovHack. The livestreams are public so don’t be shy about sharing on your socials. We are @GovHackAU.
The hackerspace website at https://hackerspace.govhack.org/ is where you will join teams, create projects, find challenges and datasets, and submit your projects.
In order to participate, you must create an account and register for a competition. You will also have a profile page where you can list your skills and interests. Your profile page is hidden by default. When you’re ready to make it public, ensure the checkbox is ticked within Update Profile, so that it is visible to other competitors. Use Hackerspace to search for team members based on skills, interest and region. To contact other competitors, go to their profile and click to message them on Slack.
We want our Slack to be a safe, inclusive community for everyone, and usage of Slack is covered by the GovHack Code of Conduct. By joining the Slack workspace, you are agreeing to adhere to the Code of Conduct which can be found at govhack.org/howtocompete/code-of-conduct/
We encourage you to find and join relevant Slack groups for your region and selected challenges. These are formatted as:
- #hack-[region] – (#hack-act, #hack-nz, #hack-hobart, etc) – here you can connect with hackers (competitors), volunteers, mentors and facilitators from your local region
- #talk-[organisation] – (#talk-aws, #talk-ato, etc) – here you can connect with specific mentors from a particular organisation
As a starting point, you may wish to join:
- #announcements – Important competition announcements
- #find-team – Find your team mates here
- #comp-help – Ask all your competition related questions: competition rules, team questions, clarify challenges, prizes
- #hackerspace-help – Help on using hackerspace
- #mentor-help – Ask for help on a specific dataset or challenge
- #tech-help – Ask for and give help to others on any of your tech related questions: prototyping tools, stuck on a bug, general advice
GovHack is friendly, diverse and accessible. If you feel you can contribute to a channel with your experience and knowledge, please do so! We encourage you to help each other out if you see someone asking for help in Slack.
Each region of the competition will have a dedicated Zoom room where you can join in and talk to GovHack facilitators, other participants and your team. The Zoom rooms will have breakout rooms enabled and facilitators will allocate your team into a room that you can use throughout the competition weekend.
Mentors will also have a Zoom room for their organisation that you will be able to join to have face-to-face conversations with them. The first point of call with mentors will typically be via Slack.
Competition Hacks + Information
Hackerspace (opens on the Friday of the competition at 7pm) is the Official GovHack competition submission site and allows you to submit all components required for your team’s GovHack entry.
Note: submission elements and times are system controlled so no extensions are available! Teams are required to submit the following as part of their competition entry on Hackerspace:
- Register all Team members in Hackerspace
All team members must be registered as a user with their email. This ensures you get an invite to awards nights and can receive any awards you may win.
- A descriptive project page must be created for each project
- Team members and Team Captain
- Project Description
- An image that best captures your concept, e.g. a logo or image.
- Nominate Award Categories – nominate for multiple awards and from all levels of competition that you can view including International, National, Regional and Local awards. Check you fulfil any special Award category eligibility criteria, such as use of specific datasets.
- Nominate for Team awards available this year
- Record data used – For each significant dataset you use record the URL and explain how this data was used in your entry. For each award category you nominate make sure you check for any specific data reuse eligibility requirements and record the data. This field will be used to validate the eligibility of your entry.
- Evidence Repository URL (Mandatory) This is your proof of concept. You must provide evidence of your work over the weekend, such as any code, graphics, plans, drawings, data analysis and cleansing, mashups, applications, website URLs, models, or photos of each stage of creating your artistic representations. Submit a link to a digital repository such as a Google Drive or Dropbox shared folder (public) or a Git repository such as public GitHub or public Bitbucket.
- A maximum 3 minute Video entry (Mandatory). You must submit the actual Video URL, not a link to another website. This is a video pitch of your entry that tells a story of how you have reused data. The video should demonstrate your hack concept, the benefits or value the concept could achieve and, where possible, introduce your team. The most common method is to use a screencast with a voice-over narration.
- Note: Judges will stop watching videos after 3 minutes
- Demo URL (Optional) If judges are able to see and play with it that is useful, but this is a minor component of the judging.
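Because the data field above is used to validate eligibility, it can help to keep a running log of datasets as you go rather than reconstructing it on Sunday afternoon. The following is only a minimal sketch of one way to do that in Python; the `DatasetRecord` structure, the helper names and the example entries are all hypothetical and are not part of Hackerspace.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    """One dataset entry to copy onto the project page (hypothetical structure)."""
    name: str
    url: str
    usage: str  # short explanation of how the data was used in the entry


def missing_details(records: List[DatasetRecord]) -> List[str]:
    """Return names of records that lack a URL or a usage explanation."""
    return [r.name for r in records if not r.url.strip() or not r.usage.strip()]


def format_for_project_page(records: List[DatasetRecord]) -> str:
    """Build plain text, one dataset per line, ready to paste into the data field."""
    return "\n".join(f"{r.name}: {r.url} - {r.usage}" for r in records)


# Placeholder entries for illustration only; these are not real datasets.
records = [
    DatasetRecord("Example transport data", "https://example.org/transport",
                  "Mapped stops against population"),
    DatasetRecord("Example weather data", "https://example.org/weather", ""),
]

print(missing_details(records))          # datasets still needing a URL or explanation
print(format_for_project_page(records))  # text to paste into the project page
```

Running the check before the deadline shows which entries still need an explanation of how the data was used.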
Timeframes to register and submit:
- 7pm Friday AEST or NZST – Challenges are announced for your region on Hackerspace
- 10am Saturday AEST or NZST – all competitors are recommended to register as a team on Hackerspace.
- 4pm Sunday AEST or NZST – Your video should be finalised and a URL linking to your video created to load on your Project page. It may take some time for your video to load once you have started the upload process.
- 5pm Sunday AEST or NZST – You MUST have all parts of your competition entry finalised before 5:00pm Local time, which includes 1) your team page, 2) your data story description and detail of data sets used, 3) your Project outcomes (demos, code, graphics, photos) submitted, 4) challenges selected and justified, and 5) your video link uploaded.
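The timeframes above are where the "approximately 46 hours" figure comes from: Friday 7pm to Sunday 5pm, local time. A minimal Python sketch of that arithmetic follows. The calendar date is a placeholder assumption (any Friday gives the same result), and the milestone labels are illustrative names rather than official terms.

```python
from datetime import datetime, timedelta
from typing import Dict

# Placeholder Friday used only for illustration (assumption); all times are local.
FRIDAY = datetime(2020, 8, 14)


def milestones(friday: datetime) -> Dict[str, datetime]:
    """Return the local-time checkpoints listed in the timeframes above."""
    saturday = friday + timedelta(days=1)
    sunday = friday + timedelta(days=2)
    return {
        "challenges_announced": friday.replace(hour=19),   # 7pm Friday
        "team_registered": saturday.replace(hour=10),      # 10am Saturday
        "video_finalised": sunday.replace(hour=16),        # 4pm Sunday
        "entry_deadline": sunday.replace(hour=17),         # 5pm Sunday
    }


def hours_between(start: datetime, end: datetime) -> float:
    """Elapsed time between two checkpoints, in hours."""
    return (end - start).total_seconds() / 3600


m = milestones(FRIDAY)
total_hours = hours_between(m["challenges_announced"], m["entry_deadline"])
upload_buffer = hours_between(m["video_finalised"], m["entry_deadline"])

print(f"Competition window: {total_hours:.0f} hours")
print(f"Buffer between video finalised and deadline: {upload_buffer:.0f} hour(s)")
```

The one-hour gap between the 4pm and 5pm checkpoints matches the advice to allow at least an hour for the video upload.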
You will have the ability to earn badges for your profile pages. The badges are a way for you to showcase your accomplishments.
Badges that can be earned in 2020 include
*= as judged by GovHack
For any questions not covered by this handbook, mentors and facilitators are your first point of call. If you are competing at a physical location, flag down one of the friendly facilitators. If you are a digital competitor, jump onto Slack and get in contact with your digital facilitator. Don’t be shy about saying hello or @ing people to ask for help. It’s a supportive environment and everyone is here to learn and help each other. If facilitators aren’t available or you have a specific question, post it into a Slack channel and there’ll be someone to assist.
Mentors can help in a variety of ways. Some can help you with a specific challenge or dataset, others for a specific technology or tool, and some are mentoring in a general capacity. You can reach out for a mentor through your facilitator or by heading to Slack. You can also browse through Mentor profiles in the Profiles section of Hackerspace.
These are the friendly GovHack faces who will answer any question – no matter how trivial or repetitive this may seem. They are your biggest supporters and want you to succeed. For competitors at a physical location, these are your event hosts – you’ll see them floating around the space. For digital competitors, you will be allocated a digital facilitator. Each team will receive one digital facilitator who will guide you through from start to end. To find a full list of facilitators, you can browse the Profiles section of Hackerspace.
What can I submit as a hack?
A Hack is when you take something and make it better! This is an open data competition so you need to reuse official open data in a clever and creative way.
Entries could include, art, jewellery, a digital sign, a board game, historic film pieces, a virtual reality game, internet of things (IOT), a digital sensor display, a 3D model, a visualisation of data, an informed article and of course some amazing web apps! We only limit you to your imagination.
Award categories can help provide inspiration to shape your ideas, as can mentors so don’t be afraid to tap into the resources on hand for ideas.
It’s only 46 hours, so many of the concepts entered are prototypes, mock ups, models, smaller scale artworks and even engineering design concepts from our maker community. Of course we have had some very crafty teams deliver demos and live apps within the time frame.
Your proof of concept should be showcased in some way in your project page and 3 minute video submission.
Using Open Data
You will find the list of Official data available for the GovHack competition in the Official Datasets list. You must use at least one Official dataset to be eligible for prizes. Official Data includes individual datasets featured and all data discoverable on official Government data portals that are published on the Official Dataset list.
You will need to Record the URL of the most prominent data you have used on your Hackerspace project page. Remember judges love to see their data reused 🙂
For each prize category you enter, please check the eligibility requirements to see if any specific data needs to be reused to meet the judges’ criteria. To maximise your chances to win National and Region (State/Territory) awards we recommend you mash up National and Region official data. We limit the number of award challenges you can nominate for the National and Regional competitions to 5 each.
Some datasets listed on data portals may have additional resources available with further information on how to use the data or other supporting material. You are encouraged to download and use these resources.
Several competition goals require that entries use at least one of the datasets provided for this contest, but you are free to use data from the official GovHack list or other datasets as long as their licensing terms permit usage for this purpose. You may also use any publicly accessible web services as long as they do not incur a financial cost to use (private and subscription APIs are prohibited due to licensing issues and barrier to entry).
You cannot use data that you are the authority on or a subject matter expert of… That’s not in the spirit of the competition.
The most important thing is to bring your GovHack Spirit of fun, friendship and helping others! GovHack is a friendly creative environment.
Tech and equipment
Most participants form teams and work together and allocate different tasks to different team members. For instance you may be great at ideating and creating a marketing pitch through a storyboard. Another team member may mine data for the concepts so needs a computer and someone else could focus their energy on elements of a winning video. So not everyone needs a laptop 🙂 Here are some of the things we’ve seen people using at GovHack.
- Mouse and mousepad
- Adaptors and power cables (some may be supplied)
- USB thumb drives, external hard drives
- Phone and charger
- Drawing tablet and stylus
- Bluetooth adapter
- USB hub
- Camera or video equipment – although not required
- Wi-fi dongle – not required as venues have WiFi, however please remember there may be a high volume of users at critical times.
- Identification or Proof of Age card
Please label your belongings so we can return them to you if you leave them behind.
Can I leave equipment at the venue overnight? We discourage this, but recognise some people like to have their monitor at the events. Please be aware that you do this at your own risk and GovHack accepts no responsibility for any possessions left unattended at the venue.
Makers and Artists
If you are interested in making things by hand or using digital fabrication like 3D printing or laser cutting, building physical computing, Internet of Things devices or robots, there are sure to be events for you. Artists making representations of art may also love these spaces. Please check the event page for more information about equipment and supplies.
The other stuff
GovHack is great fun, but it can also be an intense and stressful weekend at times. Bring what you need to stay productive and comfortable.
- Comfortable clothing
- Ugg boots, fuzzy socks
- A jumper
- Music or podcasts
- Any data you’ve downloaded for the event, or notes you’ve made
- Your favourite snacks and drinks (we’ll provide main meals and healthy snacks!)
- A water bottle
- Glasses, if you need them for reading screens
- Pen, paper, post-its, notebook, coloured markers, your stationery drawer
- Business cards
- Any medications you may need
- Your wallet and keys
Sleeping at the venue
Most venues do not stay open all 46 hours. For this reason you may want to check your event page for details of opening and closing hours before you pack the swag! We know a lot of regional nodes do support sleeping at venues.
Your local venue will provide free WiFi. Details for how you can connect to the WiFi will be provided at your event launch. Your WiFi usage, including content downloads, may be monitored as part of general venue security, so please use the access provided with respect and avoid any illegal behaviour. Please make sure the laptops or computers you bring can connect via WiFi, or that you bring a WiFi dongle. Hardwired connections are not available at venues. Please be aware that during peak periods (4 to 5pm Sunday) internet connectivity may be slow. Please plan accordingly.
At Official Events we’ll be taking care of your food while you’re at GovHack, so all you need to bring along are any snacks you want.
- Friday: Supper
- Saturday: Breakfast, lunch, dinner
- Sunday: Breakfast, lunch
If you’re attending a Node Event you may need to bring your own food. Most Node events have secured a small amount of sponsorship to contribute to catering or may help to organise a pizza order for everyone to chip in to.
Check out what type of event you are attending on the website. View more information about your venue catering on your location’s event page.
Special Dietary needs
If you have any special dietary needs, let us know on the ticket registration form (or contact us, if you forgot to when you registered). We’ll do our best to take care of you. Vegetarian, vegan and gluten-free options will be available for participants that have advised us, but we do need to know numbers so we can make sure there is enough for everyone.
The organisers will have endeavored to accommodate a wide range of dietary requirements. If you have severe allergies or important dietary requirements make sure that you have provided that information on your ticket registration.
Catering is often donated by wonderful sponsors, so at this time we are unable to offer or guarantee that products are organic. Thank you for understanding.
Need coffee… yes please!
Some locations have generous sponsors that have helped out with access to real coffee and hot beverages from a local cafe or coffee cart. Your host will let you know details at your launch event if this is available. Please drink responsibly 🙂 Venues will also have some basic kitchen facilities including tea, instant coffee and milk.
There are four simple requirements for your GovHack project:
- That you register your team and fill out your project information on the Hackerspace,
- That you submit a 3 minute video by the end of the competition,
- That you make your project source code and assets available online under an open source software license, and
- That you cite all the datasets that you use within your project.
It’s useful to bear in mind that the competition judges will be focused on the tangible outcomes of your project, so making your team page a snazzy and useful resource, with information about your project, screenshots, your 3 minute video, and anything else that shows off how awesome your project is, is REALLY important 🙂
Register your project and team
Firstly, get one of your team to sign up and register your team on the Hackerspace. You should register your team and create your project by 5pm Saturday (local time), but you’re free to continue editing and improving it until the competition closes.
If you experience any issues with registering your team, or have any questions about what is required of you, seek out one of your friendly GovHack organisers and they’ll give you a hand.
Prepare your video
The second most important part of your project is the 3 minute video showing your hack in action that you’ll make to show off your project to the competition judges.
The preferred method is to use a screencast with a voice-over narration explaining your hack, why you created it, and what is being shown in the video. Remember that the judging panel is viewing the videos in isolation and doesn’t necessarily have any context around your project.
You may mix in other elements with the screencast, such as footage demonstrating the issues your hack addresses, interviews, live action material with actors (read: team members and bribeable friends) that you’ve filmed, et cetera – but be aware that videos that don’t primarily focus on showing off the hack itself will not be as valued as ones that do.
You are encouraged to include your team name, event location, team members, and to talk about the data you have used and your data reuse story.
Check out the hacker toolkit for some assistance and instruction on how to make a compelling video. Remember: your video should not take more than a few hours out of your weekend if you keep it simple.
Storyboarding, screencasting, and editing
To help with storyboarding your video grab this huge pack of free storyboarding illustrations.
For screencasting check out software like ActivePresenter that will allow you to record demos of your application.
Videos can be uploaded directly to our S3 storage bucket via the HackerSpace (we’ll upload them all to YouTube later). You can use YouTube, Vimeo, et cetera as well.
And again, if you are unsure about what you need to do, or just need a bit of help with your video, hunt down one of your GovHack organisers and they’ll be happy to help.
Submit your project
The last tenet of GovHack is that you submit all of your source code and assets (data, documents, art assets, et cetera) and make them available under an open license (such as Creative Commons). Typically this will comprise the source code for a web or mobile application, but for other types of works (e.g. 3D printed jewellery) that can be your notes and evidence of your prototypes.
The key point to remember is that your source material needs to demonstrate to the competition judges that the end result was your own work, and that it is possible for another person, with the right knowledge and equipment, to replicate that work.
You’re free to submit your source materials in any fashion, but typically we find people like to use GitHub or BitBucket. Both of these services are free for open source projects and have user-friendly web and desktop applications to allow even novice users to create, submit, and edit their source material.
Making your hacking more productive
So by now you’ve got your project idea taking shape, and have probably thought a little about your hosting infrastructure, but how do you turn this idea into reality and what tools do you need?
Well, given the greatly compressed timespan of GovHack, anything that can help keep you responsive and flexible (dare I say, agile) is worth having. Use the physical resources you have to hand – pens, butchers paper, post-it notes, a whiteboard (if you can purloin one) – to help give your project planning a tangible, physical presence.
Source control and issue tracking
We’re assuming that everyone already uses some form of source control system (Git, Mercurial, maybe SVN). If not, get thee to GitHub, GitLab, or Bitbucket and grab a copy of their respective desktop applications if your code editor doesn’t integrate that particular flavour of version control (it probably does).
We’ve lumped issue tracking for bugs, feature requests, research questions, et cetera in here as well just because almost all good hosted source control providers these days build in some sort of issue tracking functionality. No need to reinvent the wheel or go elsewhere!
Honestly, your best project management tool for GovHack is probably a whiteboard, or butchers paper blutacked to a wall, with different coloured post-it notes. It gives you an immediate, physical, tangible thing to get up and interact with, look at, scribble on, and easily rearrange that no digital system is going to come close to giving you.
You’ve all got your own favourite code editor or IDE, right? A hackathon is probably not the best time to learn a whole pile of new keyboard shortcuts, but if you’re looking for inspiration go and check out Atom, Orion, Sublime Text 3, and Brackets.
Curated awesome lists of awesomeness
We’re going to list a whole lot of tools and libraries in the rest of this document, but we’re so far from covering the full list of what’s out there. So if you’re after some tools for a particular programming language, platform, frontend or backend development, and so on check out the curated list of awesome lists (and try not to be too overwhelmed by awesome projects).
Data viz 101
Data visualisation encompasses a broad range of fields, techniques, and tools for creating visual representations of data for human consumption. The geographic and tabular data fields have rich toolsets for visualising their particular types of data, so keep on scrolling if you’re after some specific tools.
For now, read on for some of the theory behind data visualisation, some material to inspire, and lists of visualisation tools.
The theory of it all
The School of Data has a set of data visualization guidelines by Gregor Aisch that are worth a read.
Lastly, Juice Analytics has a good roundup at Data Storytelling: The Ultimate Collection of Resources.
Resources for inspiring
And finally Avinash Kaushik’s post on Data Visualization Inspiration: Analysis To Insights To Action, Faster! uses six short stories of data visualisation done well to inspire.
Resources for building
If you’re not sure exactly what tool you’re after and like staring at lists of tools waiting for something to leap out at you then check these out!
- Visualising Data’s Essential Collection of Visualisation Resources
- Drawing By Number’s Visualisation Tools and Resources
- datavisualisation.ch’s selection of tools for visualisation
Web visualisation tools
We couldn’t mention data vis without giving a nod to D3.js (Data Driven Documents) for creating interactive and amazingly detailed visualisations – find out more about Why D3.js is So Great for Data Visualization. Be warned though, the learning curve is quite steep when you’re starting out, but the web is full of thousands of D3.js examples that you should have no problems hacking into the shape you want (such as word clouds, real-time filtering of barcharts, bubble trees for comparing sizes, and many, many more). Check out these couple of great tutorials: Towards Reusable Charts and Data-Driven Documents, Defined.
Visualisation as a Service
If you’re playing with data vis on the desktop you’ll find a lot of the tools are commercial in nature, but Tableau is worth a look (as well as the School of Data tutorial Analysing Datasets with Tableau Public).
Bonus: Android native charting libraries
Intro to geographic data
Geographic data is any dataset that has a location element to it – usually provided as latitude and longitude coordinates – that describes a set of points, lines, or polygons, or a picture (raster) with other non-geographic attributes attached to them. A lot of datasets fall under the category of geographic data (aka spatial data) – from bus stops, postcodes, and cycle paths to polling places, satellite or aerial photography, and mineral deposits.
Google Maps may have popularised mapping, but actually working with the data that underlies a map requires some specialist tools and knowledge.
If you’re new to working with spatial data then we highly recommend reading Tom MacWright’s truly excellent mapschool: a free introduction to geo site. You can skim through it in about half an hour and get up to speed on the basics of spatial data, learn about the common data types, and likely pick up some knowledge that will save you a lot of frustration down the line.
Quick and dirty – just show me what the data looks like
The first thing you’ll probably want to do when you find data is to actually just quickly view it to see what it looks like, check if the data is what you thought it was, if the geographic distribution is about right, et cetera.
Well, there’s a couple of options.
For really quick and simple viewing you can drop most common sorts of spatial vector data on geojson.io and see a quick representation of it (as well as then exporting it back out to a different format). All of the processing is client-side though, so you might want to avoid giving it a huge or complex dataset. MapStarter is another similar service, though it only allows you to export the data as an image (or a simple web map).
Oh – and did you know that GitHub will render any GeoJSON files that you commit to your repo? Fun!
For any larger or more complex datasets QGIS is a great open source cross-platform tool for viewing any and every type of spatial data.
Converting between data formats
So you’ve found the dataset you want, but it’s in some bizarre and possibly arcane format (Shapefiles, MapInfo TAB file – I’m looking at you! [on behalf of the entire spatial industry I apologise for these two formats still existing]) and you want to convert it to something more developer friendly and modern (e.g. GeoJSON, CSV, KML).
For small datasets (< 10 MB) MyGeoData will let you convert between most formats. For anything beyond 10 MB you’ll want to reach for the GDAL command-line tools – GDAL is a fantastic open source project that has been embedded in a lot of the software in the spatial world. To translate vector data in GDAL reach for the ogr2ogr command (if you’re on Windows ogr2gui is available too); for raster (picture) data gdal_translate will convert almost anything to almost anything else.
If command-line tools aren’t your thing skip down a bit to the section on QGIS for a cross-platform GUI built on, amongst other things, GDAL.
Oh – and there are GDAL bindings available for Perl, Python, Java, C#/.NET, Ruby, and R. Scroll on down to the Spatial analysis section for more suggestions of libraries to use in your favourite language.
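For plain point data sitting in a CSV, the conversion to GeoJSON is small enough to sketch with nothing but the Python standard library. This is only an illustration of what a format conversion involves, not a replacement for GDAL: the column names (latitude, longitude, name) and the sample rows are made up for the example.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_field="latitude", lon_field="longitude"):
    """Convert CSV rows with latitude/longitude columns into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        # Everything that is not a coordinate becomes a feature property.
        properties = {k: v for k, v in row.items() if k not in (lat_field, lon_field)}
        features.append({
            "type": "Feature",
            # GeoJSON coordinates are ordered [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data; the third row is deliberately broken.
sample_csv = """name,latitude,longitude
Stop A,-35.28,149.13
Stop B,-33.87,151.21
Stop C,not-a-number,151.00
"""

collection = csv_to_geojson(sample_csv)
print(json.dumps(collection, indent=2))
```

The printed output can be pasted into a viewer such as geojson.io to check that the points land where you expect. The most common mistake with hand-rolled GeoJSON is swapping latitude and longitude, so it is worth eyeballing the result on a map.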
Geocoding – turning an address into coordinates
Your geocoding needs will likely fall into one of two categories: Needing to geocode an address provided by the user vs needing to batch geocode a set of addresses in a dataset.
The School of Data has two great introductory posts Geocoding Part 1: Introduction to Geocoding and Geocoding Part 2: Geocoding Data in a Google Docs Spreadsheet.
In the former case, your quickest and easiest option is to make use of the Google Geocoding API built on top of Google Maps. Examples are available of a simple geocoding call and an address search with auto-complete functionality. Caveat emptor – the Google Maps Terms of Service do require that the results of geocoding requests are displayed in some fashion on top of a Google Map and limit you to 2,500 requests/day.
There are some free / open source RESTful APIs for geocoding, which you could happily either wrap a UI around or issue batch requests to yourself. These include the MapQuest Nominatim Search API, the MapBox Geocoding API, and the GeoNames Search API.
If you’re after a more set-and-forget geocoding service that will geocode a whole file of addresses without having to fiddle with making your own API calls then take a look at CartoDB’s geocoding functionality – and Google Fusion Tables is still kicking around in “experimental” mode (tutorial here).
Lastly, the Python library geopy provides a convenient API wrapper around almost every geocoding service known to humanity.
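If you would rather see what a RESTful geocoding call looks like under the hood, here is a minimal sketch using only the standard library. It assumes a Nominatim-style search endpoint (the public OpenStreetMap instance is used as the example URL, which is our assumption rather than the MapQuest-hosted one mentioned above), and the address and response body are invented for illustration. No network request is made here; the sketch only builds the URL and parses a reply.

```python
import json
from urllib.parse import urlencode

# Assumption: the public OpenStreetMap Nominatim endpoint; swap in whichever service you use.
NOMINATIM_SEARCH_URL = "https://nominatim.openstreetmap.org/search"


def build_search_url(address, limit=1):
    """Build a Nominatim-style search URL for a free-text address."""
    query = urlencode({"q": address, "format": "json", "limit": limit})
    return f"{NOMINATIM_SEARCH_URL}?{query}"


def parse_first_result(response_text):
    """Return (lat, lon) as floats from a Nominatim-style JSON reply, or None if it is empty."""
    results = json.loads(response_text)
    if not results:
        return None
    # Nominatim returns coordinates as strings, so convert them.
    return float(results[0]["lat"]), float(results[0]["lon"])


url = build_search_url("1 Example Street, Canberra")
print(url)

# Made-up response body for illustration only, shaped like a Nominatim JSON reply.
sample_response = '[{"lat": "-35.2809", "lon": "149.1300", "display_name": "Example"}]'
print(parse_first_result(sample_response))
```

To geocode for real you would fetch the URL (for example with urllib.request) and pass the body to parse_first_result. Free services generally expect an identifying User-Agent header and a low request rate, so check the usage policy of whichever one you pick before batch geocoding a whole dataset.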
Unless the spatial part of your project is only for window dressing you’re probably going to need to do some analysis between it and other datasets. For instance – you might need to group one of your spatial datasets (like public transport usage) by another (like suburbs) to generate some summary statistics on usage.
You could hack together some code yourself to work it out, but really there are some far better and far far more powerful options available to you.
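To give a feel for what "hacking it together yourself" involves, here is a minimal pure-Python sketch that counts points per area using the classic ray-casting point-in-polygon test. The suburb shapes and bus stop locations are hypothetical, the polygons are assumed to be simple and non-overlapping, and coordinates are treated as flat x/y values. It is fine for a toy example, but it ignores holes, multi-part polygons, and indexing, all of which a proper spatial tool handles for you.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test: is the point (x, y) inside the polygon (a list of (x, y) vertices)?"""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Does a horizontal ray from the point cross the edge between vertices j and i?
        if (yi > y) != (yj > y):
            x_cross = xi + (y - yi) * (xj - xi) / (yj - yi)
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def count_points_by_area(points, areas):
    """Count how many points fall inside each named polygon."""
    counts = Counter({name: 0 for name in areas})
    for x, y in points:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assumes areas do not overlap
    return dict(counts)


# Hypothetical suburbs as simple squares, and hypothetical bus stop locations.
suburbs = {
    "Westside": [(0, 0), (5, 0), (5, 5), (0, 5)],
    "Eastside": [(5, 0), (10, 0), (10, 5), (5, 5)],
}
bus_stops = [(1, 1), (2, 4), (7, 3), (12, 1)]

print(count_points_by_area(bus_stops, suburbs))  # {'Westside': 2, 'Eastside': 1}
```

Every point is tested against every polygon, so this slows down quickly as datasets grow, which is exactly the sort of problem spatial databases and libraries solve with spatial indexes.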
PostGIS is an extension for PostgreSQL providing spatial capabilities for both vector and raster data. In spatial database-land it is unequalled in the sheer range of functions it makes available, their ease of use, and speed (it’s written in C).
Seriously, don’t waste your time with any other database.
Getting up and running is easy on any platform, with installers available for Windows, brew install or Postgres.app on OSX, and packages available for all of the major Linux distros. For those inclined to Docker there are Dockerfiles available.
If you need more than psql on the command-line, pgAdmin is available across all operating systems (and often comes bundled with PostgreSQL anyway).
Oh – and Amazon RDS for PostgreSQL comes with PostGIS already installed if for some reason you need that level of scalability.
MySQL? SQL Server? et al.
These are not the spatial databases you’re looking for…
Ok, fine – technically you do have spatial functionality in some of the other popular databases.
tl;dr Avoid MySQL for anything spatial!
PostGIS may give you the heavy lifting power to do analysis, but staring at database rows trying to make sense of your results can be made so much easier by visualising them. Enter QGIS – a free and open source cross-platform Geographic Information System with the ability to create, edit, visualise, analyse, and publish spatial information.
Thanks to being built on top of GDAL (amongst others) QGIS is capable of reading and writing almost any format of spatial data that you can throw at it – including direct connections to PostGIS databases.
Language bindings: R (Arrr!), Python, .NET, Ruby, et al.
If you need to delve down into working with spatial data at the code-level you’ve got a really rich set of tools at your disposal.
It’s not too much of an exaggeration to say that Python is the language for doing anything spatial. It has an incredibly rich array of good libraries – far too many to list here – for analysing and manipulating every kind of spatial data under the sun, as well as the means of connecting in to any flavour of spatial data store you care to throw at it.
And even if Python isn’t exactly your cup of tea it’s still very much worth a look if it can fit anywhere in your workflow.
On the raster side of the equation head straight to Rasterio.
There’s a more complete list of a bunch of other great Python spatial libraries over here that’s well worth a read.
We should mention – pretty much anything you can do here you can also achieve with the tools available in a GUI in an application like QGIS.
Ok, so we may have exaggerated Python being the only awesome language for spatial data. Java is almost as awesome as Python, with a similarly rich ecosystem of libraries and applications (GeoServer, the popular spatial data server, is primarily Java-based).
For everything and anything check out GeoTools – the Swiss Army Knife of spatial in Java-land for reading/querying/analysing/rendering vector and raster spatial data.
As a primer you should check out Starting Analysis and Visualisation of Spatial Data with R.
Surprise! There’s actually a great StackExchange question on this very topic. In addition to the resources listed therein, James Cheshire has a great (and quite accessible) write-up on his blog at R Spatial Tips. Robin Edwards also has some great words and examples about 3D Mapping in R.
And there’s also spatstat if you want to delve down into spatial statistics and analysis.
You’ll find pretty reasonable support for spatial data in .NET-land with the likes of:
Geo – a powerful little .NET 4.0+ library for querying and manipulating vector data.
NetTopologySuite – a port of the aforementioned popular JTS (Java Topology Suite) library for querying and analysing vector data.
SharpMap – a geo app framework for vector and raster data that includes its own rendering engine.
MapWindow – an all in one desktop GIS tool + an ActiveX control for mapping + a C# library for handling vector data.
Daniel Azuma’s series of blog posts on doing geospatial in Ruby is going to be worth your time.
A few other tools
In recent times a few really handy and modern little web tools have popped up for doing simple and/or common tasks with spatial data.
geojson.io for quickly and easily creating, viewing, and sharing vector data as GeoJSON (and other common formats).
Ogre as a web client to the ogr2ogr utility in GDAL. Easily convert between vector formats!
GIS Convert for easily converting between spatial and spatial-like formats.
GeoGig if you want to apply the principles of Git to spatial data.
epsg.io if you’ve found some data but it’s not in a standard projection (e.g. latitude and longitude, web mercator) then find the “EPSG” code and stick it in here to find out more about it.
GitSpatial if you just want to wrap a spatial API around your GitHub-hosted GeoJSON data.
TopoJSON, an extension for GeoJSON that encodes topology. tl;dr: it’ll make your GeoJSON up to 80% smaller.
And an honorary mention to Shape2Earth for allowing the easy creation of maps for Google Earth.
Intro to graph databases
Graph databases were conceived of as a means to make the task of exploring the connections and networks between entities much easier. Whereas in more traditional databases we would have used a form of link table to represent the relationships between entities, that relationship is implicit in a graph database with every entity containing direct pointers to its adjacent entities without the need to expensively compute indexes.
One of the more obvious uses for graph databases is to store and analyse the relationships between people – think Facebook, Twitter, or any web property with a concept of followers or memberships. If you have a problem where you need to quickly and efficiently know how X is connected to Y and via whom, then graph databases are worth a look.
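To make the "how is X connected to Y, and via whom" question concrete, here is a minimal sketch in plain Python. It stores a hypothetical follower network as adjacency lists (each person points directly at their neighbours, much like the direct pointers described above) and uses a breadth-first search to find the shortest chain between two people. The names and the find_connection helper are invented for illustration; a real graph database answers this kind of query for you, and at far greater scale.

```python
from collections import deque


def find_connection(graph, start, goal):
    """Breadth-first search: return the shortest chain linking start to goal, or None."""
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbour in graph.get(path[-1], ()):
            if neighbour in visited:
                continue
            if neighbour == goal:
                return path + [neighbour]
            visited.add(neighbour)
            queue.append(path + [neighbour])
    return None


# Hypothetical "follows" network stored as adjacency lists.
follows = {
    "Alice": ["Bob", "Carol"],
    "Bob": ["Dave"],
    "Carol": ["Dave", "Erin"],
    "Dave": ["Frank"],
    "Erin": [],
    "Frank": [],
}

print(find_connection(follows, "Alice", "Frank"))  # ['Alice', 'Bob', 'Dave', 'Frank']
```

Because the relationships here are directed, searching from Erin to Alice finds nothing. Whether your relationships are one-way or two-way is worth deciding early, as it changes both the data model and the answers.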
Tangent Alert!! While we’re talking about networks and relationships we should introduce the concept of “linked data”. If you haven’t run across the term before (or have, but still don’t understand what it means) check out Linked Data – for the enlightened non-geek reader (or dummies) (or managers) and A dummy’s introduction to linked data (me being the dummy).
The GovCamp wiki has a long list of tools surrounding linked data that may be of use – Svgvizler (for SPARQL graphing), RelFinder (for RDF visual exploration), and SPARQL Editor (for interactive SPARQL query building) are useful too.
Graph DB Software
Neo4j is the popular kid on the graph database block and has a wealth of supporting tools and documentation; and a great community.
Getting your data into Neo4j can be as straightforward as throwing a spreadsheet containing your data, along with instructions on how to construct the relationships between the entities in your data, at Neo4j. For details and in-depth instructions see Importing data into Neo4j – the spreadsheet way and Gmail Email analysis with Neo4j – and spreadsheets. Alternatively, you can use the new in-built ETL functionality from Neo4j 2.1 to load your CSV formatted data in directly – check out the official guide on importing from PostgreSQL to Neo4j.
But sometimes you just can’t escape writing some code to get the job done, and to that end the official Neo4j website has curated a list of libraries for many of the major languages. Neo4j also has a REST batch import API if you want to get right down to the coalface.
Many of the graph databases you’ll come into contact with can be queried via a common syntax called Gremlin – where Gremlin is to graph databases as SQL is to traditional relational databases. Applications can then be written on top of Gremlin, as you would with SQL, and become largely database agnostic. Gremlin also supports a simple data browser application to test execution of queries.
NetworkX (from the Los Alamos National Laboratory) is a social network analysis library for Python, with a large range of advanced analysis functions built-in (e.g. finding communities within a graph) and good support for importing data into graph databases. For more see Introduction to Social Network Analysis with NetworkX.
Of course there are R packages for graph databases!
Social Network Analysis in R,
Making prettier network graphs with sna and igraph, and
RNeo4j should get you pointing in the right direction.
Graph databases represent complex networks, so it turns out creating useful visualisation can be a tad hard – for an intro to the subject see Visualising Networks: Beyond the Hairball.
Tree and hierarchy visualisation
What if your network isn’t actually a network and more like a tree or straight hierarchy (i.e. it has no interconnections between entities)?
Congratulations, you can use tree visualisations! It’ll be faster and far more visually effective than any other options.
TreeViz is a good start if you just need to run it locally (it’s a Java app), but D3.js can also visualise trees (see this tutorial for step-by-step instructions). D3.js also supports enclosure diagrams (aka circle packing) that may better represent your tree structure than an actual tree would.
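Tree visualisations generally want nested data, while hierarchies often arrive as a flat list of child/parent pairs. The sketch below, which uses invented budget categories and a hypothetical build_tree helper, nests such a list into name/children dictionaries. That shape is what D3's hierarchy examples commonly consume, but check the input format your chosen tool expects.

```python
import json


def build_tree(rows, root_name):
    """Nest flat (child, parent) rows into {"name": ..., "children": [...]} dictionaries."""
    # Create one node per name first, so rows can be listed in any order.
    nodes = {root_name: {"name": root_name, "children": []}}
    for child, _parent in rows:
        nodes[child] = {"name": child, "children": []}
    # Then attach every child to its parent.
    for child, parent in rows:
        nodes[parent]["children"].append(nodes[child])
    return nodes[root_name]


# Hypothetical hierarchy: each row is (child, parent).
rows = [
    ("Health", "Budget"),
    ("Education", "Budget"),
    ("Hospitals", "Health"),
    ("Schools", "Education"),
]

tree = build_tree(rows, "Budget")
print(json.dumps(tree, indent=2))
```

The sketch assumes every parent appears either as the root or as a child in some row. A row pointing at an unknown parent raises a KeyError, which is a handy early warning that the data is not actually a single tree.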
But sometimes you care less about the connections in your network and more about the weight those connections have (e.g. the number of emails sent between connections) – well for that you want a flow visualisation like a sankey diagram that will visualise the magnitude of flow between nodes in a network.
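Flow visualisations need those weights computed before anything can be drawn. As a minimal sketch, assuming your raw data is one record per email with a sender and a recipient (the names here are invented), the weights can be tallied like this. The source/target/value field names follow a convention used by several sankey libraries, but treat them as an assumption and check what your chosen tool expects.

```python
from collections import Counter


def build_flow_links(messages):
    """Sum the number of messages per (sender, recipient) pair into sankey-style links."""
    totals = Counter((sender, recipient) for sender, recipient in messages)
    return [
        {"source": sender, "target": recipient, "value": count}
        for (sender, recipient), count in sorted(totals.items())
    ]


# Hypothetical email log: one (sender, recipient) tuple per email sent.
emails = [
    ("Alice", "Bob"),
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Bob", "Carol"),
    ("Alice", "Bob"),
]

links = build_flow_links(emails)
for link in links:
    print(link)
```

Each link's value becomes the width of the band drawn between the two nodes, so the three emails from Alice to Bob would show up as the thickest flow in this example.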
Other visualisation tools
NodeXL for Excel allows you to visualise networks/graphs quickly from right inside Excel.
Gephi is a great desktop platform for interactive visualisation and exploration of networks and hierarchical systems. It comes with many good automatic layout algorithms (even for huge graphs) and can easily handle many types of input file, including spreadsheets of Tweets.
Cytoscape is like Gephi, but more on the ‘platform’ end of the spectrum. It was originally designed for use in the biological sciences, but has evolved to become a general tool for complex network analysis and visualisation. Cytoscape supports a rich ecosystem of (Java) plugins (aka “Apps”) that allow you to customise and extend the base functionality.
We think it’s awesome that kids are wanting to learn how to code – so if that’s you we’ve collected together a few online tutorials to guide you through some practical examples of coding. The tutorials come in many levels of difficulty for you to explore and have fun with!
Coding for Kids
Create Stories, Games, and Animations
Create Infographics/Picture Charts
Data and Project Tips
In which we play at being cartographers
Righto, so you’ve got some data and need to provide a map for your users to view and interact with all of your lovely new data. Good news, you’re spoilt for choice! (Are you detecting a theme here?)
Web mapping loosely falls into two categories:
- Software as a Service platforms that provide simple and powerful GUIs for the creation of maps and support hosting of all sorts of different data formats.
These days a lot of the modern libraries and platforms have been optimised to work well on mobile devices, and in some cases have separate libraries for developing native apps on iOS and Android.
Oh, and if you’re completely new to web mapping check out mapschool: a free introduction to geo to get yourself up to speed on the concepts behind web mapping. The New York Public Library also has a great, and pretty exhaustive, runthrough of making your first web map: From Paper Maps to the Web: A DIY Digital Maps Primer.
MaaS (Maps as a Service)
CartoDB and MapBox both do a great job of covering the basics of map building with quick and easy tools for uploading data and push-button interfaces that abstract away a lot of the more complex spatial side of spatial data. They both also have generous free usage tiers.
CartoDB tends to focus more on the “make really pretty vector maps” side of the equation, with great visualisation tools like Torque (beautiful animations with time series data), powerful and simple push-button visualisation of data, and a wonderful SQL API for interacting directly with their PostGIS backend from the client. Oh – and they also have some support for 3D and can handle huge datasets, like colouring every river in the US.
And lastly, CartoDB comes with a powerful point-and-click map building GUI or, if you need more control, the CartoDB.js library exposes all of the same functionality. Oh, and did we mention that it’s open source and you can run your own CartoDB instance?
MapBox focuses slightly more on the traditionally geospatial side of things, with a powerful desktop map designer, MapBox Studio (which can process raster as well as vector data), some great work on tools for processing satellite imagery, developing the vector tiling standard, and pushing the boundaries of web mapping with MapBox GL. On top of all of that they also maintain iOS and Android SDKs, the Mapbox.js library; and APIs for surface heights, geocoding, and directions.
And a couple of honorary mentions:
GeoServer is an open source spatial data server that may be worth a look if you’re having to deal with larger or more complex datasets that CartoDB/MapBox can’t handle – or can’t handle without charging you for the pleasure. GeoServer also has support for more advanced functionality like WCS and WPS for extracting raw data from rasters on-the-fly, or writing processes to perform geospatial analysis on-the-fly. You can run GeoServer on its own, or in combination with other packages via the OpenGeo Suite or GeoNode.
Esri provide a free ArcGIS Developer subscription. This provides you hosting of geospatial content, access to a range of content from Australian government providers, geocode service using G-NAF data, and a range of analysis services.
NationalMap (aka TerriaJS)
And finally, a most honorary mention to the NationalMap project out of the Department of Prime Minister and Cabinet and developed by CSIRO Data61. NationalMap runs on a powerful little open source stack comprised of Cesium and Leaflet. While NationalMap, being primarily a client-side viewing framework, isn’t a true “Maps as a Service” platform it does have a nifty hidden feature – you can control what it displays by passing it URL parameters or by embedding it in an iframe and sending it cross-window messages. Especially check out the handy features for creating choropleth maps from a simple CSV file.
Ta-da! Instant map!
You can also use TerriaJS as a library to build your own custom mapping application. See the TerriaJS documentation to learn how to get started.
OpenLayers is probably the most mature player on the stage, and has recently undergone a ground-up rewrite of the library to simplify the API and leverage modern web technologies like WebGL, Canvas, and the full capabilities of HTML5 and CSS3. It even has support for true 3D web mapping via its OL3-Cesium plugin which seamlessly integrates the Cesium WebGL 3D globe library.
Leaflet, the relatively new kid on the block, started as a protest against other web mapping libraries that required a fair amount of knowledge of geospatial data to use effectively. As such, it has a super simple API and a more limited range of features than the likes of OpenLayers (but has a large library of community-developed plugins that can help address that gap).
ModestMaps is an even simpler library again than Leaflet, with a simple API and a focus on the core functionality of producing interactive maps easily.
Turf has been developed by the MapBox team and bills itself as “GIS for web maps” with support for common geospatial operations like buffering, contours, hexbinning, et cetera performed all in the client. Turf also integrates easily with Leaflet and MapBox.js.
SVG-powered mapping data visualisations
If you’re looking at maps as more of a data visualisation tool then the subcategory of web mapping libraries that play in the SVG space are probably more appropriate for your needs.
Well we couldn’t not mention D3.js in talking about data visualisation. Michael Bowman’s Designing Beautiful Maps with D3.js talk is worth a look to familiarise yourself with the topic, and then head on over to this truly exhaustive list of examples of using D3 for maps on bl.ocksplorer.org.
The team behind the graphing library Highcharts have a separate Highmaps library that makes creating mapping data visualisations a breeze.
Lastly Polymaps is a bit of a hybrid library in that it provides image and vector-tiled maps via SVG, so you can mix up your choice of basemap (OpenStreetMap, Bing, et cetera) with your image and vector data easily.
Web mapping frameworks
The last sub-category of web mapping library worth a mention are the rich web mapping frameworks that exist to provide a whole UI framework around the map itself (i.e. map toolbars, layer trees and controls, advanced data query UIs, et cetera).
Device agnostic mobile web development
If your project involves development for mobile devices you’ve got a choice to make: take a web-based approach or pick a platform and develop a native app. The former will likely be the quicker approach (unless you’re a gun Android/iOS developer already) and give you a chance of getting something workable hacked together in time.
Backend frameworks for native apps
Going down the path of native application development can give you a really slick looking project, but it does give you a lot more to consider than a simple web app might. Helios and Parse are two backend frameworks that’ll take care of analytics, notifications, social sign on, and more.
We’ve already spoken about how you can submit your project source materials, but you’ll more than likely also need a place to host your application on the web.
If you’re building any sort of web-connected application (be it web, mobile, or a desktop application) you’ll need a server to host it on. These days virtual servers are a dime a dozen (and often significantly cheaper than that) – be they a blank box with command line access that you setup yourself (aka IAAS), or a PAAS solution that gives you click-button access to databases, caching layers, system utilities, monitoring, and analytics services – all with a nice GUI to keep you from having to delve into command line hell.
A word on containerisation (Docker)
Containerisation, and more specifically Docker, is all the rage right now. If you haven’t really run across containerisation before then think of it like server virtualisation on steroids – i.e. in a plain text file you specify the OS you want, the software you’d like installed (Apache, Nginx, Python, whatever), how you want to configure that software, where to find your code (locally, straight from GitHub), and the result is a tiny virtualised server.
Said tiny virtualised server can then be spun up in minutes either on your laptop, in the cloud, or on your teammates’ laptops and all of you will have exactly the same build and configuration. Say goodbye to the headaches of different configurations between development and live applications!
Docker is the best project in this space at the moment, and they’re busy building a great ecosystem including a marketplace for containers and tools for orchestrating multiple containers together into more complex applications.
And due to Docker’s crazy levels of popularity, Amazon Web Services has Docker support built-in!
Static website hosting
Probably the easiest and fastest way to get a static site online at the moment is via GitHub Pages, which hosts a website straight out of your GitHub repository. In addition to that, GitHub Pages provides the option to generate a project site from a collection of pre-built themes and to point your own custom domain at your site.
If you prefer to start from scratch (on GitHub Pages or elsewhere) Bootstrap and Foundation are the two preeminent responsive frontend web frameworks around these days that cut away a lot of the work of making a site look pretty so you can concentrate on content (and your awesome GovHack project).
Beyond GitHub you could also look at hosting your site on Heroku.
Scraping data from PDFs and the web
As always, the School of Data have an excellent series on the ins and outs of extracting data from PDFs and scraping websites – A gentle Introduction into Extracting Data – with many useful recommendations of the best tools to use for the job.
tl;dr? Well there are a few standout tools…
Tabula is getting a lot of notice for making the process of extracting tabular data from PDFs a (relative) breeze. Download, install, point it at some PDFs and it’ll extract any tabular data in them to a nicely machine-readable CSV or XLS file for you. For a more in-depth view have a read through Introducing Tabula (Source news).
Apache Tika, the older man in the scraping PDFs market, is great for extracting text and metadata from a pile of document formats (PDF, XLS, PPT, …) – even PDFs containing text in scanned images. OUseful, the Practical Data Journalism blog, has a good walkthrough of Getting Text Out Of Anything (docs, PDFs, Images) Using Apache Tika.
Worth a mention as well is PDF Tables, a web-based tool from the folks behind ScraperWiki that pretty much does what it says on the box – pulls tabular data out of any PDFs you provide.
On the website scraping end of the equation there are a few desktop and web-based tools around – import.io, UiPath (free trial) and 80legs – but sometimes you just need to write code to do it properly.
Morph.io, which arose out of the demise of ScraperWiki, offers a lightweight scraping framework (Python, PHP, Ruby, or Perl) and a whole web platform and community around scrapers (think Heroku for web scraping).
In Python-land there’s Scrapy – a neat framework for extracting data from the web with a strong community and easily extensible codebase. You can think of Scrapy as being the next level up from libraries like BeautifulSoup and lxml (which excel at parsing HTML and XML) in that it incorporates higher level concepts of scraping like spiders, selectors, and items.
Likewise, Scrapekit is awesome and includes a range of advanced features such as caching, multi-threading, and logging.
This Quora post has a good thread with suggestions for scraping frameworks in a variety of languages.
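To make the parsing step that libraries like BeautifulSoup and lxml handle a little more concrete, here is a minimal sketch that pulls links out of a page. It deliberately uses only Python's built-in html.parser module rather than any of the libraries named above, and the sample HTML, the LinkExtractor class, and the extract_links helper are all made up for illustration. A real scraper would also need to fetch the page and cope with messier markup.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect (href, link text) pairs from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> tag we are currently inside
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        # Only keep text while we are inside a link that has an href.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html):
    """Return a list of (href, text) tuples found in the HTML string."""
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page fragment, standing in for a downloaded web page.
SAMPLE_HTML = """
<ul>
  <li><a href="/datasets/1">Bus stops</a></li>
  <li><a href="/datasets/2">Rainfall <b>2015</b></a></li>
</ul>
"""

print(extract_links(SAMPLE_HTML))
```

Frameworks like Scrapy wrap this kind of parsing in spiders and selectors, so you would normally reach for them once you have more than a page or two to deal with.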
Unstructured data covers much of the data you will come across – from data buried in PDFs and web sites, to mining data from social networks, but it all requires analysis to extract meaning. There are many tools for getting at the data – see the previous section on scraping data for a range of tools – but the Sunlight Foundation’s Text Analysis in Transparency talk is a great introduction to that world of text analysis and natural language processing.
Extracting meaning from text
Once you have your data in a nicer format you may well need to tackle the problem of pulling something meaningful out of it. Fortunately, there are a lot of good analysis and natural language processing libraries around these days that will allow you to automatically find the meaningful keywords in a body of text.
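As a rough illustration of what "finding the meaningful keywords" can mean at its very simplest, the sketch below counts word frequencies after dropping a few common words. This is not how a proper NLP library does it; the stopword list, the sample text, and the top_keywords function are assumptions made up for this example.

```python
import re
from collections import Counter

# A tiny, hand-picked stopword list (an assumption for this sketch;
# real NLP libraries ship much longer ones).
STOPWORDS = {"the", "and", "for", "that", "with", "where", "from", "this", "are", "was"}


def top_keywords(text, n=5):
    """Return the n most frequent non-stopword words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    return counts.most_common(n)


# Hypothetical text blob standing in for your extracted data.
sample = (
    "Transport data shows where buses run. Open data helps councils "
    "plan transport, and open data lets anyone check the data."
)
print(top_keywords(sample, 3))
```

Plain counting like this ignores grammar and meaning entirely, which is exactly the gap the libraries and web services in this section aim to fill.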
Natural language processing may be a bit of a heavy topic to dive into during a hackathon, but if you’re feeling brave there are a few good tutorials on the subject to get you started (if you’d like some more academic articles check this StackOverflow question).
As always, there are web-based tools – such as TextRazor and Yahoo Content Analysis – that may be able to save you the trouble of diving into code and learning too much about the theory and practice of NLP whilst time is tight.
There are a surprising number of good NLP libraries around for all of the major languages though:
Beyond the world of NLP you might consider going straight to a search engine that provides similar text interrogation capabilities along with a database to store your data and APIs to query it. Solr and Elastic (formerly ElasticSearch) are pretty well known in this space – but Sphinx and Constellio are worthy entries.
Visualising unstructured text
Being able to visualise unstructured information is key to making sense of it – be it a word tree of a text blob, a whole web page, or a social media feed – tools like Word Tree, Overview, and even Google Charts will help you turn out some quick visualisations. On the academic end of the spectrum the National Science Foundation have made their Jigsaw toolbox available.
Check out See Text in Whole New Way: Text Visualization Tools from Princeton University for a range of other tools.
Working with tabular data
At its simplest, tabular data is data that is stored in rows and columns (hence the name “tabular” i.e. in tables), either in a flat file or a database, and is usually comprised of simple alphanumeric values. CSV/TSV, JSON, XLS(X), and XML are some of the more common formats you’ll find tabular data in, though unfortunately it does still often appear in non-machine-readable formats like PDF and DOC and must first be extracted and cleaned before being used.
Converting between data formats
There’s a good chance that you’re going to want to convert your data from the format you’ve found into something a little more modern and useful (like JSON). Mr. Data Converter is a simple web-based tool for converting from Excel, CSV, and TSV to JSON, HTML, MySQL, PHP, Python, Ruby, and more.
If you need even more control consider the Python libraries pandas (which provides a whole data analysis and modelling framework as well), tablib, or any of the Science and Data Analysis libraries listed on Awesome Python.
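If you would rather do a quick conversion in code, a CSV to JSON conversion can be sketched with nothing more than Python's standard library. The csv_to_json function and the sample data below are made up for illustration; pandas or tablib would be the better choice once you need type handling or larger files.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text (with a header row) to a JSON array of objects.

    Every value stays a string; converting numbers is left to the caller.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows, indent=2)


# Hypothetical dataset, standing in for a downloaded CSV file.
sample_csv = "city,state\nPerth,WA\nHobart,TAS\n"
print(csv_to_json(sample_csv))
```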
Cleaning your data
If your data has had humans involved in entering it then it’s probably full of all sorts of small variations in how the data have been entered that you’ll need to clean up before it becomes machine-readable. Fortunately, there are a couple of great tools.
OpenRefine (formerly Google Refine) is a powerful desktop tool for cleaning messy data and transforming it between different tabular data formats. It even integrates with web services via some simple connectors so you can, for example, geocode a bunch of addresses using Google directly in OpenRefine. Check out the School of Data’s simple tutorial on using OpenRefine to see it in action.
Depending on how badly munged your data is a simple old spreadsheet application may get you most of the way to having clean data – as per the excellent A Gentle Introduction to Data Cleaning series from the School of Data. Their Cleaning Data with Spreadsheets walkthrough may also fit the bill.
If out-of-the-box tools aren’t cutting it and you need to dive into code take a look back at some of the Python libraries, like pandas, that we recommended in Converting between data formats. If you’re feeling brave take a look at dedupe, which leverages machine-learning to perform de-duplication and cleansing of data.
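For a feel of what the most basic code-based clean-up looks like, the sketch below tidies stray whitespace and inconsistent capitalisation and then drops rows that become identical. It only catches exact matches after normalisation, so it is far simpler than the machine-learning approach dedupe takes. The normalise and dedupe_rows helpers and the sample rows are hypothetical.

```python
def normalise(value):
    """Collapse runs of whitespace and ignore case for comparison purposes."""
    return " ".join(value.split()).casefold()


def dedupe_rows(rows, key):
    """Drop rows whose `key` value duplicates an earlier row once normalised.

    The kept rows get a tidied, title-cased value for `key`.
    """
    seen = set()
    cleaned = []
    for row in rows:
        marker = normalise(row[key])
        if marker in seen:
            continue
        seen.add(marker)
        tidy = " ".join(row[key].split()).title()
        cleaned.append({**row, key: tidy})
    return cleaned


# Hypothetical hand-entered data with typical inconsistencies.
rows = [
    {"suburb": " st  kilda "},
    {"suburb": "St Kilda"},
    {"suburb": "FITZROY"},
]
print(dedupe_rows(rows, "suburb"))
```

Title-casing is a blunt instrument (it will mangle names like "McDonald"), so treat it as a starting point rather than a rule.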
And if all else fails you can always fall back to reliable command-line tools like grep, awk, and sed combined with regular expressions. If you need to upskill your regex foo Debuggex and Regexpr should set you on the right path.
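The same regular expressions work from a scripting language too. As a small sketch, the snippet below uses Python's re module to rewrite dates into ISO format. It assumes the dates are written day/month/year, and the to_iso_dates function and sample sentence are made up for illustration.

```python
import re

# Matches dates such as 7/8/2015 or 24/08/2015 (assumed to be day/month/year).
DATE_PATTERN = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")


def to_iso_dates(text):
    """Rewrite d/m/yyyy dates in the text as yyyy-mm-dd."""

    def replace(match):
        day, month, year = match.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"

    return DATE_PATTERN.sub(replace, text)


print(to_iso_dates("Opened 7/8/2015, closed 24/08/2015."))
```

Note that the pattern does not check that the day and month are valid, so it is worth eyeballing the output before trusting it.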
Analysing tabular data
So you’ve got a nice clean dataset and now you want to do some analysis on it to understand if reality matches your hypothesis!
Sometimes the simplest tools are the best and a spreadsheet is all you need – Excel, after all, is the world’s most widely used IDE!
The Sunlight Foundation has a set of good videos as an intro to Data Visualisation in Google Docs which also covers analysis. And finally, check out this rundown of Excel plugins for analysing and visualising data.
When datasets get larger, or the analysis requirements get more complex, you’ll probably find yourself reaching for a database to do the heavy lifting.
The School of Data has a neat little tutorial on Using SQL for Lightweight Data Analysis that’ll get you started. If you’re playing in PostgreSQL you may find its window functions of great use to perform calculations across sets within your data.
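If you just want to try the idea without setting up a database server, the sketch below runs a simple aggregate query against an in-memory SQLite database using Python's built-in sqlite3 module. SQLite stands in for PostgreSQL here, the query uses a plain GROUP BY rather than window functions, and the readings table, its columns, and the sample values are all invented for the example.

```python
import sqlite3


def average_by_station(rows):
    """Load (station, rainfall_mm) rows into SQLite and summarise per station.

    Returns a list of (station, reading_count, average_rainfall) tuples.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE readings (station TEXT, rainfall_mm REAL)")
        conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)
        cursor = conn.execute(
            "SELECT station, COUNT(*), AVG(rainfall_mm) "
            "FROM readings GROUP BY station ORDER BY station"
        )
        return cursor.fetchall()
    finally:
        conn.close()


# Hypothetical readings, standing in for an imported dataset.
readings = [("Airport", 1.0), ("Airport", 3.0), ("Harbour", 5.0)]
print(average_by_station(readings))
```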
For a deep dive on data analysis in PostgreSQL, R, and Python check out this blog post from Zev Ross.
R provides a platform for advanced data analysis to let you discover and visualise trends even in large datasets. If you’re new to R you should start with The Guerilla Guide to R, basic statistics and graphs in R, and the official Introduction to R. To ease the learning curve check out some of the IDEs for R – RStudio, Rattle, and Deducer.
The true value of R lies in its huge array of libraries and addons, such as bigvis (visualise up to 10 million data points in mere seconds) and the big list of 10 R packages I wish I knew about earlier.
To get started with charting in R check out the handy Getting Started with Charts in R guide, Simple charts in R tutorial, or some fun putting pictures of Pokemon where their power level is on an X/Y axis.
When it comes to sharing your analysis with the world check out Knitr, for quick and easy report generation, googleVis for making R and Google Charts talk nicely, and Shiny for a full-blown web app framework for R to turn your awesome analyses into a shiny interactive web app (such as this demo).
Visualising tabular data
We’ve already touched on visualisation in the previous section on Resources for building data visualisations.